This is the mail archive of the xsl-list@mulberrytech.com mailing list.

Extracting a list of unique base urls from anchors in a html document.

To: "'XSL-List at lists dot mulberrytech dot com'" <XSL-List at lists dot mulberrytech dot com>
Subject: [xsl] Extracting a list of unique base urls from anchors in a html document.
From: Taras Tielkes <taras at info dot nl>
Date: Thu, 13 Sep 2001 18:23:01 +0200
Reply-To: xsl-list at lists dot mulberrytech dot com

Hello,

I have a source HTML document that has been converted to XHTML.

This document contains a number of anchor elements (<A>). I want to use XSLT
to extract information about the links contained in the document.

First, I restrict the returned links to links that point to files in the
same folder, like this:

//a[contains(@href,'#') and not(contains(@href, '/'))]

As you can see, I'm also restricting the returned links to those that have a
hash (#) character in their href attribute.
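To make the first filter concrete, here is a minimal sketch in Python (standard library only) that applies the same two conditions to the anchors of a small XHTML fragment. The fragment and its file names are hypothetical, and it declares no XHTML namespace to keep things simple. This models the logic of the expression; it is not an XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; all hrefs are made up for illustration.
SAMPLE = """<html><body>
<a href="page1.html#top">1</a>
<a href="page2.html#intro">2</a>
<a href="page1.html#end">3</a>
<a href="sub/page3.html#x">4</a>
<a href="page2.html">5</a>
<a href="page2.html#more">6</a>
</body></html>"""

def same_folder_hash_links(root):
    """Mirror //a[contains(@href,'#') and not(contains(@href,'/'))].

    ElementTree's iter() walks the tree in document order, like //a.
    A missing href is treated as the empty string, as XPath would.
    """
    result = []
    for a in root.iter("a"):
        href = a.get("href", "")
        if "#" in href and "/" not in href:
            result.append(href)
    return result

root = ET.fromstring(SAMPLE)
print(same_folder_hash_links(root))
```

On this sample, "sub/page3.html#x" is dropped because it contains a slash, and "page2.html" is dropped because it has no hash; both page1.html links and both page2.html#... links remain.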

So far so good. Now I add a second predicate (formatted for readability): 

//a
[contains(@href,'#') and not(contains(@href, '/'))]
[not(substring-before(@href,'#')=substring-before(preceding::a/@href,'#'))]

The second predicate should (I think) limit the returned node-set to anchors
whose href attribute has a unique base URL (the part before the #). However,
the expression with the second predicate appended still returns multiple
links with the same base URL. Why?
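One way to see what is going on is to model the evaluation in Python. In XPath 1.0, when a node-set such as preceding::a/@href is passed to a string function like substring-before(), it is converted to the string-value of the node that comes first in document order (or to "" if the node-set is empty). So the second predicate compares each anchor with a single href, the earliest one in the document, rather than with every preceding href. The sketch below is a hedged model under that rule; the HREFS list is hypothetical, and it assumes every anchor has an href and that anchors are not nested.

```python
# Hypothetical href values of every <a> in the document, in document order.
HREFS = [
    "page1.html#top",
    "page2.html#intro",
    "page1.html#end",
    "sub/page3.html#x",
    "page2.html",
    "page2.html#more",
]

def substring_before(s, sep):
    """XPath 1.0 substring-before(): '' when sep does not occur in s."""
    i = s.find(sep)
    return s[:i] if i >= 0 else ""

def first_predicate(href):
    """contains(@href,'#') and not(contains(@href,'/'))"""
    return "#" in href and "/" not in href

def as_written(hrefs):
    """Model of the second predicate as XPath 1.0 evaluates it.

    preceding::a/@href becomes the string-value of its FIRST node in
    document order ('' if empty), so only one href is ever compared.
    Note that preceding::a is not restricted by the first predicate.
    """
    result = []
    for i, href in enumerate(hrefs):
        if not first_predicate(href):
            continue
        first_preceding = hrefs[0] if i > 0 else ""
        if substring_before(href, "#") != substring_before(first_preceding, "#"):
            result.append(href)
    return result

def intended(hrefs):
    """Keep an anchor only if NO preceding href has the same base."""
    result = []
    for i, href in enumerate(hrefs):
        if not first_predicate(href):
            continue
        earlier = {substring_before(h, "#") for h in hrefs[:i]}
        if substring_before(href, "#") not in earlier:
            result.append(href)
    return result

print(as_written(HREFS))
print(intended(HREFS))
```

Here as_written keeps "page2.html#more" even though "page2.html#intro" came earlier, because it is only compared with "page1.html#top". In XSLT 1.0, the intended "first occurrence of each base" result is usually obtained by grouping with xsl:key (the Muenchian method) rather than with a single path expression; the Python above only models the logic.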

Thanks in advance,
// tt

P.S. In the text above, "link", "anchor" and "<A>" are interchangeable.

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Follow-Ups:
- Re: Extracting a list of unique base urls from anchors in a html document.
  - From: David Carlisle
