This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Extracting a list of unique base urls from anchors in a html document.
- To: "'XSL-List at lists dot mulberrytech dot com'" <XSL-List at lists dot mulberrytech dot com>
- Subject: [xsl] Extracting a list of unique base urls from anchors in a html document.
- From: Taras Tielkes <taras at info dot nl>
- Date: Thu, 13 Sep 2001 18:23:01 +0200
- Reply-To: xsl-list at lists dot mulberrytech dot com
Hello,
I have a source HTML document that has been converted to XHTML.
This document contains a number of anchor elements (<a>). I want to use XSLT
to extract information about the links contained in the document.
First, I restrict the returned links to links that point to files in the
same folder, like this:
//a[contains(@href,'#') and not(contains(@href, '/'))]
As you see, I'm also restricting the returned links to the ones that have a
hash (#) character in their href attribute.
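To make the intent of that first expression concrete, here is a small sample input, annotated with which anchors it should select. This is only an illustrative sketch: the file names are made up, and it assumes the XHTML elements are in no namespace (otherwise an unprefixed //a would not match anything in XPath 1.0).

```xml
<!-- Hypothetical sample input; the file names are invented for illustration. -->
<html>
  <body>
    <a href="intro.html#top">selected: has '#' and no '/'</a>
    <a href="intro.html">not selected: no '#'</a>
    <a href="../docs/api.html#index">not selected: contains '/'</a>
    <a href="http://example.com/page.html#x">not selected: contains '/'</a>
    <a name="top">not selected: no href, so contains(@href,'#') is false</a>
  </body>
</html>
```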
So far so good. Now I add a second predicate (formatted for readability):
//a
[contains(@href,'#') and not(contains(@href, '/'))]
[not(substring-before(@href,'#')=substring-before(preceding::a/@href,'#'))]
The second predicate should (I think) limit the returned node-set to
only those anchors whose href attribute has a unique base URL (the part
before the #). However, the expression with the second predicate appended
still returns multiple links that share the same base URL. Why?
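For anyone who wants to try this, a minimal stylesheet along the following lines should be enough to reproduce it. It is a sketch under assumptions, not my actual stylesheet: it assumes an XSLT 1.0 processor and XHTML elements in no namespace, and the sample hrefs in the comment are hypothetical.

```xml
<?xml version="1.0"?>
<!-- Minimal reproduction sketch. Assumes XSLT 1.0 and un-namespaced input.
     Hypothetical test input:
       <html><body>
         <a href="a.html#one"/>
         <a href="b.html#one"/>
         <a href="b.html#two"/>
       </body></html>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The expression from the message, unchanged apart from whitespace. -->
    <xsl:for-each select="//a
        [contains(@href,'#') and not(contains(@href,'/'))]
        [not(substring-before(@href,'#')=substring-before(preceding::a/@href,'#'))]">
      <!-- Print the base URL (the part before '#'), one per line. -->
      <xsl:value-of select="substring-before(@href,'#')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Printing only the part before the # makes any repeated base URL easy to spot in the output.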
Thanks in advance,
// tt
P.S. In the text above, "link", "anchor" and "<a>" are interchangeable.
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list