This is the mail archive of the XSL-List mailing list.
Re: How can I filter stoppwords
- To: "Sellmer-Brüls, Barbara" <B dot Sellmer-Bruels at klopotek dot de>
- Subject: Re: How can I filter stoppwords
- From: Jeni Tennison <mail at jenitennison dot com>
- Date: Sat, 02 Sep 2000 09:24:29 +0100
- Cc: "'XSL-List at mulberrytech dot com'" <XSL-List at mulberrytech dot com>
- Reply-To: xsl-list at mulberrytech dot com
>Does anybody know another way to filter stopp words?
I'm not sure, but I think you were only after filtering stop words that
start the name of the book? Adapting Eric's solution:
The xsl:stylesheet element declares the necessaries, and the additional
namespace 'sw' that is used for the internal data (the list of stop words).
To prevent this namespace being declared on your output, use the
exclude-result-prefixes attribute on xsl:stylesheet.
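As a rough sketch of such a stylesheet element (the namespace URI bound to 'sw' here is a made-up placeholder; any URI that is unique to your stylesheet would do):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sw="urn:example:stop-words"
                exclude-result-prefixes="sw">

  <!-- urn:example:stop-words is a hypothetical URI, not the one from the
       original post. exclude-result-prefixes="sw" keeps the sw namespace
       declaration off literal result elements in the output. -->

  <!-- top-level declarations and templates go here -->

</xsl:stylesheet>
```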
Then the declaration of the stop words that you want to filter out. I've
put these in a variable so that they can be accessed easily:
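A minimal sketch of what that could look like, assuming the stop words live in the stylesheet itself under hypothetical sw:stop and sw:word elements (the element names, the namespace URI and the English word list are all placeholders):

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sw="urn:example:stop-words"
                exclude-result-prefixes="sw">

  <!-- Internal data: top-level elements in a non-XSLT namespace are
       ignored by the processor but can be read back as data. -->
  <sw:stop>
    <sw:word>the</sw:word>
    <sw:word>a</sw:word>
    <sw:word>an</sw:word>
  </sw:stop>

  <!-- document('') is the stylesheet document itself, so this selects
       every sw:word element declared above. -->
  <xsl:variable name="stop-words"
                select="document('')/*/sw:stop/sw:word" />

</xsl:stylesheet>
```

Keeping the list in the stylesheet means there is no second file to ship; pointing document() at an external file instead would work the same way if the list needs to be shared.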
Declaration of two variables so that we can translate between upper and
lower case fairly easily:
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
Now the template. I've only used one for brevity, but of course you can
split it down into several through calling and applying templates. Within
this template, I iterate through each of the titles. For each title, I
find all the stop words such that the current title starts with that stop
word (plus a space, and all ignoring case). If there is such a match, then
the title is substring()ed to give the resulting title by taking off the
characters that make up the word it begins with.
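<before><xsl:value-of select="." /></before>
<xsl:variable name="begins-with" select="$stop-words[starts-with(translate(current(),
  $uppercase, $lowercase), concat(translate(., $uppercase, $lowercase), ' '))]" />
<xsl:choose>
  <xsl:when test="$begins-with"><xsl:value-of
    select="substring(., string-length($begins-with) + 2)" /></xsl:when>
  <xsl:otherwise><xsl:value-of select="." /></xsl:otherwise>
</xsl:choose>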
This strips leading stop words in SAXON and MSXML (July). It works in
Xalan-C++ v.0.40.0 except for the exclude-result-prefixes thing, which is
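For reference, the pieces described above (the sw namespace, the stop word list, the two case variables and the single template) can be assembled into one stylesheet roughly as follows. This is a sketch under assumptions, not the original code: the namespace URI, the sw:stop/sw:word element names, the word list, the `//title` path and the titles/after output elements are all invented for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sw="urn:example:stop-words"
                exclude-result-prefixes="sw">

  <!-- Hypothetical stop word list held in the stylesheet itself -->
  <sw:stop>
    <sw:word>the</sw:word>
    <sw:word>a</sw:word>
    <sw:word>an</sw:word>
  </sw:stop>

  <xsl:variable name="stop-words"
                select="document('')/*/sw:stop/sw:word" />
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />

  <xsl:template match="/">
    <titles>
      <!-- Assumes the titles are title elements somewhere in the input -->
      <xsl:for-each select="//title">
        <!-- Stop words that the current title starts with, followed by
             a space, comparing in lower case. Inside the predicate, '.'
             is the stop word and current() is the title. -->
        <xsl:variable name="begins-with"
          select="$stop-words[starts-with(
                    translate(current(), $uppercase, $lowercase),
                    concat(translate(., $uppercase, $lowercase), ' '))]" />
        <before><xsl:value-of select="." /></before>
        <after>
          <xsl:choose>
            <xsl:when test="$begins-with">
              <!-- Skip the stop word and the space after it -->
              <xsl:value-of
                select="substring(., string-length($begins-with) + 2)" />
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="." />
            </xsl:otherwise>
          </xsl:choose>
        </after>
      </xsl:for-each>
    </titles>
  </xsl:template>

</xsl:stylesheet>
```

With an input title such as "The Hobbit", $begins-with holds the stop word "the", so the substring starts at character 5 and the after element contains "Hobbit". A title that starts with no stop word is copied unchanged.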
>How do you XSL-create a sort criterion?
...you can't (at the moment) use a template to create a string to use as a
sort criterion. Sort criteria have to be XPath select expressions. This
problem will go away when (a) you can convert RTFs (result tree fragments)
to node sets and/or (b) you can use something like saxon:function to
declare extension functions within XSLT.
In the meantime, you have to use something really horrible like:
<xsl:sort select="concat(substring(substring-after(., ' '), 0 div
  boolean($stop-words[starts-with(translate(current(), $uppercase,
  $lowercase), concat(translate(., $uppercase, $lowercase), ' '))])),
  substring(., 0 div not($stop-words[starts-with(translate(current(),
  $uppercase, $lowercase), concat(translate(., $uppercase, $lowercase),
  ' '))])))" />
<title><xsl:value-of select="." /></title>
(Honestly, it doesn't look that much clearer even when it *is* indented ;)
This works in SAXON, MSXML (July) and Xalan (with the exception of the
exclude-result-prefixes thing).
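The sort key relies on a conditional trick in XPath 1.0: substring(s, 0 div boolean(c)) gives back all of s when c is true (0 div 1 is 0) and the empty string when c is false (0 div 0 is NaN). Concatenating one substring guarded by boolean(...) with another guarded by not(...) therefore picks exactly one of two strings. The sketch below puts that into a complete stylesheet; as before, the namespace URI, the sw:stop/sw:word names, the word list, the `//title` path and the titles wrapper element are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sw="urn:example:stop-words"
                exclude-result-prefixes="sw">

  <!-- Hypothetical stop word list held in the stylesheet itself -->
  <sw:stop>
    <sw:word>the</sw:word>
    <sw:word>a</sw:word>
    <sw:word>an</sw:word>
  </sw:stop>

  <xsl:variable name="stop-words"
                select="document('')/*/sw:stop/sw:word" />
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />

  <xsl:template match="/">
    <titles>
      <xsl:for-each select="//title">
        <!-- Sort key: the text after the first space if the title starts
             with a stop word, otherwise the whole title. Exactly one of
             the two substring() calls returns a non-empty string. -->
        <xsl:sort select="concat(
            substring(substring-after(., ' '),
                      0 div boolean($stop-words[starts-with(
                        translate(current(), $uppercase, $lowercase),
                        concat(translate(., $uppercase, $lowercase), ' '))])),
            substring(.,
                      0 div not($stop-words[starts-with(
                        translate(current(), $uppercase, $lowercase),
                        concat(translate(., $uppercase, $lowercase), ' '))])))" />
        <title><xsl:value-of select="." /></title>
      </xsl:for-each>
    </titles>
  </xsl:template>

</xsl:stylesheet>
```

Here "The Hobbit" is sorted under the key "Hobbit", while the title element still shows the full original text. The key keeps the title's original case, so how upper and lower case letters interleave is left to the processor's default sort behaviour.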
I hope that helps,
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list