This is the mail archive of the xsl-list@mulberrytech.com mailing list.


sorting a list of titles after removal of stopwords and special characters


Dear Colleagues,

I am trying to sort a list of titles that have been processed using XSLT to
remove all leading articles (stopwords "A", "An", and "The"), and to remove
special characters such as [, ], ^, and so on.

So far, the titles come out in their original document order rather than
sorted, and the titles whose article was dropped keep a leading space. I
would appreciate whatever help you may provide. The relevant files follow.

Regards,

Brian L. Pytlik Zillig
Digital Initiatives Librarian
University of Nebraska-Lincoln Libraries
mailto:bpytlikz@unlnotes.unl.edu
(402)472-4547


-----------
Here is my XML file, "lead.xml":

<?xml version="1.0" encoding="utf-8"?>

<?xml-stylesheet href="lead.xsl" type="text/xsl"?>

<!-- Correct sorted order should be:
The American Way
A Better Way
An Evil Day
Xerxes Unchained: A Memoir
The Yanks Are Coming!
Zeitgeist as Poltergeist
A Zoo Story
-->

<ead>
<book>
   <title>^A Zoo Story^</title>
</book>

<book>
   <title>[The Yanks Are Coming!]</title>
</book>

<book>
   <title>Zeitgeist as Poltergeist</title>
</book>

<book>
   <title>The American Way</title>
</book>

<book>
   <title>A Better Way</title>
</book>

<book>
   <title>An Evil Day</title>
</book>

<book>
   <title>Xerxes Unchained: A Memoir</title>
</book>
</ead>

-----------
And here is my XSL file, "lead.xsl":

<?xml version='1.0'?>

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:stop="test.unl.edu"
exclude-result-prefixes="stop">

<stop:stoplist>

<ignore>the</ignore>
<ignore>a</ignore>
<ignore>an</ignore>

</stop:stoplist>

<xsl:template match="/">

<html>
<body>

<B>Sorted: </B><P />
<xsl:apply-templates select="//ead/book/title" mode="with-stoplist">
<xsl:sort select="$stoplist" order="descending"/>
</xsl:apply-templates>

</body>
</html>

</xsl:template>

<!-- Stoplist template to DROP initial articles "A", "An" and "The" in
title, and to remove special characters, including square brackets "[" and
so on -->
<xsl:template match="//ead/book/title" mode="with-stoplist">
<xsl:variable name="begins-with"
  select="$stoplist[starts-with(translate(current(), $uppercase, $lowercase),
                                concat(translate(., $uppercase, $lowercase), ' '))]" />
<xsl:value-of
  select="translate(substring(., string-length($begins-with) + 1), '[/]-=@#$%^()', '')" />
<P />
</xsl:template>

<!-- Declares variables for sorting -->
<xsl:variable name="stoplist"
  select="document('')/xsl:stylesheet/stop:stoplist/ignore" />
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />

</xsl:stylesheet>

-----------
And here is my output file, created using Instant Saxon, "lead.html":

<html>
   <body><B>Sorted: </B><P></P>A Zoo Story
      <P></P>The Yanks Are Coming!
      <P></P>Zeitgeist as Poltergeist
      <P></P> American Way
      <P></P> Better Way
      <P></P> Evil Day
      <P></P>Xerxes Unchained: A Memoir
      <P></P>
   </body>
</html>
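
-----------
For comparison, here is a minimal sketch of one possible XSLT 1.0 approach. It
has not been tested against Instant Saxon. It rests on two observations about
the stylesheet above: the xsl:sort key select="$stoplist" evaluates to the same
string for every title, so a stable sort leaves the input order unchanged; and
the article test runs before the special characters are stripped, so
"^A Zoo Story^" never matches "a ". The sketch instead computes the cleaned,
lowercased, article-free title inside the xsl:sort select expression. The
variable name "strip" and the count() trick for skipping the space after a
matched article are my own assumptions, not part of the original files. It
also assumes that at most one stoplist entry can match a given title, which
holds for "the", "a" and "an" once the trailing space is included in the test.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:stop="test.unl.edu"
    exclude-result-prefixes="stop">

  <stop:stoplist>
    <ignore>the</ignore>
    <ignore>a</ignore>
    <ignore>an</ignore>
  </stop:stoplist>

  <xsl:variable name="stoplist"
      select="document('')/xsl:stylesheet/stop:stoplist/ignore"/>
  <xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
  <!-- Hypothetical helper: the characters to delete before comparing. -->
  <xsl:variable name="strip" select="'[/]-=@#$%^()'"/>

  <xsl:template match="/">
    <html>
      <body>
        <B>Sorted: </B><P/>
        <xsl:apply-templates select="ead/book/title" mode="with-stoplist">
          <!-- Sort key: strip the special characters, lowercase the result,
               then skip a leading stopword plus the space that follows it.
               Inside the predicate, "." is an <ignore> element and
               current() is the title being sorted. The start position is
               1 when nothing matches, or length(article) + 2 when one
               article matches, because count() then contributes 1. -->
          <xsl:sort select="substring(
              translate(translate(., $strip, ''), $uppercase, $lowercase),
              1
              + string-length($stoplist[starts-with(
                  translate(translate(current(), $strip, ''),
                            $uppercase, $lowercase),
                  concat(., ' '))])
              + count($stoplist[starts-with(
                  translate(translate(current(), $strip, ''),
                            $uppercase, $lowercase),
                  concat(., ' '))]))"/>
        </xsl:apply-templates>
      </body>
    </html>
  </xsl:template>

  <!-- Display the title with special characters removed; the leading
       article is kept here and only ignored for sorting. -->
  <xsl:template match="title" mode="with-stoplist">
    <xsl:value-of select="translate(., $strip, '')"/>
    <P/>
  </xsl:template>

</xsl:stylesheet>
```

By my reading, the sort keys for the sample titles would be "american way",
"better way", "evil day", "xerxes unchained: a memoir", "yanks are coming!",
"zeitgeist as poltergeist" and "zoo story", which matches the order given in
the comment in lead.xml. To drop the article from the displayed title as well,
the same substring() expression could be applied in the title template, at the
cost of also lowercasing the text unless the translate() to lowercase is left
out there.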






 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

