This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Generating indexes
- To: "XSL List" <xsl-list at lists dot mulberrytech dot com>
- Subject: [xsl] Generating indexes
- From: "Gustaf Liljegren" <gustaf dot liljegren at xml dot se>
- Date: Tue, 25 Sep 2001 12:08:44 +0200
- Reply-To: xsl-list at lists dot mulberrytech dot com
I have a document in XML with some words marked up with <index> tags. This
document is later going to be transformed into PDF and printed like a book,
with an index. I'm aiming to do this task automatically.
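The first half of the job, pulling out the marked-up terms themselves (still without page numbers), needs no rendering engine. Below is a minimal Python sketch using the standard `xml.etree.ElementTree` module. It assumes the tag is literally `<index>` with no namespace; the sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

def collect_index_terms(xml_text):
    """Return the text of every <index> element, in document order."""
    root = ET.fromstring(xml_text)
    # iter() walks the tree depth-first, which matches document order.
    # itertext() also picks up text inside nested child elements;
    # split()/join() collapses line breaks and runs of spaces.
    return [" ".join("".join(el.itertext()).split()) for el in root.iter("index")]

# Hypothetical sample document.
sample = """<book>
  <p>Serve the <index>yoghurt</index> cold.</p>
  <p>Add <index>olive oil</index> and an <index>egg</index>.</p>
</book>"""

print(collect_index_terms(sample))
```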
The general idea is to collect the words and phrases marked up with <index>,
together with the pages on which they appear, giving a list of all matches in
no particular order, or possibly in document order. In a positional flat file,
it might look like this:
12 yoghurt
153 milk
122 yoghurt
132 egg
43 olive oil
32 egg
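Once such a file exists, grouping it by term is straightforward. A minimal sketch, assuming the page number is the first whitespace-separated field and everything after it is the term (a strictly fixed-width "positional" layout would need column slicing instead):

```python
from collections import defaultdict

def parse_matches(lines):
    """Group 'PAGE TERM' lines into {term: sorted list of unique pages}."""
    pages_by_term = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split only on the first whitespace so multi-word terms stay intact.
        page, term = line.split(None, 1)
        pages_by_term[term].add(int(page))
    return {term: sorted(pages) for term, pages in pages_by_term.items()}

matches = """12 yoghurt
153 milk
122 yoghurt
132 egg
43 olive oil
32 egg"""

print(parse_matches(matches.splitlines()))
```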
As soon as I have the page numbers, I have total control when producing the
index. I can write scripts that handle cases like 121, 123, 124, 125 (which
should become "121, 123-125"). I can also handle special characters like á, é,
å, ä and ö so that they sort in the correct order, and so on.
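Both post-processing steps can be sketched in a few lines of Python. The collation table here is a simplifying assumption, not a full implementation of Swedish sorting rules: it uses the Swedish alphabet order (a-z, then å, ä, ö) and files á and é under plain a and e. The terms in the usage example are made up.

```python
# Assumed, simplified collation table (not complete Swedish rules):
# a-z, then å, ä, ö; á and é are folded into a and e.
_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_FOLD = {"á": "a", "é": "e"}
_RANK = {ch: i for i, ch in enumerate(_ALPHABET)}

def collapse_pages(pages):
    """Turn page numbers into a string like '121, 123-125'."""
    pages = sorted(set(pages))
    parts = []
    i = 0
    while i < len(pages):
        start = end = pages[i]
        # Extend the run while the next page is consecutive.
        while i + 1 < len(pages) and pages[i + 1] == end + 1:
            i += 1
            end = pages[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(parts)

def swedish_key(term):
    """Sort key placing å, ä, ö after z and folding á/é into a/e."""
    key = []
    for ch in term.lower():
        ch = _FOLD.get(ch, ch)
        if ch in _RANK:
            key.append((1, _RANK[ch]))
        else:
            # Spaces, digits, etc. sort before letters, by code point.
            key.append((0, ord(ch)))
    return key

# Hypothetical index data: term -> pages.
index = {"ägg": [32, 132], "apelsin": [5], "yoghurt": [12, 122, 123, 124],
         "éclair": [7], "ost": [40], "åkerbär": [9]}
for term in sorted(index, key=swedish_key):
    print(f"{term} {collapse_pages(index[term])}")
```

Putting "space before letters" into the key gives word-by-word order, so "olive oil" files before "olives".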
The hard thing is to generate this file of matches.
Of course, XSLT can't know anything about page numbers, so I guess this is
something that has to be obtained from the rendering engine. Before digging
deeper into this, I wonder whether anyone has achieved it, or has been
successful with alternative approaches.
Just to clarify: I'm not aiming at a full-blown index. This should be a
one-level index, and the indexing work (placing <index> tags around certain
words in certain elements) is still a job for a human indexer, or for
intelligent scripts. In fact, I made an indexing script, but it's not
intelligent enough to know about mouse and mice... :-)
Gustaf
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list