This is the mail archive of the xsl-list@mulberrytech.com mailing list .


Generating indexes

To: "XSL List" <xsl-list at lists dot mulberrytech dot com>
Subject: [xsl] Generating indexes
From: "Gustaf Liljegren" <gustaf dot liljegren at xml dot se>
Date: Tue, 25 Sep 2001 12:08:44 +0200
Reply-To: xsl-list at lists dot mulberrytech dot com

I have a document in XML with some words marked up with <index> tags. This
document is later going to be transformed into PDF and printed like a book,
with an index. I'm aiming to do this task automatically.

The general idea is to collect the words and phrases marked up with <index>,
plus the pages on which they appear, to get a list of all matches in no
particular order (or possibly in document order). In a positional flat file,
it might look like this:

12    yoghurt
153   milk
122   yoghurt
132   egg
43    olive oil
32    egg
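Once a file like this exists, the post-processing step is simple. Here is a minimal Python sketch that groups it into one entry per term. It assumes that each line is a page number, then whitespace, then the term, and that a term may contain spaces (such as "olive oil"). The function name `parse_matches` is hypothetical.

```python
from collections import defaultdict

def parse_matches(lines):
    """Group 'page  term' lines into {term: sorted list of unique pages}.

    Assumed format: page number, whitespace, then the term (which may
    itself contain spaces). Blank lines are skipped.
    """
    entries = defaultdict(set)  # a set drops duplicate pages per term
    for line in lines:
        line = line.strip()
        if not line:
            continue
        page, term = line.split(None, 1)  # split only at the first whitespace run
        entries[term.strip()].add(int(page))
    return {term: sorted(pages) for term, pages in entries.items()}

sample = """12    yoghurt
153   milk
122   yoghurt
132   egg
43    olive oil
32    egg"""

print(parse_matches(sample.splitlines()))
```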

As soon as I have the page numbers, I have full control over producing the
index. I can write scripts that handle cases like 121, 123, 124, 125 (which
should become "121, 123-125"), and I can handle special characters like á, é,
å, ä and ö so that they sort in the correct order, and so on.
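Both of these post-processing steps fit in a few lines. The following is a sketch, not a definitive implementation. It collapses runs of consecutive pages into ranges, and it sorts terms with a hand-rolled key that assumes Swedish alphabetical order: å, ä and ö come after z, and other accented letters such as á and é sort like their base letters. A locale-aware collator would be the more general choice. The names `compress_pages` and `swedish_key` and the sample entries are hypothetical.

```python
import unicodedata

SWEDISH_TAIL = "åäö"  # assumption: these sort after z, in this order

def compress_pages(pages):
    """Collapse page numbers into ranges: [121, 123, 124, 125] -> '121, 123-125'."""
    pages = sorted(set(pages))
    parts = []
    i = 0
    while i < len(pages):
        start = end = pages[i]
        # extend the run while the next page is exactly one higher
        while i + 1 < len(pages) and pages[i + 1] == end + 1:
            i += 1
            end = pages[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(parts)

def swedish_key(term):
    """Sort key: a-z first (accents stripped), then å, ä, ö."""
    key = []
    for ch in unicodedata.normalize("NFC", term.lower()):
        if ch in SWEDISH_TAIL:
            key.append((1, SWEDISH_TAIL.index(ch)))
        else:
            # decompose and keep the base letter: 'é' -> 'e', 'á' -> 'a'
            key.append((0, unicodedata.normalize("NFD", ch)[0]))
    return key

entries = {"yoghurt": [12, 122], "egg": [32, 132], "äpple": [125, 121, 124, 123]}
for term in sorted(entries, key=swedish_key):
    print(f"{term}  {compress_pages(entries[term])}")
```

With the hypothetical entries above, "äpple" sorts after "yoghurt" and its pages print as "121, 123-125".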

The hard thing is to generate this file of matches.

Of course, XSLT can't know anything about page numbers, so I guess this is
something that has to be drawn from a rendering engine. Before digging
deeper into this, I wonder if anyone has achieved it, or has succeeded with
alternative approaches.
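The collection half of the job does not depend on pagination, so it can be done outside the renderer. This minimal Python sketch lists every <index> term in document order. The sequence number is only a hypothetical placeholder key. The real page number would still have to come from the formatter, for example by giving each indexed inline an id that the renderer can resolve. The function name `collect_index_terms` and the sample markup are assumptions.

```python
import xml.etree.ElementTree as ET

def collect_index_terms(xml_text):
    """Return (sequence number, term) for each <index> element, in document order."""
    root = ET.fromstring(xml_text)
    terms = []
    # Element.iter() walks the tree depth-first, i.e. in document order
    for n, elem in enumerate(root.iter("index"), start=1):
        # itertext() includes nested markup; split/join normalizes whitespace
        term = " ".join("".join(elem.itertext()).split())
        terms.append((n, term))
    return terms

doc = """<book><p>Mix the <index>yoghurt</index> with an <index>egg</index>.</p>
<p>Add <index>olive
oil</index>.</p></book>"""

print(collect_index_terms(doc))
```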

Just to clarify: I'm not aiming at a full-blown index. This should be a
one-level index, and the indexing work (placing <index> tags around certain
words in certain elements) is still a job for a human indexer or for
intelligent scripts. In fact, I wrote an indexing script, but it's not
intelligent enough to know about mouse and mice... :-)

Gustaf



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
