This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Post-Processing PDF For Back-Of-The-Book Indexes
- From: "W. Eliot Kimber" <eliot at isogen dot com>
- To: XSL List <xsl-list at lists dot mulberrytech dot com>
- Date: Sun, 10 Feb 2002 09:20:32 -0600
- Subject: [xsl] Post-Processing PDF For Back-Of-The-Book Indexes
- Organization: DataChannel, Inc
- Reply-to: xsl-list at lists dot mulberrytech dot com
In reference to an earlier thread about eliminating duplicate page
numbers in back-of-the-book indexes generated by XSL-FO stylesheets, I have
successfully done this by post-processing the PDF with the free PJ library
from www.Etymon.com.
With this library you can interact with PDF at the lowest level of
granularity (individual PDF operators within a page). In my case, I was
able to get to the individual lines of the index pages, find sequences
of repeated numbers, remove them from the document, and write a new PDF
document. It required about 150 lines of Python (using the Jython
interpreter to provide access to the PJ Java library) to implement the
initial functionality I needed.
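The duplicate-removal step can be sketched in Python. This is not the
PJ-based code described above, and it does not touch the PJ API at all.
It assumes, hypothetically, that the text of one index line has already
been extracted from the PDF as a plain string such as "widgets, 3, 3, 5",
and it shows only the logic of collapsing runs of repeated page numbers.
The names collapse_repeats and clean_index_line are illustrative.

```python
import re


def collapse_repeats(numbers):
    """Drop a page number when it repeats the one immediately before it."""
    result = []
    for n in numbers:
        if not result or result[-1] != n:
            result.append(n)
    return result


def clean_index_line(line):
    """Rewrite an index line such as 'widgets, 3, 3, 5' so that each run of
    repeated page numbers appears only once.

    Hypothetical input: the line's text is assumed to have already been
    pulled out of the PDF; no PDF handling happens here.
    """
    # Split the entry term from its trailing comma-separated page numbers.
    match = re.match(r"^(.*?)((?:,\s*\d+)+)\s*$", line)
    if match is None:
        return line  # no trailing page numbers: leave the line alone
    term, tail = match.groups()
    numbers = re.findall(r"\d+", tail)
    return term + ", " + ", ".join(collapse_repeats(numbers))


print(clean_index_line("widgets, 3, 3, 5, 5, 5, 12"))  # widgets, 3, 5, 12
```

Only adjacent repeats are dropped, matching "sequences of repeated
numbers"; a number that reappears later in the line is kept, and a line
containing a page range such as 3-5 is returned unchanged. The code is
plain Python 2/3, so it should also run under Jython.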
I'm not quite ready to post code; I need to refine what I've written and
do more testing. But I wanted to report this initial success, as I know
others are struggling with the same problem.
Cheers,
Eliot
--
W. Eliot Kimber, eliot@isogen.com
Consultant, ISOGEN International
1016 La Posada Dr., Suite 240
Austin, TX 78752 Phone: 512.656.4139
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list