This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
Re: Word to Docbook using Perl?
- From: Andy Jewell <andy_jewell at fanniemae dot com>
- To: Corey Wells Arnold <corey at wubios dot wustl dot edu>
- Cc: 'docbook apps list' <docbook-apps at lists dot oasis-open dot org>
- Date: Mon, 10 Mar 2003 15:49:08 -0500
- Subject: Re: DOCBOOK-APPS: Word to Docbook using Perl?
- Organization: Fannie Mae
- References: <002001c2e744$314c1fa0$6575fc80@wxpcorey>
I use a hodgepodge document system built on the tool chain below. First, the
document is written in Word 2000 using a (homegrown) docbook stylesheet.
Then a small army of scripts runs to end up with docbook XML and, finally, PDF:
* doc2txt converts the DOC to HTML
* wh2fo converts the HTML to XML
* Saxon processes homemade XSL to generate mostly valid docbook XML
* Saxon processes docbook XML into FO using docbook xsl stylesheets
* FOP processes docbook FO into PDF
* All glued somewhat precariously together with make
It's certainly not perfect but supports enough elements to fit my needs.
Below is what I use to process a Word table.
HTH -
-- Andy
<?xml version="1.0" ?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="xml"/>
<xsl:template match="paragraph[@class='tablecaption']">
<xsl:variable name="passid">
<xsl:if test="reference">
<xsl:value-of select="reference[substring(@id,1,5)='refid']/@id"/>
</xsl:if>
</xsl:variable>
<xsl:if test="name(preceding-sibling::*[position()=1]) = 'table'">
<xsl:apply-templates select="preceding-sibling::table[position()=1]">
<xsl:with-param name="refid"><xsl:value-of select="$passid"/></xsl:with-param>
</xsl:apply-templates>
</xsl:if>
</xsl:template>
<xsl:template match="table">
<xsl:param name="refid"/>
<xsl:variable name="numcols" select="count(child::row[1]/cell)"/>
<xsl:variable name="tabcap"
select="following-sibling::paragraph[@class='tablecaption'][1]"/>
<table>
<xsl:choose>
<xsl:when test="not($refid='')">
<xsl:attribute name="id"><xsl:value-of select="$refid"/>
</xsl:attribute>
</xsl:when>
<xsl:otherwise>
<xsl:attribute name="id"><xsl:value-of
select="generate-id()"/></xsl:attribute>
</xsl:otherwise>
</xsl:choose>
<xsl:attribute name="frame">all</xsl:attribute>
<title>
<xsl:value-of select="$tabcap"/>
</title>
<tgroup>
<xsl:attribute name="cols"><xsl:value-of
select="$numcols"/></xsl:attribute>
<xsl:attribute name="align">left</xsl:attribute>
<xsl:attribute name="colsep">1</xsl:attribute>
<xsl:attribute name="rowsep">1</xsl:attribute>
<thead>
<row>
<xsl:for-each select="child::row[1]/cell">
<entry><xsl:apply-templates/></entry>
</xsl:for-each>
</row>
</thead>
<tbody>
<xsl:apply-templates select="row[position() > 1]"/>
</tbody>
</tgroup>
</table>
</xsl:template>
<xsl:template match="row">
<row>
<xsl:apply-templates select="cell"/>
</row>
</xsl:template>
<xsl:template match="cell">
<entry>
<xsl:apply-templates select="paragraph[contains(@class,'item')][1]"
mode="table" />
<xsl:apply-templates
select="paragraph[not(contains(@class,'item'))]"/>
</entry>
</xsl:template>
</xsl:stylesheet>
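For anyone trying the stylesheet, here is a rough sketch of the intermediate XML it appears to expect, inferred only from its match patterns. The element names paragraph, table, row, cell and reference, and the class value tablecaption, come from the stylesheet itself. The wrapper element body, the class value tabletext and the id refid_sizes are made up for illustration and may not match what wh2fo really produces.

```xml
<!-- Hypothetical input: body, class="tabletext" and the id value are assumptions. -->
<body>
  <table>
    <!-- The first row becomes thead; the number of its cells sets the cols attribute of tgroup. -->
    <row>
      <cell><paragraph class="tabletext">Size</paragraph></cell>
      <cell><paragraph class="tabletext">Width</paragraph></cell>
    </row>
    <!-- The remaining rows go into tbody. -->
    <row>
      <cell><paragraph class="tabletext">A4</paragraph></cell>
      <cell><paragraph class="tabletext">210 mm</paragraph></cell>
    </row>
  </table>
  <!-- The caption has to be the element immediately after the table;
       a reference child whose id starts with "refid" supplies the table id. -->
  <paragraph class="tablecaption"><reference id="refid_sizes"/>Paper sizes</paragraph>
</body>
```

With input shaped like this, the caption template should pass refid_sizes to the table template, giving a docbook table with id="refid_sizes", the title "Paper sizes" and cols="2" on the tgroup. Without a reference child, the id falls back to generate-id(). What ends up inside each entry depends on the paragraph templates, which are not part of the excerpt above.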
Corey Wells Arnold wrote:
> Has anyone ever run across a Perl program that can process MS Word
> tables into Docbook tables? I am putting together a document that is
> being written by many people who will be using Word. Any ideas on what
> is the best way to convert it to Docbook XML? I have tried using the
> Open Office filter, but it lacks support for some tags and doesn't seem
> to process tables correctly.
>
> Thanks,
>
> Corey