This is the mail archive of the mailing list .

Re: Word to Docbook using Perl?

I use a hodgepodge document system built on the tool chain below.  First,
the document is written in Word 2000 using a (homegrown) DocBook stylesheet.
Then a small army of scripts runs to end up with DocBook XML and, from
there, PDF:

   * doc2txt converts the DOC to HTML
   * wh2fo converts the HTML to XML
   * Saxon applies homemade XSL to generate mostly valid DocBook XML
   * Saxon processes the DocBook XML into FO using the DocBook XSL stylesheets
   * FOP renders the FO into PDF
   * All glued together, somewhat precariously, with make

It's certainly not perfect but supports enough elements to fit my needs.
Below is what I use to process a Word table.


-- Andy

<?xml version="1.0" ?>
  <xsl:output method="xml"/>

  <xsl:template match="paragraph[@class='tablecaption']">
  <xsl:variable name="passid">
  <xsl:if test="reference">
     <xsl:value-of select="reference[substring(@id,1,5)='refid']/@id"/>

   <xsl:if test="name(preceding-sibling::*[position()=1]) = 'table'">
      <xsl:apply-templates select="preceding-sibling::table[position()=1]">

 <xsl:with-param name="refid"><xsl:value-of

  <xsl:template match="table">
    <xsl:param name="refid"/>
    <xsl:variable name="numcols" select="count(child::row[1]/cell)"/>
    <xsl:variable name="tabcap"
select="following-sibling::paragraph[@class='tablecaption'][1]"/>
 <xsl:when test="not($refid='')">
   <xsl:attribute name="id"><xsl:value-of select="$refid"/>
   <xsl:attribute name="id"><xsl:value-of
      <xsl:attribute name="frame">all</xsl:attribute>
 <xsl:value-of select="$tabcap"/>
 <xsl:attribute name="cols"><xsl:value-of
 <xsl:attribute name="align">left</xsl:attribute>
 <xsl:attribute name="colsep">1</xsl:attribute>
 <xsl:attribute name="rowsep">1</xsl:attribute>
     <xsl:for-each select="child::row[1]/cell">
   <xsl:apply-templates select="row[position() > 1]"/>

  <xsl:template match="row">
      <xsl:apply-templates select="cell"/>

  <xsl:template match="cell">
      <xsl:apply-templates select="paragraph[contains(@class,'item')][1]"
mode="table" />
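
The listing above is truncated in places (closing tags and some attribute
values are missing).  What follows is only a minimal sketch of how a
complete version might look, not the original stylesheet.  It assumes the
intermediate XML uses paragraph, table, row, cell and reference elements
as in the fragments above.  The driver template on the root element, the
use of $passid in the with-param, the generate-id() fallback for the table
id, and the use of the first row as the thead are all guesses made for
illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical driver: reach each table only through its caption,
       so that no table is emitted twice by the built-in rules. -->
  <xsl:template match="/*">
    <xsl:apply-templates select="paragraph[@class='tablecaption']"/>
  </xsl:template>

  <!-- A caption hands its reference id (if any) to the table that
       immediately precedes it. -->
  <xsl:template match="paragraph[@class='tablecaption']">
    <xsl:variable name="passid">
      <xsl:if test="reference">
        <xsl:value-of select="reference[substring(@id,1,5)='refid']/@id"/>
      </xsl:if>
    </xsl:variable>
    <xsl:if test="name(preceding-sibling::*[position()=1]) = 'table'">
      <xsl:apply-templates select="preceding-sibling::table[position()=1]">
        <!-- Assumption: the truncated with-param passed $passid. -->
        <xsl:with-param name="refid"><xsl:value-of select="$passid"/></xsl:with-param>
      </xsl:apply-templates>
    </xsl:if>
  </xsl:template>

  <xsl:template match="table">
    <xsl:param name="refid"/>
    <xsl:variable name="numcols" select="count(child::row[1]/cell)"/>
    <xsl:variable name="tabcap"
        select="following-sibling::paragraph[@class='tablecaption'][1]"/>
    <table>
      <xsl:choose>
        <xsl:when test="not($refid='')">
          <xsl:attribute name="id"><xsl:value-of select="$refid"/></xsl:attribute>
        </xsl:when>
        <xsl:otherwise>
          <!-- Assumption: fall back to a generated id. -->
          <xsl:attribute name="id"><xsl:value-of select="generate-id()"/></xsl:attribute>
        </xsl:otherwise>
      </xsl:choose>
      <xsl:attribute name="frame">all</xsl:attribute>
      <title><xsl:value-of select="$tabcap"/></title>
      <tgroup>
        <xsl:attribute name="cols"><xsl:value-of select="$numcols"/></xsl:attribute>
        <xsl:attribute name="align">left</xsl:attribute>
        <xsl:attribute name="colsep">1</xsl:attribute>
        <xsl:attribute name="rowsep">1</xsl:attribute>
        <!-- Assumption: the first Word row becomes the header row. -->
        <thead>
          <row>
            <xsl:for-each select="child::row[1]/cell">
              <xsl:apply-templates select="."/>
            </xsl:for-each>
          </row>
        </thead>
        <tbody>
          <xsl:apply-templates select="row[position() > 1]"/>
        </tbody>
      </tgroup>
    </table>
  </xsl:template>

  <xsl:template match="row">
    <row>
      <xsl:apply-templates select="cell"/>
    </row>
  </xsl:template>

  <!-- With no explicit mode="table" template, the built-in rules copy
       the text of the selected paragraph into the entry. -->
  <xsl:template match="cell">
    <entry>
      <xsl:apply-templates select="paragraph[contains(@class,'item')][1]"
                           mode="table"/>
    </entry>
  </xsl:template>
</xsl:stylesheet>
```

Given a table element followed by a paragraph with class 'tablecaption',
this should produce one DocBook table with a title, a tgroup whose cols
attribute matches the cell count of the first row, a thead built from that
row, and a tbody holding the remaining rows.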


Corey Wells Arnold wrote:

> Has anyone ever run across a Perl program that can process MS Word
> tables into Docbook tables?  I am putting together a document that is
> being written by many people who will be using Word.  Any ideas on what
> is the best way to convert it to Docbook XML?  I have tried using the
> Open Office filter, but it lacks support for some tags and doesn't seem
> to process tables correctly.
> Thanks,
> Corey
