This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.



RE: DOCBOOK: MS files included with elements?


(First, sorry to Norman Walsh: this should have gone to the list,
not to you directly ;-)

/ Galen Boyer <galenboyer@yahoo.com> was heard to say:
| Oh God, I'll probably get killed for this question.
| 
| Is there some tag which can be used to include a word doc or
| excel file or other element?

I suppose that this would be extremely difficult.  You would
probably want to convert the document into XML instead.  The
following may help, but only if it is a one-time conversion of a
Word document.

I am very new to XML/SGML and DocBook, but I have converted a Word
document of about 150 pages into XML.  I exported the document to
HTML and then did a lot of Perl fiddling... Now I have well-formed
XML, but not yet the DocBook markup.

The process was rather painful, because I did not know about the
HTML Tidy program at the time!  (My thanks to Dave Raggett, who
wrote it, and to Jirka Kosek, who mentioned it in his book.)

So, if I were forced to do it again, I would do it this way:

  1. Export the Word document to HTML (manually).
  2. Use HTML Tidy (off-line) to convert the <font ...> and similar
     tags into markup that uses CSS (automatically) and to output
     the result as XML.
  3. Use ImageMagick to convert the images into the desired
     format (off-line).
  4. Use an XSLT processor and write an XSL stylesheet that
     prescribes the conversion of that XML to DocBook XML (off-line).
  5. Perl may still be needed.
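
As a rough illustration of steps 2 and 3 (not the commands used for
the conversion described here), the two tool calls could be driven
from a small Python script.  The function names and file names are
hypothetical.  The sketch assumes that "tidy" and ImageMagick's
"convert" are on the PATH; newer ImageMagick releases name the
program "magick" instead.

```python
import shutil
import subprocess
from pathlib import Path


def tidy_command(html_in, xml_out):
    """Step 2: command line asking HTML Tidy for cleaned-up XHTML."""
    return [
        "tidy",
        "-quiet",    # suppress non-essential output
        "-asxml",    # write well-formed XHTML
        "-clean",    # replace <font> and similar tags with CSS
        "-numeric",  # numeric entities, so XML parsers accept them
        "-o", str(xml_out),
        str(html_in),
    ]


def image_command(image, fmt="png"):
    """Step 3: ImageMagick command converting one image to `fmt`."""
    image = Path(image)
    return ["convert", str(image), str(image.with_suffix("." + fmt))]


def run_steps(html_in, xml_out, images):
    """Run both steps; returns False if a tool is missing or fails."""
    if not (shutil.which("tidy") and shutil.which("convert")):
        return False
    # Tidy exits with 1 for warnings only and 2 for errors.
    if subprocess.run(tidy_command(html_in, xml_out)).returncode > 1:
        return False
    return all(subprocess.run(image_command(i)).returncode == 0
               for i in images)
```

Calling run_steps("doc.html", "doc.xml", ["fig1.gif"]) would then
write doc.xml and fig1.png next to the inputs, provided that both
tools are installed.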

Well, I never did the fourth step (being very new to XSL), nor do I
know whether this is the best approach.  I guess that there could be
an easier way.  Anyway, I think that "Word to HTML" is the first step
to take, and I do not think that it can be done off-line.
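
To give an idea of the kind of mapping that step 4 would prescribe,
here is a minimal sketch.  It uses Python's standard ElementTree
module as a stand-in for an XSLT stylesheet, so it is not the XSL
approach itself.  The element mapping (headings to <section>, <p> to
<para>, lists to <itemizedlist>/<orderedlist>) is an assumption made
for illustration, every heading simply opens a new flat section, and
the result is not validated against the DocBook DTD.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Hypothetical inline mapping: XHTML tag -> (DocBook tag, attributes).
INLINE = {
    "b": ("emphasis", {"role": "bold"}),
    "strong": ("emphasis", {"role": "bold"}),
    "i": ("emphasis", {}),
    "em": ("emphasis", {}),
    "code": ("literal", {}),
}


def local(tag):
    """Strip the namespace part from an element tag."""
    return tag.rsplit("}", 1)[-1]


def convert_inline(src, dst):
    """Copy the text and inline children of src into dst."""
    dst.text = src.text
    for child in src:
        name, attrs = INLINE.get(local(child.tag), ("phrase", {}))
        out = ET.SubElement(dst, name, dict(attrs))
        convert_inline(child, out)
        out.tail = child.tail


def xhtml_to_docbook(xhtml):
    """Map a small subset of XHTML body markup to DocBook elements."""
    root = ET.fromstring(xhtml)
    body = root.find(XHTML_NS + "body")
    if body is None:
        body = root.find("body")
    article = ET.Element("article")
    section = article  # blocks before the first heading go here
    for el in body:
        tag = local(el.tag)
        if tag in ("h1", "h2", "h3"):
            section = ET.SubElement(article, "section")
            convert_inline(el, ET.SubElement(section, "title"))
        elif tag == "p":
            convert_inline(el, ET.SubElement(section, "para"))
        elif tag in ("ul", "ol"):
            kind = "itemizedlist" if tag == "ul" else "orderedlist"
            lst = ET.SubElement(section, kind)
            for li in el:
                if local(li.tag) == "li":
                    item = ET.SubElement(lst, "listitem")
                    convert_inline(li, ET.SubElement(item, "para"))
    return ET.tostring(article, encoding="unicode")
```

Feeding it the XHTML that Tidy writes should give a first DocBook
draft.  Tables, images, and nested sections are not handled, so the
XSL (or Perl) work is still needed for a real document.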

Any comments?  (I want to learn something better ;-)

Petr

-- 
Petr Prikryl, SKIL, spol. s r.o., prikrylp@skil.cz



