This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
FW: [docbook-apps] RE: HTML -> DocBook Conversion?
- From: "Steve Whitlatch" <swhitlat at getnet dot net>
- To: <docbook-apps at lists dot oasis-open dot org>
- Date: Thu, 7 Oct 2004 09:02:23 -0700
- Subject: FW: [docbook-apps] RE: HTML -> DocBook Conversion?
> Do you have a summary written up about any
> problems or limitations
> you ran into getting valid DocBook output from
> Frame 7? A couple
> years back Bob Stayton wrote up a list of some
> problems he found,
> and I did the same. Summary is at:
>
> http://groups.yahoo.com/group/xml-doc/message/3257
Thank you to you and Bob for that summary. I
found it and read it before I attempted any
DocBook work in FrameMaker. It was an excellent
heads up. I'm still new to DocBook and XSL, but
I'd been using FrameMaker (unstructured) for
years. So, I had some confidence in the venture
and with the summary from you and Bob I went
ahead. The problem that I and many others have
when things go wrong is that we don't know
whether the problem is caused by us (mistakes) or
by the software (deficiencies, bugs). A heads up
beforehand can really help with the learning.
I guess that the entire README for my
FrameMaker+DocBook project can't be considered a
summary. It is really long and detailed. The most
important bit of info, just my opinion, is that
virtually all the structured FrameMaker trouble
is attributable to FrameMaker's need to translate
XML to its FrameMaker format, including the
DTD-to-EDD translation, which can be (is) also a
source of trouble. Add to that the scuttlebutt on
the Internet that Adobe let go its entire
FrameMaker programming staff after releasing
FrameMaker version 7, sending the remaining
maintenance off to India, and what do we have? Adobe
announced that the Mac version would be phased
out; the experimental Linux version died long ago.
Again, this is just my opinion, and I do pay close
attention to what more experienced people say on
this list and other lists. I think that a
real-time WYSIWYG XML authoring tool that formats
both on-screen display and output according to
legal, standard XSL is what's needed. If someone
does not want the WYSIWYG part, they can turn it
off. FrameMaker partially provides this, just not
with XSL. I forgot to add: the XSL from the
mythical tool I describe must be portable, with no lock-in.
Maybe Arbortext's Styler provides equally good
functionality. I may have the chance to learn
some Arbortext tools, and I don't really know
what Styler is/does, so I am open to hearing
about it from others.
In the free world, my own little project: DocBook
XSL Configurator (far from a WYSIWYG tool),
provides some help for creating XSL customization
layers for DocBook:
https://sourceforge.net/projects/db-xsl-cfg/
OK, I've become long-winded. You asked for a
summary about the problems I may have had getting
valid DocBook XML output from FrameMaker. In the
FrameMaker world, they call it "round-tripping."
Here is an excerpt from the README:
*****************************
Public/System Identifiers.
To get FrameMaker to write a public
identifier in output XML, I used
the following read/write rule:
*******
writer external dtd is public "-//OASIS//DTD
DocBook XML V4.2//EN"
"/usr/share/docbook-xml42/docbookx.dtd";
*******
FrameMaker 7.0 cannot correctly write out a
URL used as a system
identifier. For example,
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
is always changed to
"http:/www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
Note the missing forward slash.

Upon XML export:
1) Lots of fairly ordinary ASCII characters
(hyphens, for example) were changed to
internal general entities.
2) "imagedata" element "fileref" attributes
became external parameter entities
3) Each structured file in the FrameMaker
book became an external file referenced
by an external parameter entity, but so
did the unstructured files (TOC, LOF, LOT,
and Index). The unstructured files had to
be removed from the exported XML tree before
validating.
4) Many (all?) default attribute values not
explicit in the input XML were made explicit
in the output XML.
I suppose all import/export behavior could be
changed with a "custom client," but that is
like selling someone a piece of software as
"able to do _anything_ you want," and
adding "you just have to do some programming."
What is this fabulous piece of software that
is so versatile it can do anything I want?
Well, of course, it's a compiler!
*****************************
It's an exaggeration to say this, but I think the
point comes across. With structured FrameMaker,
everything can be cured with a custom API client,
which one typically writes in C using the
FrameMaker Developer Kit and some other
programming packages from Adobe. Personally, I
found that to be more work than I was willing to put in.
> Did you run into those same problems? If so, how
> did you work around them?
Yes, the same problems.
> Post-processing, maybe? Or some
> custom programming.
I manually made the necessary adjustments to the
XML that FrameMaker output in order to get the
output XML to validate. But then I only had one
document to do that with. In large enterprise
production environments, I'm sure they use
automated processes to do the same or similar.
However, it's really not good that any of the
adjustments to output XML need to be done at all.
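
As a rough illustration of what automating one of these adjustments might look like, here is a minimal Python sketch that restores the slash FrameMaker 7.0 drops from a URL system identifier. The function name repair_system_identifier is hypothetical, and the sketch assumes the damaged URL appears as a quoted string in the DOCTYPE declaration, as in the README excerpt above. It is not taken from any actual production workflow.

```python
import re

# A quoted "http:/" or "https:/" followed by anything other than a second
# slash, i.e. a URL that has lost one of its two slashes.
_BROKEN_URL = re.compile(r'(["\'])(https?):/(?!/)')


def repair_system_identifier(xml_text: str) -> str:
    """Restore the dropped slash in a URL system identifier.

    Only the DOCTYPE declaration (up to its closing '>' or the start of
    an internal subset '[') is touched; document content is left alone.
    """
    def fix_doctype(match: re.Match) -> str:
        return _BROKEN_URL.sub(r'\1\2://', match.group(0))

    return re.sub(r'<!DOCTYPE[^>\[]*', fix_doctype, xml_text, count=1)


if __name__ == "__main__":
    exported = (
        '<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN" '
        '"http:/www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">\n'
        '<book/>'
    )
    print(repair_system_identifier(exported))
```

Running the function a second time on repaired text leaves it unchanged, so it should be safe to apply to files that were already fixed by hand.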
> One thing about working with MIF is, there is no
> free open-source MIF parser for Frame 5 (or 6
> or 7). There was one
> once that could handle Frame 4 files, I think.
> So if you were
> really to build your own system for working
> with MIF in Perl or
> whatever, you'd first need to create a MIF parser.
I did not know that. That's interesting info.
Obviously, when we eliminate the unnecessary
intermediate formats, many problems disappear.
Just plain DocBook XML authored in GNU emacs and
then processed by libxml2 tools takes me a long
way with fewer opportunities for trouble. But I'm
just one guy at home on my computer. I know that
what works for me may not work for large enterprises.
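
For readers who want to script the validation step with the libxml2 tools, a minimal sketch follows. It assumes the xmllint program that ships with libxml2 is on the PATH and uses its --valid and --noout options; the helper names xmllint_command and validate are hypothetical, and the file name in the usage example is only a placeholder.

```python
import shutil
import subprocess


def xmllint_command(path: str) -> list[str]:
    """Build an xmllint call that validates against the DTD named in the
    document's DOCTYPE and suppresses the serialized output."""
    return ["xmllint", "--valid", "--noout", path]


def validate(path: str) -> bool:
    """Return True if xmllint reports the document as valid."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found on PATH")
    result = subprocess.run(
        xmllint_command(path), capture_output=True, text=True
    )
    if result.returncode != 0:
        # xmllint writes its validation errors to stderr.
        print(result.stderr)
    return result.returncode == 0
```

Calling validate("book.xml") would then return True or False depending on whether the file validates, printing xmllint's messages on failure.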
> (And Steve, I don't mean you personally,
> because I know you
> already know XSLT
Not very well. I am heavily dependent on others,
like Bob Stayton, for the DocBook XSL
stylesheets. I don't know, but there may be
literally thousands like me who are similarly
dependent on the DocBook XSL stylesheets.
Steve Whitlatch