This is the mail archive of the
docbook@lists.oasis-open.org
mailing list for the DocBook project.
[docbook] Re: Future DocBook: goals/requirements?
- From: Norman Walsh <ndw at nwalsh dot com>
- To: Michael Smith <smith at xml-doc dot org>
- Cc: docbook at lists dot oasis-open dot org
- Date: Mon, 16 Jun 2003 17:28:19 -0400
- Subject: [docbook] Re: Future DocBook: goals/requirements?
- References: <878yspd1hn.fsf@nwalsh.com> <20030602131358.GA1668@donnybrookfair>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
/ Michael Smith <smith@xml-doc.org> was heard to say:
| Could you say something about the main goals or requirements behind the
| changes you've outlined in your 'Ruminations' articles?
Here are some further thoughts on why I think now is the time to
refactor DocBook. Apologies, in advance, if some of these issues have
already been touched on in the thread. I haven't caught up yet, but I
had noticed that Michael asked this question so I wrote these thoughts
while I was disconnected on the plane ride home.
1. The single most compelling reason, the reason that I think would be
sufficient if it were the only reason, is that DocBook has become
brittle. It has grown, slowly and reasonably conservatively but
continuously, for many years. Changes that were each individually
small and well conceived form quite a tenuous pile when taken all
together. Look at the number of class and mixture parameter
entities we now have. Many are very similar but not the same. Can
you tell from inspection why they aren't the same? Is the
organizing principle that created them discernible? I don't think
so. As the current maintainer, I'm aware that this is my fault to
one degree or another.
Whatever the cause, and irrespective of whether or not it was
avoidable, we've reached the point where my software engineering
experience suggests that continuing on a path of accumulating
patches is not practical.
2. DocBook was conceived, designed, and built within the limiting
framework of SGML and then XML DTDs. In some ways it stands as a
testament to just how much you could do with those technologies.
But they are hardly modern.
For a project as large and important (if one measures importance in
terms of number of users or amount of legacy, at least) as DocBook,
I think novelty for novelty's sake would be a very bad idea indeed.
In fact, if all things were equal, I don't think it would be
inappropriate for DocBook to lag behind the technology curve. It
needs to be stable and reliable.
But all things are not equal. I think we've passed a complexity
threshold beyond which the parameter entity mechanisms available in
DTDs are simply not up to the task of supporting further
development. I am not suggesting, and have never intended to
suggest, that DocBook shouldn't be available as a DTD for many years
to come. I just don't think that the DTD should be the "source
format", the format upon which further development and customization
are based.
3. Engineering advances do not proceed smoothly and uniformly over
time. Instead, they proceed in fits and starts, with watershed
events spurring periods of rapid development. I think RELAX NG is a
watershed event in markup languages.
DocBook hasn't suddenly become unmanageable because we added one more
tag. The development of DocBook has been straining the bounds of
DTD development for some time. I have been thinking about how to
make progress, about how to perform a refactoring (although I'm not
sure I was consciously aware that that was what I was considering)
for several years. The famous "PE reorganization" RFE has existed
for at least five years. I've considered, and even prototyped,
several possible approaches.
RELAX NG is a watershed event because it changes the validation
model just a little bit. It removes some restrictions and allows us
to think about validation in a different way. Suddenly I see a
clear path forward, a way to build a much simpler, more coherent,
more easily customizable DocBook framework.
Now, at the moment, I have only a vision, and a few sketchy
prototypes. I don't have enough running code to be certain my ideas
will work. But I feel pretty confident.
4. Tools exist (thank you again, James) that will allow us to continue
to support existing tools and applications even as we move forward.
If moving to RELAX NG required us to turn our back on every
DTD-based XML tool that processes DocBook, the very idea of doing
it would be very much D.O.A.
My vision for the intermediate future is one where DocBook is
maintained in RELAX NG and where customization layers (both
extensions and subsets) are devised at the RELAX NG level. But DTDs
are still provided by translating the RELAX NG grammars with Trang.
It is likely to be the case that the DTDs will not validate
precisely the same documents as the RELAX NG grammar. The extent to
which there is variation will depend in part upon how we design
DocBook, but I don't think perfect fidelity should be a goal.
If perfect fidelity isn't possible, why bother? Because even a
slightly less constrained schema can still be used to drive editing
tools like Emacs and Epic. And it will allow all the existing
DTD-based tools to continue to offer some level of validation.
(They'll be able to find simple typos, for example, even if they
can't enforce every constraint.)
5. DocBook needs to be able to adapt to a changing world. I've already
found several occasions, for example, in which it would have been
convenient for DocBook to have been in a namespace. I can imagine
scenarios where it would be almost necessary. No matter what you
think about namespaces, I think they're here to stay. I don't see
any long term viability to an attitude of refusing to use them, at
least judiciously.
6. I think similar arguments can be made for the judicious use of
simple data types, although I'm by no means certain of that. I can
imagine, for example, that there might be value in validating that
the content of the <date> element is, in fact, a date. And
even more potential value in being able to sort dates and other
simple values "correctly".
7. I think DocBook is a world leader in its class. I think there's an
opportunity here to continue that leadership role and I think we
should take that opportunity. We should reinvent DocBook for the
modern markup world.
I don't think anything I'm suggesting is radical. I don't propose
that we invent something that's going to be maliciously (or
capriciously) incompatible with the current needs or even the
current markup of existing users.
It's just time to refactor. I think that's a natural part of the
life cycle of a software system that's in the middle of its
productive lifespan.
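To make point 6 a little more concrete, here is a minimal sketch in
Python of what typed <date> content could offer. It is not part of any
DocBook tooling. It assumes dates are written in the ISO 8601
YYYY-MM-DD form, which resembles the lexical form of the W3C XML Schema
date type; the helper name parse_iso_date and the sample values are
hypothetical, chosen only for illustration.

```python
from datetime import date


def parse_iso_date(text):
    """Return a date for 'YYYY-MM-DD' content, or None if it is not a valid date.

    Hypothetical helper: stands in for the datatype check a schema
    processor would perform on the content of a <date> element.
    """
    try:
        return date.fromisoformat(text.strip())
    except ValueError:
        return None


# Hypothetical <date> element contents, not taken from any real document.
contents = ["2003-06-16", "16 June 2003", "2003-02-30", "1999-12-31"]

# "Validation": keep only the content that really is a date.
valid = [c for c in contents if parse_iso_date(c) is not None]
invalid = [c for c in contents if parse_iso_date(c) is None]

# Sorting by the typed value rather than by the raw string.
sorted_dates = sorted(valid, key=parse_iso_date)

print("valid:  ", valid)
print("invalid:", invalid)
print("sorted: ", sorted_dates)
```

Note that "2003-02-30" looks plausible as text but is rejected because
it is not a real calendar date, which is the kind of error a DTD cannot
catch.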
| Is the aim mainly to make the vocabulary easier to maintain, or is it to
| make it easier to use? Or just to bring some order and consistency to
| the content models?
Yes, yes, and yes.
| Looking at the classes of changes you outline in the articles
| (rationalizing inlines, normalizing metadata, discarding cruft,
| miscellaneous changes to simplify things) and in your prototype, it seems
| like it's more of a "cleaning up" and not really anything like the kinds
| of more extensive refactorings that others have mentioned on the list
| (e.g., splitting DocBook into a 'core' set of elements + modules for
| different types of user needs).
I've argued[1] (hmm, for consistency I should put the text above in
the blog thing as well, will do) that multiple namespaces shouldn't be
used to make extension modules. But I'm not opposed, at least right
this moment, to making a smaller core with additional modules.
| That is, it's still one big schema of 300+ elements, with most of the
| attribute values on those elements being the same as what they are
| currently.
|
| And when you say that your prototype is three-quarters finished, what's
| the nature of the other one-quarter you'd do if you were to finish it?
One large part is making class/mixture equivalents for the block
elements. It's in rough shape for the inlines, but not so much for the
blocks.
Basically, it's done enough to make me feel comfortable that the basic
ideas work. But there's gobs of T's to cross and I's to dot.
| You mention that the TC has talked many times about 'reworking the
| parameter entities', but your current prototype isn't meant to be a
| complete solution to that, right? In your RELAX NG grammar, I see named
| patterns for classes of inlines, but none yet for classes of divisions/
| components/blocks -- and also not yet any definition-replacement hooks
| that would facilitate customization of the schema.
RELAX NG provides some of the customization facilities directly. But
you're right about the blocks.
Be seeing you,
norm
[1] http://norman.walsh.name/2003/06/11/oneNSorMany
- --
Norman Walsh <ndw@nwalsh.com>      | Mankind are always happy for
http://www.oasis-open.org/docbook/ | having been happy; so that if you
Chair, DocBook Technical Committee | make them happy now, you make them
                                   | happy twenty years hence by the
                                   | memory of it.--Sydney Smith
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: Processed by Mailcrypt 3.5.7 <http://mailcrypt.sourceforge.net/>
iD8DBQE+7jZzOyltUcwYWjsRAlDxAJ4qEc3FIg8OHRRovwdAWWry35zB1gCdEUzh
ptG5aZKBE5gK68zExwFcAMs=
=jdgi
-----END PGP SIGNATURE-----