This is the mail archive of the docbook@lists.oasis-open.org mailing list for the DocBook project.



doc domain vs. problem domain semantics (Re[2]: listitem)


Again, sorry I let so much time go by before getting my response together.  
I hope you can find a moment to consider some of my concerns.


>From: docbook-digest-errors@lists.oasis-open.org
>To: docbook-digest@lists.oasis-open.org
>Subject: docbook-digest Digest #77
>Date: Mon, 03 Dec 2001 08:33:39 -0500 (EST)
>
.
.
.

(The following is the header of the proper message.  In my previous 
message, I quoted the same header, though it was actually a reply to a 
different message -- apologies.)

>From: Norman Walsh <ndw@nwalsh.com>
>To: docbook@lists.oasis-open.org
>Subject: DOCBOOK: Re: listitem
>Date: Mon, 03 Dec 2001 06:34:32 -0500
>
>/ "Matt G." <matt_g_@hotmail.com> was heard to say:
>[...]
>| >You might try using nested variablelists.
>| >Inside each of your main listitems, use a variable
>| >list for this structured information.
>|
>| Well, I have a couple of problems with this approach.  First of
>| all, it gets even further from the proper semantics to describe
>| what I'm actually trying to do (though I could deal with that,
>| though hopefully it'd only be a stop-gap measure).
>
>What are the semantics of your data?

If I used nested variable lists, the top-level one would be fairly 
appropriate (which is what I'm doing), since each item is a field in a data 
structure, but the nested one would have an entry for each *property* of a 
given field, which is pretty far off from the implied semantics of 
variablelist.
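
To make the shape concrete, here is a minimal Python sketch that builds the 
nested structure described above.  The fields and their properties 
(timeout, retries, and so on) are invented purely for illustration; only 
the DocBook element names are real.  Note how the inner list ends up with 
one entry per *property*, not per variable.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: each field of a data structure, with its properties.
FIELDS = {
    "timeout": {"type": "unsigned int", "units": "ms", "default": "500"},
    "retries": {"type": "unsigned int", "default": "3"},
}

def build_variablelist(entries):
    """Build a DocBook <variablelist> from (term, body) pairs.

    body is either a string (wrapped in <para>) or an Element
    (appended to the <listitem> as-is, which is how nesting happens).
    """
    vl = ET.Element("variablelist")
    for term, body in entries:
        entry = ET.SubElement(vl, "varlistentry")
        ET.SubElement(entry, "term").text = term
        item = ET.SubElement(entry, "listitem")
        if isinstance(body, ET.Element):
            item.append(body)
        else:
            ET.SubElement(item, "para").text = body
    return vl

def fields_to_docbook(fields):
    """Outer list: one entry per field.  Inner list: one per property."""
    outer = []
    for name, props in fields.items():
        outer.append((name, build_variablelist(list(props.items()))))
    return build_variablelist(outer)

doc = fields_to_docbook(FIELDS)
print(ET.tostring(doc, encoding="unicode"))
```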

As a matter of fact, I'd guess that more often than not, variablelist is 
used to list things other than variables.  This gets to the subject of my 
message, and the tangent the thread is heading off on: since there aren't 
semantics rich enough to describe the kinds of formatting structures people 
use in documents, the more domain-specific ones are fallen back upon, as a 
crutch.  This has the effect of ruining the semantics of the 
domain-specific markup, particularly if its uses are mixed within a single 
document.


>| More importantly (in the
>| short-term) it doesn't even appear to be nested, at all, in the
>| DSSSL print style-sheets (version 1.74b - the latest).
>
>Using what backend?

OpenJade 1.3.  Is there any other DSSSL implementation as complete and 
mature?


>| First of all, I feel one needs to be able to nest structural
>| elements in <listitem>.  I'd certainly like to hear other points
>| of view, on the matter, but I just think it's imperative to be
>| able to partition a <listitem> into a finer-grained structure.
>
>There's a great long list of structural elements than you can put
>inside a list item. Section isn't one of them because it would
>make a complete mockery of the document hierarchy.

I don't really care whether the same constructs are available for use within 
a <listitem> as elsewhere; my point is that I think there might be valid 
reasons to subdivide <listitem>s further, into titled chunks. Do you agree?


>| Secondly, I'm surprised there's no sort of an element with a
>| title on the same line (see my <namedproperty> block element
>| example, in my previous message).
>
>"Title on the same line" is a presentational, not a semantic or
>structural issue.

True -- what I'm really concerned about is the structure of the construct, 
which can be difficult to get into the habit of conceptually separating 
from its presentation.  However, I think that unless you have semantics for 
99% of the problem domain, you need semantics specific to the document 
domain, on which to fall back.  If those aren't available, then people will 
resort to abusing whichever problem-domain constructs happen to have 
presentational or structural properties similar to what's missing.


>| sort of output, and there you go!  HOWEVER, if DocBook is ever
>| to scale to meet the basic needs of a substantial portion of the
>| various technical and scientific documentation sub-domains, it
>| must provide
>
>"DocBook is an XML/SGML vocabulary particularly well suited to
>books and papers about computer hardware and software (though it
>is by no means limited to these applications)."

So, is there really no desire to augment it to be better suited for more 
general documentation tasks and more easily adaptable to other sorts of 
problem domains than HW/SW?

IMO, the DocBook DTD (which, admittedly, I haven't really spent much time 
dissecting) should be partitioned into document constructs and HW/SW 
constructs (in addition to the various other classes of attribute and entity 
definitions).  Stylesheets, too.  This would make it easier for, say, a 
biotech publication or the physics department of a major university to use 
the core documentation semantics as a foundation for their own 
field-specific documentation vocabulary, without carrying extra baggage or 
suffering unnecessary name collisions with semantics foreign to their 
domain.

Another important development would be replacements (which could co-exist 
in conventional DocBook) for things that get abused as fall-backs, like 
<variablelist>.  Having complete document-domain semantics would allow 
users to transform their own specialized vocabularies into this DocBook 
subset, as an intermediate stage, and avoid the complexity of going straight 
to XSL-FO (which is also less useful than a richer, more structured 
vocabulary, like DocBook).


>The target domain of DocBook is computer software and hardware
>documentation. It happens to be suitable for a very wide range of
>other sorts of documentation, but the technical committee has
>historically been reluctant to add new markup specifically for
>features outside the scope of its present domain (DocBook is quite
>large enough :-).

Do you see that what I'm interested in is two things:
1) Preserving the semantics of HW/SW-specific constructs, by
   providing suitable fall-backs
2) Allowing DocBook to be more easily adapted to other domains,
   either through augmentation or as a richly structured
   intermediate format.


This has the advantage of allowing other fields to better take advantage of 
the effort and refinement that has gone into DocBook and the tools that have 
been developed for it.  Also, the more people who use DocBook, the better 
off those of us are who have expertise in working with it or who have 
developed tools for it, as our skills and tools become more marketable.  
With regard to the latter point, bear in mind that while there's been lots 
of money in the computer HW and SW fields recently, that may not always be 
the case, as commoditization continues and the supply of skilled labor 
increases.


>| In fact, my opinion is that there should be a layer *between*
>| problem-domain specific semantics and XSL-FO, which would be
>| comprised exclusively of constructs relating to document
>| structure.  Then, an
>
>Given my experience with the way authors write, it's generally
>impractical to separate document structure entirely from
>lower-level semantics.

You'll always have high-level structural elements, like <section> and 
<book>, but I'd argue that you might even be able to do away with things 
like generic sorts of lists, if your semantics are sufficiently rich and 
well adapted to the problem domain.

Whether it's worth trying to capture the semantics of the text, so 
thoroughly, is highly case-specific, which is why I think flexibility to 
easily adopt either approach, and even transition from augmenting to 
layering, is of great importance.


>| pursue.  If they are fairly unambitious, they can seek to augment
>| the structural vocabulary with some of their own extensions.  If
>| they want to promote or enforce more rigid semantics that deal
>| exclusively with their problem-domain, and/or if they want close
>| control over the structure and content of their output documents,
>| they can add a layer on top of the document structure semantics
>| (using XSLT, to do the translation into XML DocBook, for
>| example).  Furthermore, there would be a fairly smooth
>| transition path from the former to the latter.
>
>Uh huh. Been there, done that. Do it often, in fact. Although I
>think I tend to do it from the "other end" so to speak. Usually,
>I have some very specific set of data that doesn't fit nicely
>into my documentation markup, but I want to use it in my
>documentation. So I write an XSLT stylesheet to convert it into

To me, it seems the real question is one of whether there are any other 
applications for the data than presenting it in a human-readable form.  If 
not, and if the structure of a DocBook document isn't a terribly 
inconvenient authoring format, then I say just write the document.  However, 
there are many cases where a document isn't the most desirable repository 
format, for the data, due to other processing requirements, authoring 
efficiency, or manageability concerns.  (IMO, the latter tends to be 
underappreciated -- most of my design documentation is embedded within my 
source code, in structured comment blocks, for example.)


>want. Then with makefiles, I can edit the data and rebuild the
>documentation entirely painlessly.

So, you don't have a tool to generate your dependencies automatically, do 
you?  I'll soon whip one up, in Python.  I probably won't bother to convert 
it to C/C++, unless I can get it in the Xerces distribution, though.  
Ideally, I think you also want a command-line XPath query tool, for 
dependencies that don't use external entities or the external DTD subset.
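
A rough sketch of what such a tool might look like, in Python.  This is 
only an illustration under some loud assumptions: it finds external entity 
declarations with a regular expression rather than a real parser (so it 
will be fooled by declarations inside comments), it ignores the external 
DTD subset named on the DOCTYPE line, and the file names in the sample are 
made up.

```python
import re

# A quoted literal, in either quote style.  SYSTEM_LITERAL captures the
# contents; PUBID_LITERAL only skips over a public identifier.
SYSTEM_LITERAL = r'''(?:"([^"]*)"|'([^']*)')'''
PUBID_LITERAL = r'''(?:"[^"]*"|'[^']*')'''

# Matches general or parameter entity declarations that have a system id:
#   <!ENTITY api SYSTEM "api.xml">
#   <!ENTITY % defs PUBLIC "-//X//Y" "defs.ent">
ENTITY_RE = re.compile(
    r"<!ENTITY\s+(?:%\s+)?[^\s>]+\s+"
    r"(?:SYSTEM|PUBLIC\s+" + PUBID_LITERAL + r")\s+" + SYSTEM_LITERAL
)

def external_entities(text):
    """Return the system ids of all external entities declared in text."""
    found = []
    for m in ENTITY_RE.finditer(text):
        found.append(m.group(1) if m.group(1) is not None else m.group(2))
    return found

def make_rule(target, source, text):
    """Format a makefile dependency line for target built from source."""
    deps = [source] + external_entities(text)
    return "%s: %s" % (target, " ".join(deps))

# Hypothetical document; internal entities (like &version;) are skipped.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" "docbookx.dtd" [
<!ENTITY api SYSTEM "generated/api.xml">
<!ENTITY design SYSTEM 'generated/design.xml'>
<!ENTITY version "1.0">
]>
<book>&api;&design;</book>
"""

print(make_rule("book.pdf", "book.xml", SAMPLE))
```

The printed line could be written to a file that the makefile includes.  
Resolving relative system ids against the directory of the referencing 
file, and recursing into the entities themselves, is left out here.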


>| As it happens, I'm in the process of doing the former, however
>| I'm generating external parsed XML DocBook entities.  So, it's
>| a little like the latter, in that it's partially layered on top
>| of DocBook.  I'd hoped to take advantage of DocBook stylesheets
>| to do most of the formatting work, for me.  I also wanted to be
>| able to use the core DocBook semantics in portions of the
>| documents that are written by hand (these parts contain the
>| actual references to the external entities, giving the author
>| the ability to change their order and use, within the document,
>| as well as provide additional context).
>
>Sounds like a fine plan to me.

It suits my needs quite well.  My point above is that there are probably 
plenty of cases for which this model isn't a good fit.

FYI, I use this approach for both an Interface Definition Language, from 
which I generate portions of my API document (and both sides of the 
interface), as well as the embedded design documentation I mentioned, 
previously.  I'm using XSLT all around, and wish XalanC would support XSLT 
1.1.  I also dearly wish it had a command-line flag for specifying a SYSTEM 
id search path (for external entities and DTD subsets), similar to the '-I' 
option supported by most C/C++ compilers!!
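
To be clear about the behavior being asked for, here is a small Python 
sketch of the lookup.  It is not anything XalanC provides; the function 
name and its parameters are hypothetical, and it only models the search 
order: absolute ids are taken as-is, relative ones are tried against the 
referencing document's directory first, then each search-path directory in 
turn, the way '-I' works for #include.

```python
import os

def resolve_system_id(system_id, search_path, base_dir="."):
    """Return the first existing file for system_id, or None.

    search_path is a list of directories, in the order they would be
    given on a (hypothetical) command line.
    """
    if os.path.isabs(system_id):
        return system_id if os.path.isfile(system_id) else None
    # The document's own directory wins, then the search path in order.
    for directory in [base_dir] + list(search_path):
        candidate = os.path.join(directory, system_id)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A call such as resolve_system_id("api.xml", ["build/gen", "shared"]) would 
return the first of ./api.xml, build/gen/api.xml, or shared/api.xml that 
exists (the directory names here are made up).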


Matt

