This is the mail archive of the davenport@berkshire.net mailing list for the Davenport project.


Re: DAVENPORT: tables

To: davenport@berkshire.net
Subject: Re: DAVENPORT: tables
From: Norman Walsh <ndw@nwalsh.com>
Date: Fri, 9 Jul 1999 09:04:15 -0400
References: <m112XhK-001MCeC@gliep>
Reply-To: davenport@berkshire.net

/ Joerg Wittenberger <Joerg.Wittenberger@pobox.com> was heard to say:
| I've just been asked about the mixed content type problem which
| disallows spaces between paragraph (and other) tags within table
| "entry"s.
| 
| From the DTD I read:
| 
| <!ENTITY % paracon '#PCDATA' -- default for use in entry content -->
| <!ENTITY % tbl.entry.mdl        "(para|warning|caution|note|legend|%paracon;)*">
| <!ELEMENT entry - O (%tbl.entry.mdl;) %tbl.entry.excep; >
| 
| Now I'm sort of lost when I try to explain the problem.  Something like

First off, you're looking in the wrong place. The tbl.entry.mdl parameter
entity is redefined in dbpool.mod. Here are the relevant bits:

<!ENTITY % tabentry.mix
		"%list.class;		|%admon.class;
		|%linespecific.class;
		|%para.class;		|Graphic|MediaObject
		%local.tabentry.mix;">

<!ENTITY % para.char.mix
		"#PCDATA
		|%xref.char.class;	|%gen.char.class;
		|%link.char.class;	|%tech.char.class;
		|%base.char.class;	|%docinfo.char.class;
		|%other.char.class;	|%inlineobj.char.class;
		|%synop.class;
		|%ndxterm.class;
		%local.para.char.mix;">

<!ENTITY % tbl.entry.mdl "((%tabentry.mix;)+ | (%para.char.mix;)+)">

| <entry><para>...</para> <para>...</para></entry>
| 
| should work to my understanding.  The element definition says
| repeat "one out of para, ..., #PCDATA".  What I wrote is
| "para, #PCDATA, para".

Look closely: that's not quite what it says. What it says is

  (repeat one out of tabentry.mix) OR (repeat one out of para.char.mix)

It does not say 

  repeat (one out of tabentry.mix or one out of para.char.mix)
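
To make the difference concrete, here is a small sketch in Python. It is
not anything from the DTD: the one-letter tokens B and I and the two
regular expressions are my own stand-ins for "something from
tabentry.mix" and "something from para.char.mix (including #PCDATA)".

```python
import re

# Hypothetical one-letter tokens, invented for this sketch:
#   B = a block element from tabentry.mix (para, a list, ...)
#   I = inline content from para.char.mix (#PCDATA, even a single space)
ACTUAL = re.compile(r"(?:B+|I+)")   # ((tabentry.mix)+ | (para.char.mix)+)
ASSUMED = re.compile(r"(?:B|I)+")   # (tabentry.mix | para.char.mix)+


def allowed(model, tokens):
    """Return True if the whole token string matches the content model."""
    return model.fullmatch(tokens) is not None


# <entry><para>...</para> <para>...</para></entry> tokenizes as B, I, B.
print(allowed(ACTUAL, "BIB"))    # False: the space is #PCDATA between blocks
print(allowed(ASSUMED, "BIB"))   # True: what the question assumed
print(allowed(ACTUAL, "BB"))     # True: no space between the paragraphs
```

The real content model behaves like ACTUAL, so the space between the two
paragraphs is what breaks the entry.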

| But this explanation must be wrong.  At least the parser says so.
| Could someone tell me where I'm wrong?

The root of the problem is that the parser has to make the
choice with only one token of lookahead.  Here's an explanation
that I've got in this, um, book, um, that I've got, um, lying
around on my hard disk ;-)

<note>
<title>Pernicious Mixed Content</title>
<para>The content model of the <sgmltag>Entry</sgmltag> element exhibits a
nasty peculiarity that we call &ldquo;pernicious mixed content&rdquo;<footnote>
<para>This term was coined by Terry Allen.</para>
</footnote>.</para>
<para>Every other element in DocBook contains either block elements or inline
elements (including &pcdata;) unambiguously.  In these cases, the meaning
of line breaks and spaces is well understood; they are insignificant between
block elements and significant (to the &SGML; parser, anyway) where inline
markup can occur.</para>
<para>Table entries are different; they can contain either block or inline
elements, but not both at the same time.  In other words, one <sgmltag>Entry
</sgmltag> in a table might contain a paragraph or a list while another contains
simply &pcdata; or other inline markup, but no single <sgmltag>Entry</sgmltag>
can contain both.</para>
<para>Since the content model of an <sgmltag>Entry</sgmltag> allows both kinds
of markup, each time the &SGML; parser encounters an <sgmltag>Entry</sgmltag>,
it has to decide what variety of markup it contains.  &SGML; parsers are forbidden
to use more than a single token of lookahead to reach this decision. In practical
terms, what this means is that a line feed or space after an <sgmltag>Entry
</sgmltag> start tag causes the parser to decide that the cell contains inline
markup.  Subsequent discovery of a paragraph or other block element causes
a parsing error.</para>
<para>All of these are legal: 
<screen>
<![CDATA[
<entry>3.1415927</entry>
<entry>General <emphasis>#PCDATA</emphasis></entry>
<entry><para>
A paragraph of text
</para></entry>
]]>
</screen>
</para>
<para>but each of these is an error: <screen>&lt;entry>                <lineannotation>Error, cannot have a line break before a block element</lineannotation>
<![ CDATA [<para>
A paragraph of text.
</para></entry>

<entry><para>
A paragraph of text.]]>
&lt;/para>               <lineannotation>Error, cannot have a line break between block elements</lineannotation>
<![ CDATA [<para>
A paragraph of text.
</para></entry>

<entry><para>
A paragraph of text.]]>
&lt;/para>               <lineannotation>Error, cannot have a line break after a block element</lineannotation>
<![ CDATA [</entry>]]></screen></para>

<para>In designing a &DTD;, it is wise to avoid pernicious mixed
content.  Unfortunately, the only way to correct the pernicious
mixed content problem that already exists in DocBook would be
to require some sort of wrapper (a block element, or an inline
like <sgmltag>Phrase</>) around &pcdata; within table
<sgmltag>Entry</>s.  This would be annoying and inconvenient in
a great many tables where &pcdata; cells predominate and, in
addition, would be different from
<acronym>CALS</acronym>.</para>
</note>
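
If it helps, the behaviour the note describes can be mimicked with a toy
checker in Python. This is only a sketch under my own assumptions, not
how any real SGML parser is written: the function name check_entry and
the token kinds "block" and "inline" are made up for illustration, and
an empty entry is simply treated as inline content.

```python
def check_entry(tokens):
    """Toy model: commit to block or inline content on the first token.

    tokens is a list of hypothetical token kinds: "block" for elements
    such as para, "inline" for #PCDATA (including spaces and line
    breaks) or inline elements.
    """
    # Assumption of this sketch: an empty entry counts as inline content.
    mode = tokens[0] if tokens else "inline"
    for position, kind in enumerate(tokens[1:], start=2):
        if kind != mode:
            return f"error: {kind} at token {position} in {mode} content"
    return f"ok: {mode} content"


# <entry>3.1415927</entry>
print(check_entry(["inline"]))
# <entry>, line break, <para>...</para></entry>
print(check_entry(["inline", "block"]))
# <entry><para>...</para>, line break, <para>...</para></entry>
print(check_entry(["block", "inline", "block"]))
# <entry><para>...</para>, line break, </entry>
print(check_entry(["block", "inline"]))
```

The first call reports inline content; the other three correspond to the
three error cases in the note, each failing at the token that does not
match the kind of content chosen at the start of the entry.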

                                        Cheers,
                                          norm

-- 
Norman Walsh <ndw@nwalsh.com>      | The Lord's Prayer is 66 words, the
http://www.oasis-open.org/docbook/ | Gettysburg Address is 286 words,
Member, DocBook Editorial Board    | there are 1,322 words in the
                                   | Declaration of Independence, and
                                   | government regulations on the sale
                                   | of cabbage exceed 26,900 words.

References:
- DAVENPORT: tables
  - From: Joerg Wittenberger <Joerg.Wittenberger@pobox.com>
