This is the mail archive of the docbook-apps@lists.oasis-open.org mailing list.
Re: Normalizing spaces?
Norman asked, in reply to my question:
At 12:53 PM 5/8/01 -0400, you wrote:
/ "M. Wroth" <mark@astrid.upland.ca.us> was heard to say:
| Hmmm. The behavior I asked about was the "normalization" (possibly
| not the right word) of spaces in element content (specifically,
| although not limited to, <para> elements), in the SGML version of
| DocBook processed with the Modular DSSSL Style Sheets.
The odd thing about your question is the example. SGML systems discard
whitespace from element content (the content of chapter, book,
procedure, etc.), but not from mixed content. A paragraph is mixed
content, so DSSSL shouldn't be throwing away spaces.
| I observe that multiple whitespace characters (newline, tab, and
| space) show up in the output file as a single space character. This
| behavior is not universally present (i.e. in other DTD and other style
| sheets, such concatenation is not observed). I'm trying to
| understand how this works.
Can you identify some specific content models in other DTDs where the
effect is different. And what other tools?
Here is an example of content:
<p>It is not permissible under the Society's rules to
fimbriate a chief.
Laurel precedent (Laurel Alison, Dec 86
and Aug 88)
<q>however this is blazoned, in appearance it
includes a fimbriated
chief, which is not permitted for
Society usage</q>.
RFS VIII.3 limits fimbiration to simple
geometric charges placed
in the center of the field; while a
chief is a simple
geometric charge, it is not in the center
of the field.
</p>
Note that the indentation is achieved by space characters in the
input file (put there by PSGML/emacs).
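The effect described above can be modeled with a short Python sketch. It only illustrates the observed output (each run of newlines, tabs, and spaces showing up as a single space); it is not a description of how Jade or OpenJade arrive at that result. The function name `collapse_whitespace` and the exact indentation in `sample` are made up for this illustration.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces, tabs, and newlines with one space.

    Hypothetical helper: a model of the observed output only,
    not of anything Jade does internally.
    """
    return re.sub(r"[ \t\r\n]+", " ", text)


# A fragment of the <p> content above; the indentation is assumed.
sample = (
    "fimbriate a chief.\n"
    "        Laurel precedent (Laurel Alison, Dec 86\n"
    "\tand Aug 88)"
)

print(collapse_whitespace(sample))
```

Under this model the line breaks and the indentation inserted by the editor both disappear from the output, which matches what is seen in the output file.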
The DTD content model is
<!element p o o (#PCDATA | blazon | q | bk
                 | bq | sa | cite)* +(cite)>
<!ATTLIST P INCLUDEIN (MIN|LOI|BOTH|IGNORE) BOTH>
and I'm processing it with a homegrown DSSSL style sheet run through Jade
C:\USR\DSSSL\JADE\JADE.EXE:I: Jade version "1.2.1"
C:\USR\DSSSL\JADE\JADE.EXE:I: SP version "1.3.3"
but the behavior is identical with
C:\USR\DSSSL\OPENJA~1.3\BIN\OPENJADE.EXE:I: OpenJade version "1.3"
C:\USR\DSSSL\OPENJA~1.3\BIN\OPENJADE.EXE:I: OpenSP version "1.3.4"
The processing rule for a <p> is long and complicated, but doesn't
explicitly do anything to get rid of spaces; ultimately the content
is processed with process-children.
(element p (make sequence
  (if (have-ancestor? "MINUTES")
    (make sequence
      (case (attribute-string "INCLUDEIN")
        (("MIN")
          (if (first-sibling?)
            (make paragraph (process-children))
            (make paragraph ;The "else" clause
              first-line-start-indent: 12pt
              (process-children))))
        (("LOI") (empty-sosofo))
        (("BOTH")
          (if (first-sibling?)
            (make paragraph (process-children))
            (make paragraph ;The "else" clause
              first-line-start-indent: 12pt
              (process-children))))
        (("IGNORE") (empty-sosofo))
        ((#f)
          (if (first-sibling?)
            (make paragraph (process-children))
            (make paragraph ;The "else" clause
              first-line-start-indent: 12pt
              (process-children))))
      )
    ) ; end of the ``THEN'' clause
    (make sequence
      (case (attribute-string "INCLUDEIN")
        (("MIN") (empty-sosofo))
        (("LOI")
          (if (first-sibling?)
            (make paragraph (process-children))
            (make paragraph ;The "else" clause
              first-line-start-indent: 12pt
              (process-children))))
        (("BOTH")
          (if (first-sibling?)
            (make paragraph (process-children))
            (make paragraph ;The "else" clause
              first-line-start-indent: 12pt
              (process-children))))
        (("IGNORE") (empty-sosofo))
        ((#f)
          (if (first-sibling?)
            (make paragraph (process-children))
            (make paragraph ;The "else" clause
              first-line-start-indent: 12pt
              (process-children))))
      )
    )
  )
))
(I wouldn't object to suggestions for better ways to do this -- but
it's old code and works fine, so I'm not particularly looking for ways
to improve it, other than this question; it will likely get replaced
when the underlying DTD gets changed in the not too distant future).
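For anyone skimming the rule, its branching reduces to a small decision table: whether a <p> is rendered depends only on the INCLUDEIN value and on whether there is a MINUTES ancestor, and no branch does anything to the text itself. The Python sketch below models that table under those assumptions; `is_rendered` is a made-up name, and `None` stands in for the `#f` case of the DSSSL rule.

```python
from typing import Optional


def is_rendered(in_minutes: bool, includein: Optional[str]) -> bool:
    """Model of which <p> elements the rule turns into paragraphs.

    Hypothetical summary of the DSSSL rule, not a replacement for it.
    `includein` is the INCLUDEIN attribute value, or None for the #f case.
    """
    if includein is None or includein == "BOTH":
        return True                 # rendered in either context
    if includein == "IGNORE":
        return False                # (empty-sosofo) in either context
    if includein == "MIN":
        return in_minutes           # only under a MINUTES ancestor
    if includein == "LOI":
        return not in_minutes       # only outside a MINUTES ancestor
    raise ValueError(f"unexpected INCLUDEIN value: {includein!r}")


for value in ("MIN", "LOI", "BOTH", "IGNORE", None):
    print(value, is_rendered(True, value), is_rendered(False, value))
```

In every rendered case the content goes through process-children unchanged, so nothing in this table accounts for the collapsing of spaces.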
Mark B. Wroth
<mark@astrid.upland.ca.us>