This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: MSXML Whitespace handling


Jeni,

Your reply is well-written and well-researched, and it exposes a
simplification I made in my original mail.  The fact is that the MS DOM does
not parse the input (I was trying to simplify the discussion, but instead
caused confusion).  Instead, it's the MS XML Parser that actually parses the
input XML and makes SAX-like calls for the application to consume.  At this
level, all whitespace is provided to the "application" that is consuming
the stream of events.  And it is the MS DOM which consumes these events and
builds the in-memory representation, making it an application as defined in
the XML 1.0 spec.  Here is the architecture, represented graphically:

__________________
|                |
| XSL Processor  |
| (Application)  |
|________________|
       ^
_______|__________    ___________________________
|                |    |                         |
| DOM Cache      |--->| User Application        |
|________________|    | (may perform read-only  |
       ^              | operations on cache     |
_______|__________    | concurrently with XSLT) |
|                |    |_________________________|
| MS DOM Builder |
| (Application)  |
|________________|
       ^
_______|___________
|                 |
| MS XML Parser   |
| (XML Processor) |
|_________________|
       ^
_______|________
|              |
| XML Document |
|______________|
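
To make the layering concrete, here is a toy sketch in plain JavaScript.  It
is not MSXML code: parseEvents, buildTree and countTextNodes are hypothetical
names, the tokenizer only understands simple tags without attributes, and
"strip" is simplified to dropping whitespace-only text.  It only illustrates
that the parser reports all whitespace and the tree builder decides what to
keep.

```javascript
// Toy "XML processor": reports every run of character data as an event,
// including runs that consist only of whitespace.
function parseEvents(xml) {
  const events = [];
  const re = /<\/([^>]+)>|<([^\/>]+)>|([^<]+)/g;
  let m;
  while ((m = re.exec(xml)) !== null) {
    if (m[1] !== undefined) events.push({ type: "end", name: m[1] });
    else if (m[2] !== undefined) events.push({ type: "start", name: m[2] });
    else events.push({ type: "text", value: m[3] });
  }
  return events;
}

// Toy "application" (tree builder): consumes the events and chooses
// whether whitespace-only text makes it into the cache.
function buildTree(events, options) {
  const root = { name: "#document", children: [] };
  const stack = [root];
  for (const ev of events) {
    const parent = stack[stack.length - 1];
    if (ev.type === "start") {
      const el = { name: ev.name, children: [] };
      parent.children.push(el);
      stack.push(el);
    } else if (ev.type === "end") {
      stack.pop();
    } else if (options.preserveWhiteSpace || /[^ \t\r\n]/.test(ev.value)) {
      parent.children.push({ name: "#text", value: ev.value });
    }
  }
  return root;
}

// Counts text nodes in a built tree.
function countTextNodes(node) {
  let n = node.name === "#text" ? 1 : 0;
  for (const child of node.children || []) n += countTextNodes(child);
  return n;
}

const events = parseEvents("<a> <b>x</b> </a>");
const stripped = buildTree(events, { preserveWhiteSpace: false });
const preserved = buildTree(events, { preserveWhiteSpace: true });
```

Anything that reads the stripped tree afterwards sees only the text "x"; the
two whitespace-only events were seen by the builder but never stored.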


The user loads the DOM, with code like this:

    dom.load("my-xml.xml");

and if the user did not first set:

    dom.preserveWhiteSpace = true;

then there is absolutely no way for MS XSLT to recover the whitespace
stripped during load, since the application it depends upon (MS DOM) has
already stripped it.  The MS XSL processor has not even been instantiated
yet.  How can it reach back and instruct the DOM to preserve whitespace?

Do you see the problem?  Your mail made it sound like MSXSL somehow controls
the load.  It does not, nor should it be required to.  Instead, the user
controls the load.  If the user allows whitespace to be stripped, then there
is absolutely nothing that MS XSLT can do to recover it.

Now, I do see that defaulting to preserveWhiteSpace = false has caused a lot
of confusion to XSLT users, but remember that the decision to use this
default was made long ago, before the XSLT spec even existed.  I'll let the
UE guys know that they should prominently discuss preserveWhiteSpace = true
in the XSLT docs, so that people know how to get the behavior they want.

Here is a snippet of JScript that shows how to transform a fully preserved
cache:

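    xml_dom = new ActiveXObject("MSXML2.DOMDocument");
    xsl_dom = new ActiveXObject("MSXML2.DOMDocument");

    // Set before load(): whitespace stripped during load cannot be recovered.
    xml_dom.preserveWhiteSpace = true;
    xsl_dom.preserveWhiteSpace = true;

    // Load synchronously; the file names are placeholders.
    xml_dom.async = false;
    xsl_dom.async = false;
    xml_dom.load("my-xml.xml");
    xsl_dom.load("my-xsl.xsl");

    strResult = xml_dom.transformNode(xsl_dom);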

The advantage to this architecture is that the user can load both the XML
and XSL, change either of them via the DOM API, and then perform the
transformation.  If XSLT directly loaded the .XML and .XSL, this would not
be possible.

~Andy Kimball
MSXSL Dev


-----Original Message-----
From: Jeni Tennison [mailto:jeni@friday.u-net.com]
Sent: Tuesday, August 01, 2000 5:04 PM
To: xsl-list@mulberrytech.com
Subject: RE: MSXML Whitespace handling


At 13:51 01/08/00 -0700, Andrew Kimball wrote:
>As for mangling by default, that is a beef with the design of the MS DOM,
>not with the conformance of MS XSL.  The MS DOM defaults towards performance
>and low memory consumption, while still staying within the XML 1.0 spec.  I
>think it was the right decision for the vast majority of users.  Users who
>need to preserve whitespace can always set preserveWhiteSpace=true when
>loading the DOM, or use xml:space="preserve" to tag significant whitespace.

As Andy says, it is a beef with the design of the MS DOM rather than MS XSL.

From a standards point of view, it all comes down to whether MS DOM is
counted as an XML processor or an XML application.  The XML Recommendation
states:

"A software module called an XML processor is used to read XML documents
and provide access to their content and structure. It is assumed that an
XML processor is doing its work on behalf of another module, called the
application. This specification describes the required behavior of an XML
processor in terms of how it must read XML data and the information it must
provide to the application."

Andy said:

"The application responsible for parsing the input XML and
building the tree cache is the DOM, not XSLT.  Therefore, it is perfectly
reasonable to view the DOM as the "application" referred to in the XML 1.0
spec."

It seems the job of MS DOM is to read in (parse) and provide access to the
content and structure of the XML document: squarely in the preserve of the
'XML processor' rather than the 'XML application'.  (If that's not the
case, how does MS DOM *apply* the information in the XML document as a
standalone application?)  It seems to me that it is MS XSL that actually
performs some action as a result of the XML: MS XSL is an XML application,
MS DOM is an XML processor.

In the section on Whitespace Processing (2.10) the XML Recommendation
states:

"An XML processor must always pass all characters in a document that are
not markup through to the application. A validating XML processor must also
inform the application which of these characters constitute white space
appearing in element content."

Given that MS DOM is an XML processor, it should be passing the whitespace
within xsl:text through to MS XSL so that it can deal with it properly.

From a usability point of view, in my experience one of the main uses of
xsl:text is to add whitespace in some output.  I'm sure that it makes MS
DOM quicker and leaner not to worry about whitespace, but it seriously
detracts from its utility as an XML processor to be used by an XSLT
Processor like MS XSL.

If there was a normative XSLT DTD, and the XSLT DTD specified:

<!ATTLIST xsl:text
  xml:space	(preserve)	#FIXED	'preserve'>

then presumably MS DOM would preserve the whitespace within xsl:text.

As it is, the DTD that is supplied within the XSLT Recommendation is
non-normative and I imagine that most XSLT processors decide what to do on
the basis of an implicit understanding of the intention behind the
definitions given within the XSLT Recommendation rather than relying on an
explicit DTD.  It is clearly the intention within
[http://www.w3.org/TR/xslt#strip] that xsl:text should preserve whitespace;
XML applications that deal with XSLT should treat these elements as if they
had xml:space="preserve" declared on them.  As a compromise, could MS DOM
treat xsl:text as if xml:space="preserve" were defined on it?

Perhaps unfortunately, because it would be nice if a small compromise were
all that's needed, the rules governing whether whitespace is significant
within XSL elements are more complex than whether an element has
xml:space="preserve" or even whether it's an xsl:text element.  In XSLT,
you can define elements within which whitespace should be preserved using
xsl:preserve-space (in combination with xsl:strip-space).  If MS XSL is not
given sufficient information to process these elements according to the
XSLT Recommendation, then these elements are useless when used with it.  A
larger compromise would involve MS DOM treating all mixed-content and
#PCDATA XSLT elements as if xml:space="preserve" were defined on them.
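
As a rough illustration of why the rules are more than a single flag, here is
a sketch of the whitespace-stripping test described in
[http://www.w3.org/TR/xslt#strip], written in plain JavaScript.  The function
name isPreserved and the data shapes are hypothetical, and element names are
compared as plain prefixed strings rather than expanded names or name tests,
so treat it as a simplification rather than a conformant implementation.

```javascript
// Decides whether a text node survives whitespace stripping.
//   textValue       - the text node's string value
//   ancestors       - ancestor elements, parent first, each { name, xmlSpace }
//   preservingNames - Set of whitespace-preserving element names; for a
//                     stylesheet this is just "xsl:text", for a source
//                     document it is adjusted by xsl:strip-space and
//                     xsl:preserve-space
function isPreserved(textValue, ancestors, preservingNames) {
  // Text with any non-whitespace character is never stripped.
  if (/[^ \t\r\n]/.test(textValue)) return true;

  // The parent element is whitespace-preserving.
  if (ancestors.length > 0 && preservingNames.has(ancestors[0].name)) return true;

  // The nearest ancestor carrying xml:space decides.
  for (const a of ancestors) {
    if (a.xmlSpace === "preserve") return true;
    if (a.xmlSpace === "default") break;
  }
  return false;
}

const stylesheetNames = new Set(["xsl:text"]);

const inText = isPreserved(" ", [{ name: "xsl:text" }], stylesheetNames);
const inTemplate = isPreserved("\n  ", [{ name: "xsl:template" }], stylesheetNames);
const inPreserved = isPreserved(
  " ", [{ name: "xsl:template", xmlSpace: "preserve" }], stylesheetNames);
const overridden = isPreserved(
  " ",
  [{ name: "p", xmlSpace: "default" }, { name: "div", xmlSpace: "preserve" }],
  stylesheetNames);
```

The point is that the answer depends on the parent's name, on inherited
xml:space, and on a name set that the stylesheet itself can change, none of
which a processor can evaluate if the whitespace-only node is already gone.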

However, for true compliance as an XML processor, to avoid spurious
exceptions for XSLT elements, and to enable MS XSL (and, eventually, other
XML applications) to perform in a useful and compliant manner, MS DOM
should preserve whitespace by default.  If MS DOM does not, MS XSL should
use a conformant XML processor instead, to enable it to conform to the XSLT
Recommendation.

My 10p worth :)

Cheers,

Jeni

Dr Jeni Tennison
Epistemics Ltd * Strelley Hall * Nottingham * NG8 6PE
tel: 0115 906 1301 * fax: 0115 906 1304 * email:
jeni.tennison@epistemics.co.uk


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

