This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Re: Content constructors and sequences


Hi Kevin,

> I like the ideas in this, but (isn't there always one?) as I think
> other people have said the tree or sequence representation is
> difficult.

You mean managing which is used when a sequence is assigned to a
variable (does it create a new tree [as in XSLT 1.0] with the variable
assigned to the document node of that tree, or is the variable simply
assigned to the sequence that is created from its content)?

> Anyway, I am probably missing something important but can't you just
> create a tree using the new semantics of copy-of on a sequence of
> simple values. The only disadvantage I can see is if you want to
> really have to a sequence of simple values you will need to convert
> it back from a tree but that doesn't sound like it would be
> difficult. Particularly if processors are smart enough to retain the
> data in its native format as it has been suggest they might.

There are a few things that make it not as straightforward as you
might think - which is why XSLT as it stands is far from being
adequate for manipulating sequences (and why we're getting all this
functionality from XSLT being added to XPath).

First, what does it mean to copy a simple typed value into a tree? (By
tree I'm assuming you specifically mean node trees as in the XPath
data model, not a more general 'tree' data type, since XPath doesn't
have that concept.)

If it means creating a new type of node, called simple typed value
nodes, then it could possibly work, but this would mean extending the
XML Infoset.

If it means creating text nodes from each of those simple typed
values, then you lose the separation between the simple typed values
because (as you know) adjacent text nodes get combined together when a
document is created.
You might be able to separate out the values in some cases (assuming
that simple typed values get whitespace added around them when they
are converted to text nodes), but not if some of the simple typed
values were strings with spaces in them. Stopping text nodes from
being merged would be one possibility, so that each simple typed value
existed in its own text node.
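
To make the merging problem concrete, here is a rough sketch in Python
using the standard library's xml.dom.minidom as a stand-in for the
XPath data model. The helper name copy_values_as_text and the
separator parameter are made up for the illustration, and normalize()
only approximates what an XSLT processor does when it builds a tree:

```python
from xml.dom.minidom import Document

def copy_values_as_text(values, separator=""):
    """Copy each value into its own text node, then merge adjacent text
    nodes, roughly as happens when a result tree is built.
    (Hypothetical helper, for illustration only.)"""
    doc = Document()
    holder = doc.createElement("values")
    doc.appendChild(holder)
    for i, value in enumerate(values):
        if i and separator:
            # Whitespace added between values, as the text speculates.
            holder.appendChild(doc.createTextNode(separator))
        holder.appendChild(doc.createTextNode(str(value)))
    holder.normalize()  # adjacent text nodes collapse into one
    return holder

# Three numbers become a single text node with no boundaries left.
merged = copy_values_as_text([1, 2, 3])

# A separator helps only until a value contains the separator itself.
spaced = copy_values_as_text(["red apple", "pear"], separator=" ")
recovered = spaced.firstChild.data.split(" ")
```

Here merged ends up with one text child, and splitting spaced on
spaces gives back three strings where two went in.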

The second problem is more significant. One of the features of node
sets (and node sequences), and in fact what enables us to work with
them, is that they do not contain *copies* of the nodes, but the nodes
themselves. This means that you can take a node from a sequence and
find out its ancestors and siblings in the document that it came from
- those don't change when you put it in a sequence.
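
The difference between holding a node and holding a copy of it can be
sketched the same way. This is only an analogy in Python's
xml.dom.minidom: the Python list plays the part of a sequence, and
cloneNode plays the part of copying a node into a new tree. The
document and element names are invented for the example:

```python
from xml.dom.minidom import parseString

doc = parseString("<doc><section><para/><para/></section></doc>")
first_para = doc.getElementsByTagName("para")[0]

# Putting a node in a sequence keeps the node itself, so its
# relationships within the source document are still reachable.
sequence = [first_para]
same_node = sequence[0]

# A copy is a new node with no parent and no siblings.
copied = first_para.cloneNode(True)
```

From same_node you can still walk to the section parent and the
following para sibling; from copied you cannot, because it no longer
belongs to the original document's structure.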

It is hard to see how this behaviour could be replicated if there were
only trees (documents) in XSLT. One method that I think the WG
considered (given that it was in an earlier draft of the data model)
was introducing a new type of node - a reference node - which could
stand in for the node itself. I'm not exactly certain why it was
dropped; again it would be an extension to the XML Infoset.

It would certainly be interesting to see a proposal that made these
two extensions to the data model - added simple typed value nodes
and reference nodes. This would enable documents to hold simple typed
values and references to nodes from other documents, which would mean
XSLT could do the same "sequence manipulation" that XPath now has to
do.

[I suspect that there would be strong resistance from XQuery, since
these new types of nodes would be added at a fundamental level, to a
shared data model, despite the fact that they're only required in
XSLT.]

> I would like to see sequences in XSLT, but I don't think putting
> them in along side trees is the natural approach. More as an
> integral part of the tree structure.

We're used, in XSLT, to seeing documents (trees) as the basic
structure in which information is held. Just to give a quick review of
the data model in XPath/XSLT 1.0:

 - the first class objects are strings, numbers, booleans, node sets
   and result tree fragments
 - node sets contain nodes (which are not first-class objects)
 - nodes have various properties, including children - a node set (the
   order of the children can be worked out from the nodes' document
   order)
 - there are conceptually two kinds of node sets:
   - node sets containing new nodes (result trees), which can only be
     generated using XSLT
   - node sets containing existing nodes, which can only be generated
     using XPath

There are several problems with this data model:

 - there's an enforced division between the two types of node set:
   because you can't use XSLT to create a node set containing existing
   nodes, you can only construct those node sets using the relatively
   limited functionality of XPath

 - there's no way of natively representing a list of strings or
   numbers

 - there's a very restricted set of data types (no dates, for example)

The new data model tries to address those problems and (I think) in
the process rationalises some of the weirdness of the old data model:

 - the first class object type is a sequence (look, like LISP!)
 - sequences contain items of two types: simple typed values or nodes
 - simple typed values can be of various kinds, the XML Schema
   datatypes
 - nodes have various properties, including children (a sequence of
   nodes)
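
As a loose illustration of that model (again in Python rather than
XPath, with Decimal, date and str standing in for XML Schema
datatypes, and with the sample document and the kind helper being
assumptions of the sketch), a single sequence can hold simple typed
values and existing nodes side by side:

```python
from datetime import date
from decimal import Decimal
from xml.dom.minidom import Node, parseString

doc = parseString("<order><item>pen</item><item>ink</item></order>")
items = doc.getElementsByTagName("item")

# One sequence mixing simple typed values with existing nodes.
sequence = [Decimal("9.99"), date(2000, 1, 1), "pen", items[0], items[1]]

def kind(item):
    """Classify an item as a node or by its simple type.
    (Hypothetical helper, for illustration only.)"""
    return "node" if isinstance(item, Node) else type(item).__name__

kinds = [kind(item) for item in sequence]

# A node's children are themselves a sequence of nodes.
children = list(items[0].parentNode.childNodes)
```

Nothing like this list of mixed items can be held natively in the 1.0
model, where the only collection type is the node set.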

As currently designed, the old division between node sets containing
new nodes and node sets containing existing nodes is being imported
into XSLT: there's a division between sequences that contain new
nodes, which can only be generated using XSLT, and sequences that
contain existing nodes, which can only be generated using XPath.

What's more, because the sequence is the primary type in the new data
model, generating sequences of simple typed values is going to be much
more important in XPath/XSLT 2.0.

One way of helping is to provide the facility for people to write
XPath functions in XSLT, which you can now do with xsl:function.
However, this leads to code being spread out amongst lots of functions
(since basically the only thing from XSLT that functions give you
access to is variable assignment), and is overkill for the simple
things.

So we need something else to give powerful sequence manipulation
without ducking out to functions. There are two options (as I see it):

 - add more programming constructs to XPath to make it more capable so
   that it has sufficient constructs for manipulating node sequences
   containing existing nodes (and other items!) more easily
   
 - enable XSLT to produce sequences containing things other than new
   nodes

To go back to your comment:

> I would like to see sequences in XSLT, but I don't think putting
> them in along side trees is the natural approach. More as an
> integral part of the tree structure.

Hopefully what I've described above about the way the data model has
changed shows that actually sequences are the basic building block of
XSLT now, not tree structures. If you start seeing XSLT as building
sequences of new nodes, then the step to building sequences containing
other items as well isn't very far.

The biggest stumbling block, as we've been discussing, is variables,
because variables implicitly construct a document node whose children
are set to the new nodes generated by the content of the variable,
rather than just being set to the sequence of new nodes. If that can
be resolved, I think everything fits together rather neatly.

[I also think that it gives a neat parallel between XQuery and XSLT -
XQuery constructs sequences without using XML syntax; XSLT constructs
sequences with XML syntax. You can view XQuery as the non-XML version
of XSLT and XSLT as the XML version of XQuery (putting aside the
fundamental differences in approach, demonstrated by comparing
xsl:for-each and FLWR expressions).]
   
Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

