This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Content constructors and sequences

From: "Michael Kay" <michael dot h dot kay at ntlworld dot com>
To: <xsl-list at lists dot mulberrytech dot com>
Date: Wed, 9 Jan 2002 11:29:36 -0000
Subject: RE: [xsl] Content constructors and sequences
Reply-to: xsl-list at lists dot mulberrytech dot com
Thanks for producing this carefully thought-out proposal; I suggest you
submit it to xsl-editors as-is. I know there are people on the group who
wanted the spec to take this kind of direction, and your document does
provide answers to some of the problems with the approach. Personally, my
inclination is to put sequence-handling functionality on the XPath side of
the boundary rather than the XSLT side, because I think you get better
compositionality that way, and better commonality between XSLT and XQuery,
but other solutions are certainly possible.

On the question of "rootless nodes" (more correctly, "documentless nodes"),
I have concerns about mutability. If a variable $v references such a node,
then when the node is added to a document, the value of
count($v/ancestor::*) is going to change. This suggests we are going to get
into problems of defining sequential order of execution. Similarly, adding a
node to a document is likely to change its position in document order
relative to other nodes, which means that sequences that were in document
order are no longer in document order. My worries may be misplaced (the data
model suggests that when a node is added to a tree, you logically create a
copy), but I didn't want to support rootless nodes in XSLT until we were
confident that there were no nasties in this area (or until we could
identify any benefits).

You might be interested to know that one of the reasons XSLT 1.1 was pulled
was that some members of the WG had this kind of processing model in mind,
and saw that it could create backwards compatibility problems with the
RTF=nodeset issue. (For all I know, it was because James Clark was thinking
this way that the RTF restrictions were present in XSLT 1.0 in the first
place...)

Mike Kay

> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of
> Jeni Tennison
> Sent: 09 January 2002 08:55
> To: xsl-list@lists.mulberrytech.com
> Subject: [xsl] Content constructors and sequences
>
>
> Hi,
>
> I'd greatly appreciate comments on the following; I'll post to
> xsl-editors@w3.org and www-xpath-comments@w3.org if the comments here
> don't point out a glaring flaw.
>
> Please post if you think it's a good idea, as well as if you think
> it's a bad one, particularly if you can think of ways of improving the
> strength of the argument.
>
> Thanks,
>
> Jeni
>
> ---
>
> Executive summary
> -----------------
>
> Rather than XPath being continuously extended to allow it to do what
> XSLT can already do, XSLT should be modified to support the thing that
> it can't already do: sequence construction. This could be achieved by
> amending the definition of content constructors in XSLT 2.0 and
> introducing a new xsl:item instruction. This change would make XSLT
> more consistent and more usable.
>
>
> Contents
> --------
>
> 1.  Requirement
> 2.  Sequence constructors
> 3.  Producing simple typed values and existing nodes
> 4.  Impact on XPath
> 5.  Impact on function definitions
> 6.  Impact on variable bindings
> 7.  Allowing rootless nodes
> 8.  Impact on result tree generation
> 9.  Conclusions
> 10. References
>
>
> Requirement
> -----------
>
> Yesterday, David C. posted a message to www-xpath-comments@w3.org that
> described how XPath is restricted by the lack of a general
> variable-binding expression (let clause) [1].
>
> I think that the lack of a let clause restricts what's practical in
> XPath (even if it doesn't affect what's theoretically possible). In a
> for expression, for instance, you have to reconstruct any sequence
> that you create within it every time you use that sequence, which
> probably isn't particularly efficient and leads to maintenance
> headaches. For example:
>
>   for $o in $orders
>   return if (count($o/item[(@price * @quantity) > 100]) > 5)
>          then do:something($o/item[(@price * @quantity) > 100])
>          else do:something-else($o/item[(@price * @quantity) > 100])
>
> The way around this is with functions, because then you can use
> xsl:variable to bind the variable:
>
>   for $o in $orders
>   return do:process-items($o)
>
> and:
>
> <xsl:function name="do:process-items">
>   <xsl:param name="order" />
>   <xsl:variable name="items"
>                 select="$order/item[(@price * @quantity) > 100]" />
>   <xsl:result select="if (count($items) > 5)
>                       then do:something($items)
>                       else do:something-else($items)" />
> </xsl:function>
>
> but it's hardly ideal.
>
> The same kind of problem occurs within an if expression within a for
> expression, when certain variables are relevant within one branch of
> the if and not in the other. For example:
>
>   if ($string and $keyword)
>   then if ((starts-with($string, $keyword) or
>             ends-with(substring-before($string, $keyword), ' ')) and
>            (not(substring-after($string, $keyword)) or
>             starts-with(substring-after($string, $keyword), ' ')))
>        then (substring-before($string, $keyword),
>              $keyword,
>              substring-after($string, $keyword))
>        else $string
>   else ()
>
> which could be managed with:
>
>   if ($string and $keyword)
>   then (for $before in substring-before($string, $keyword),
>             $after  in substring-after($string, $keyword)
>         return if ((not($before) or ends-with($before, ' ')) and
>                    (not($after) or starts-with($after, ' ')))
>                then ($before, $keyword, $after)
>                else $string)
>   else ()
>
> but which would be much clearer (and more accurate, since you're not
> really iterating) as:
>
>   if ($string and $keyword)
>   then (let $before := substring-before($string, $keyword),
>             $after  := substring-after($string, $keyword)
>         if ((not($before) or ends-with($before, ' ')) and
>             (not($after) or starts-with($after, ' ')))
>         then ($before, $keyword, $after)
>         else $string)
>   else ()
>
> Again, you could create a function to do the testing, but if we have
> to generate new functions every time we want to bind variables, we're
> going to have them coming out of our ears.
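For comparison, a Python sketch of what the let version computes, binding before and after exactly once. It assumes str.partition as a stand-in for substring-before/substring-after (both split at the first occurrence); split_string is a hypothetical name. The `keyword not in string` guard is an assumption carried over from the starts-with test in the first version, since substring-before and substring-after both return an empty string when the keyword is absent.

```python
from __future__ import annotations


def split_string(string: str, keyword: str) -> tuple[str, ...]:
    # if ($string and $keyword) ... else ()
    if not (string and keyword):
        return ()
    # Assumed guard: no occurrence of the keyword means no split.
    if keyword not in string:
        return (string,)
    # Bind before/after once, like the proposed let clause.
    before, _, after = string.partition(keyword)
    if (not before or before.endswith(" ")) and (not after or after.startswith(" ")):
        return (before, keyword, after)
    return (string,)


print(split_string("red apple pie", "apple"))  # ('red ', 'apple', ' pie')
print(split_string("pineapple", "apple"))      # ('pineapple',)
```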
>
> It's certainly true that you could add a let clause to XPath; you
> could also add a where clause... and a sortby clause... and
> typeswitches... and even element constructors... but what you end up
> with is a replication of all the facilities of XSLT, but using a
> non-XML syntax, and stuffed inside XML attributes.
>
>
> Sequence constructors
> ---------------------
>
> So I'd like to suggest an alternative. Instead of modifying XPath so
> that it can do all the things that XSLT can do plus construct
> sequences, why not modify XSLT so that it can construct general
> sequences rather than just node sequences?
>
> Doing this is (I *think*) simpler than it sounds. In XSLT 2.0,
> "content constructors" are defined as [2]:
>
>   "a sequence of nodes in the stylesheet that, when evaluated,
>    constructs and returns a sequence of new nodes suitable for adding
>    to the result tree. This sequence is referred to below as the
>    result sequence."
>
> If we modify that definition so that "content constructors" don't
> necessarily return *nodes* (they should probably then be called
> "sequence constructors"), it becomes:
>
>    a sequence of nodes in the stylesheet that, when evaluated,
>    constructs and returns a sequence. This sequence is referred to
>    below as the result sequence.
>
> We can amend the description of XSLT instructions in line with this:
>
> XSLT instructions then produce a sequence of zero, one, or more items
> as their result. These items are added to the result sequence. Some
> instructions, such as xsl:element, return a newly-constructed node
> (which may have its own attributes, namespaces, children, and other
> descendants); others, such as xsl:if, return items produced by their
> own nested sequence constructors.
>
> [There are a couple of incompatibility problems here that I think can
>  be handled; I'll come on to those later.]
>
>
> Producing simple typed values and existing nodes
> ------------------------------------------------
>
> All we need now is an element that can add a simple typed value or an
> existing node to the result sequence. This could be achieved with an
> xsl:item element:
>
>   <!-- Category: instruction -->
>   <xsl:item
>     select = expression
>     type = datatype>
>     <!-- Content: sequence-constructor -->
>   </xsl:item>
>
> The xsl:item element works similarly to variable-binding elements: it
> produces a sequence of items from either its select attribute or its
> content. This enables you to add simple typed values or existing nodes
> to a sequence.
>
> For example, the equivalent of the for expression that we looked at
> earlier would be:
>
>   <xsl:variable name="new-orders" type="item*">
>     <xsl:for-each select="$orders">
>       <xsl:variable name="items"
>                     select="item[(@price * @quantity) > 100]" />
>       <xsl:item select="if (count($items) > 5)
>                         then do:something($items)
>                         else do:something-else($items)" />
>     </xsl:for-each>
>   </xsl:variable>
>
> The $new-orders variable would be bound to a sequence of items.
>
>
> Impact on XPath
> ---------------
>
> Enabling XSLT to generate sequences will remove the requirement for
> XPath to support expressions that involve range variables. For
> example:
>
>   <xsl:variable name="join" type="xs:integer*"
>                 select="for $i in (1, 2),
>                             $j in (3, 4)
>                         return ($i, $j)" />
>
> could be done with:
>
>   <xsl:variable name="join" type="xs:integer*">
>     <xsl:for-each select="(1, 2)">
>       <xsl:variable name="i" select="." />
>       <xsl:for-each select="(3, 4)">
>         <xsl:variable name="j" select="." />
>         <xsl:item select="($i, $j)" />
>       </xsl:for-each>
>     </xsl:for-each>
>   </xsl:variable>
>
> [Of course a mapping operator would still be useful for simple cases.]
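Both forms build the same flat sequence. A Python model of the two (join_for_expression and join_nested_for_each are hypothetical names) makes the iteration order explicit: the inner variable varies fastest, and each ($i, $j) pair is flattened into one sequence rather than nested.

```python
from itertools import product


def join_for_expression(xs, ys):
    # for $i in xs, $j in ys return ($i, $j): $j varies fastest and each
    # two-item result is flattened into the overall sequence.
    return [item for i, j in product(xs, ys) for item in (i, j)]


def join_nested_for_each(xs, ys):
    # Nested xsl:for-each, with xsl:item appending ($i, $j) each time.
    result = []
    for i in xs:
        for j in ys:
            result.extend((i, j))
    return result


print(join_for_expression((1, 2), (3, 4)))  # [1, 3, 1, 4, 2, 3, 2, 4]
```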
>
> It would also remove the need for the sort() function in XSLT (and
> indeed for named sort specifications altogether), or for adopting
> XQuery's sortby clause, since the existing xsl:sort can be used.
>
> For example, instead of:
>
>   <xsl:sort-key name="subtotal-sort">
>     <xsl:sort select="@price * @quantity" data-type="number"
>               order="descending" />
>     <xsl:sort select="@part-id" order="ascending" />
>   </xsl:sort-key>
>   <xsl:variable name="sorted-items"
>                 select="sort($items, 'subtotal-sort')" />
>
> you could do:
>
>   <xsl:variable name="sorted-items" type="item*">
>     <xsl:for-each select="$items">
>       <xsl:sort select="@price * @quantity" data-type="number"
>                 order="descending" />
>       <xsl:sort select="@part-id" order="ascending" />
>       <xsl:item select="." />
>     </xsl:for-each>
>   </xsl:variable>
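In Python terms (an analogy, not a full model of xsl:sort), the two sort keys amount to one composite key: the subtotal negated so that it sorts descending, then the part id compared as text, ascending. sorted() is stable, so items that tie on both keys keep their original order. Item and sort_items are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Item:
    part_id: str
    price: float
    quantity: int


def sort_items(items):
    # Key 1: @price * @quantity, numeric, descending (negated).
    # Key 2: @part-id, compared as text, ascending.
    return sorted(items, key=lambda it: (-(it.price * it.quantity), it.part_id))


items = [Item("b", 10, 2), Item("a", 5, 4), Item("c", 1, 1)]
print([it.part_id for it in sort_items(items)])  # ['a', 'b', 'c']
```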
>
>
> Impact on function definitions
> ------------------------------
>
> Adding the xsl:item element allows us to get rid of the xsl:result
> element when defining functions. The xsl:function element's new syntax
> would be:
>
> <xsl:function
>   name = qname>
>   <!-- Content: (xsl:param*, sequence-constructor) -->
> </xsl:function>
>
> The xsl:function element would simply return the sequence produced by
> its sequence constructor.
>
> For example:
>
>   <xsl:function name="my:split-string">
>     <xsl:param name="string" type="xs:string" />
>     <xsl:param name="keyword" type="xs:string" />
>     <xsl:if test="$string and $keyword">
>       <xsl:variable name="before"
>                     select="substring-before($string, $keyword)" />
>       <xsl:variable name="after"
>                     select="substring-after($string, $keyword)" />
>       <xsl:item select="if ((not($before) or ends-with($before, ' ')) and
>                             (not($after) or starts-with($after, ' ')))
>                         then ($before, $keyword, $after)
>                         else $string" />
>     </xsl:if>
>   </xsl:function>
>
>
> Impact on variable bindings
> ---------------------------
>
> The current XSLT 2.0 WD states:
>
>   "[ERR030] Elements such as xsl:variable, xsl:param, xsl:message,
>    and xsl:result-document construct a new document node, and use the
>    result sequence returned by the content constructor to form the
>    children of this document node. In this case it is an dynamic error
>    if the result sequence contains namespace or attribute nodes. The
>    processor must either signal the error, or must recover by ignoring
>    the offending nodes. The elements, comments, processing
>    instructions, and text nodes in the node sequence form the children
>    of the newly constructed document node."
>
> I'll concentrate on variable-binding elements here (xsl:message and
> xsl:result-document are handled in the section on result tree
> generation).
>
> Supporting the creation of sequences means that rather than creating a
> new document node, variable-binding elements must bind the variable to
> the result sequence produced by their sequence constructor. This
> sequence must be able to contain all kinds of nodes.
>
> There is a backwards incompatibility here: if a variable is assigned
> a value through the content of the variable-binding element, then
> rather than conceptually holding the "root node of the result tree
> fragment" as in XSLT 1.0, the variable holds a sequence of items
> (nodes, assuming you're using the variable as in XSLT 1.0).
>
> Currently, when users get the string value of a result tree fragment,
> they get the string value of the *root node* of the result tree
> fragment - the concatenation of the string values of the text node
> descendants in the result tree fragment.
>
> On the other hand, when users get the string value of a sequence, they
> get the string value of the first item in the sequence.
>
> Therefore if you have:
>
>   <xsl:variable name="foo">
>     <element>A</element>
>     <element>B</element>
>   </xsl:variable>
>
> then string($foo) would give "AB" in XSLT 1.0 and just "A" in XSLT 2.0
> (if sequence constructors were supported).
>
> [I don't think that people get the string values of result tree
>  fragments that contain elements very often because it's rarely useful
>  to create a result tree fragment with internal structure and then
>  proceed to ignore that internal structure, but it does happen.]
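A Python model of the two string values (Element, string_value, string_of_rtf and string_of_sequence are hypothetical names). It assumes the first-item rule for sequences described above, and models the XSLT 1.0 result tree fragment as a document node wrapping the items, whose string value concatenates all descendant text.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)  # Element or str (text node)


def string_value(node) -> str:
    # Text node: its text. Element or document node: the concatenation of
    # its descendant text nodes in document order.
    if isinstance(node, str):
        return node
    return "".join(string_value(c) for c in node.children)


def string_of_rtf(items) -> str:
    # XSLT 1.0: $foo is the root node of a result tree fragment.
    return string_value(Element("#document", items))


def string_of_sequence(items) -> str:
    # Proposal as described: string value of the first item only.
    return string_value(items[0]) if items else ""


foo = [Element("element", ["A"]), Element("element", ["B"])]
print(string_of_rtf(foo), string_of_sequence(foo))  # AB A
```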
>
> Another difference applies if people are used to using node-set()
> extension functions to convert variables to node sets. As there is no
> document node, addressing the items in the sequence does not involve
> stepping down to them.
>
> For example, given the above definition of $foo, the equivalent of the
> following in XSLT 1.0:
>
>   <xsl:for-each select="exsl:node-set($foo)/element">
>     ...
>   </xsl:for-each>
>
> is simply:
>
>   <xsl:for-each select="$foo">
>     ...
>   </xsl:for-each>
>
> [There's an argument that XSLT 2.0 shouldn't have to worry about
>  backwards compatibility with extension functions, but the node-set()
>  extension function is very widely used and is based on the
>  description of result tree fragments from XSLT 1.0.]
>
> These backwards compatibility issues could be resolved by letting the
> type attribute on the variable-binding element determine its
> behaviour. If the type attribute is not present,
> then the variable-binding element creates a result tree (as described
> later), and the variable is bound to a new document node; if the type
> attribute is specified, then the variable is bound to the sequence.
>
> [This is similar to the role played by the separator attribute on
>  xsl:value-of.]
>
>
> Allowing rootless nodes
> -----------------------
>
> Section 3.1 of the XSLT 2.0 WD [3] states:
>
>   "The data model defined in [Data Model] allows a node to be part of
>    a tree whose root is a node other than a document node.
>
>   "Although such nodes may exist transiently during the course of XSLT
>    processing, every node that is processed by an XSLT stylesheet
>    (that is, a node that may be returned in the result of an
>    expression) will belong to a tree whose root is a document node."
>
> This will no longer be true. It will be possible to create sequences
> containing nodes that do not have a parent.
>
> I'm not certain why this restriction applies in XSLT, especially as it
> is not a restriction in the data model or in XQuery. There might be
> something here that causes problems for the whole
> sequence-generation-using-content-constructors idea, but I'm not sure
> what it would be.
>
> If the suggestion for retaining backwards compatibility with
> variable-binding elements is used, then if XSLT 2.0 is used like XSLT
> 1.0 (i.e. without type attributes on variable-binding elements, and
> without user-defined functions) it is still true that every node that
> may be returned in the result of an expression will belong to a tree
> whose root is a document node.
>
>
> Impact on result tree generation
> --------------------------------
>
> The final impact of this change is on result tree generation. This
> applies to the construction of the content of element nodes, of the
> principal result tree and secondary result trees, of messages, and of
> tree variables (those without a type attribute). It also applies, slightly
> differently, to the construction of comment, attribute, processing
> instruction, text and namespace nodes (which I'll call simple nodes
> so that I don't have to repeat their names constantly).
>
> Currently, content constructors construct a sequence of nodes, and
> this sequence of nodes can be made into a result tree by adding a
> parent node, or converted to a string to be used as the value of a
> simple node. Under certain circumstances, the presence of certain
> types of nodes in the node sequence is a recoverable dynamic error
> (e.g. attribute nodes when creating a document; element nodes when
> getting the string value for an attribute).
>
> If we had the more general sequence constructors, result trees would
> need to be constructed from sequences containing any mixture of simple
> typed values and nodes (both newly created (rootless) and pre-existing
> (rooted)), rather than those containing just newly created nodes.
>
> Pre-existing nodes can be differentiated from newly created nodes by
> the fact that they already have a parent, are already part of a tree,
> and are therefore not rootless. With pre-existing nodes, there are
> three options:
>
>  - the pre-existing node is (deep) copied, and replaced in the
>    sequence by the newly created copy (often inappropriate when
>    the sequence provides a value for a simple node)
>
>  - the pre-existing node is ignored
>
>  - the presence of a pre-existing node in a sequence that's used to
>    generate a result tree is a dynamic error, with one of the two
>    above options as a recovery action
>
> Similarly, there are three options for simple typed values:
>
>  - the string value of the simple typed value is used as the value
>    for a newly created text node, and replaced in the sequence by this
>    newly created text node (which would have to be concatenated with
>    surrounding text nodes)
>
>  - the simple typed value is ignored
>
>  - the presence of a simple typed value in a sequence that's used to
>    generate a result tree is a dynamic error, with one of the two
>    above options as a recovery action
>
> In both cases I think that it's reasonable to make it an error, with
> the creation of a node as a recovery action. Conceptually, the
> sequence could be treated in exactly the same way as currently after
> pre-existing nodes and simple typed values are substituted.
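The suggested recovery behaviour can be sketched in Python with a toy node class (Node, deep_copy and make_element are hypothetical names, and only element and text nodes are modelled): simple typed values become text nodes concatenated with adjacent text, pre-existing nodes (those that already have a parent) are deep-copied, and each substitution is recorded as a recoverable error rather than raised.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # "element" or "text" in this sketch
    value: str = ""                # element name, or the text content
    parent: Node | None = field(default=None, repr=False, compare=False)
    children: list[Node] = field(default_factory=list)


def deep_copy(node: Node) -> Node:
    # New, parentless copy of a node and its descendants.
    clone = Node(node.kind, node.value)
    for child in node.children:
        c = deep_copy(child)
        c.parent = clone
        clone.children.append(c)
    return clone


def make_element(name: str, sequence, errors: list[str]) -> Node:
    element = Node("element", name)
    for item in sequence:
        if not isinstance(item, Node):
            # Simple typed value: recover by making a text node.
            errors.append(f"simple typed value {item!r} converted to text")
            node = Node("text", str(item))
        elif item.parent is not None:
            # Pre-existing node: recover by deep-copying it.
            errors.append(f"pre-existing {item.kind} node copied")
            node = deep_copy(item)
        else:
            node = item            # newly created (rootless) node: use as-is
        kids = element.children
        if node.kind == "text" and kids and kids[-1].kind == "text":
            # Concatenate with the preceding text node.
            kids[-1] = Node("text", kids[-1].value + node.value, element)
        else:
            node.parent = element
            kids.append(node)
    return element


order = Node("element", "order")
existing = Node("element", "item", order)
order.children.append(existing)
errors = []
result = make_element("result", [Node("element", "note"), existing, 42, Node("text", " units")], errors)
print([(c.kind, c.value) for c in result.children])
# [('element', 'note'), ('element', 'item'), ('text', '42 units')]
```

After the substitutions the children are all newly created nodes, so the rest of tree construction can proceed exactly as it does for a plain node sequence today.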
>
>
> Conclusions
> -----------
>
> If XPath were extended to be a usable method of generating sequences,
> it would end up replicating the variable assignment and flow control
> features that are already available within XSLT. While there is an
> argument for constructing a language that performs transformations
> without using XML syntax, that niche is already filled by XQuery. In
> addition, because XPaths are used within attributes in XSLT, XSLT with
> extended XPath will become a lot harder to read, write, and maintain
> than the equivalent XSLT instructions.
>
> Extending the concept of 'content constructors' to more general
> 'sequence constructors' and introducing an xsl:item element to add
> simple typed values and pre-existing nodes to this sequence gives XSLT
> the power to construct sequences of all descriptions. Rather than
> learning one language for constructing sequences of nodes and a
> different language with similar constructs for constructing other
> sequences, you will only have to learn one, unified, language.
>
>
> References
> ----------
>
> [1] http://lists.w3.org/Archives/Public/www-xpath-comments/2002JanMar/0026.html
> [2] http://www.w3.org/TR/xslt20/#dt-content-constructor
> [3] http://www.w3.org/TR/xslt20/#rootless-nodes
>
> ---
> Jeni Tennison
> http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
