This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: XPath over DOM
- To: xsl-list at lists dot mulberrytech dot com
- Subject: Re: [xsl] XPath over DOM
- From: Uche Ogbuji <uche dot ogbuji at fourthought dot com>
- Date: Fri, 16 Feb 2001 08:27:31 -0700
- Reply-To: xsl-list at lists dot mulberrytech dot com
> > In abstract
> > terms, it's a single tree traversal, with one assignment per node.
>
> That's one way of doing it, but it means you have to have somewhere to put
> the sequence number. You can't put it in the DOM objects themselves, so you
> have to create a static wrapper, which involves creating more objects:
> hopefully not one per node, otherwise you might as well rebuild the tree.
> The approach Xalan and xt use (I believe) is to do the document order
> comparison dynamically, by finding the lowest common ancestor of two nodes.
Ah. I see. I guess this is the advantage of Python's dictionaries
(associative arrays). It makes such external decoration quite trivial, and
quite efficient. Perl folks gain the same advantage, and C folks can do so
with a simple hash table.
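
To illustrate the kind of external decoration I mean, here is a minimal sketch. It is not the code discussed in this thread: it assumes Python's standard xml.dom.minidom as the DOM, and the function name index_document_order is made up for the example. It walks the tree once and records one sequence number per node in a dictionary kept outside the DOM.

```python
from xml.dom import minidom


def index_document_order(doc):
    """Return a dict mapping each node to its document-order position.

    One traversal, one assignment per node; the DOM itself is not modified.
    (Hypothetical helper, written for illustration only.)
    """
    order = {}
    stack = [doc]
    while stack:
        node = stack.pop()
        order[node] = len(order)
        if node.nodeType == node.ELEMENT_NODE:
            # Attributes sort after their element and before its children.
            attrs = node.attributes
            for i in range(attrs.length):
                order[attrs.item(i)] = len(order)
        # Push children reversed so the first child is popped first.
        stack.extend(reversed(node.childNodes))
    return order


doc = minidom.parseString("<a><b x='1'>t</b><c/></a>")
order = index_document_order(doc)
root = doc.documentElement
# Sorting any node collection into document order becomes a dict lookup.
in_order = sorted([root.lastChild, root.firstChild], key=order.__getitem__)
```

The dictionary is keyed on the node objects themselves, so nothing is written into the supplied tree and no wrapper object is created per node.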
> > > (b) skipping over and counting nodes correctly in the
> > presence of things
> > > such as entity reference nodes, CDATA nodes, and
> > unnormalized text nodes,
> > > and
> >
> > There is a normalize() if the user doesn't mind mutation.
>
> Mutation of the supplied tree, I think, is out of the question. (This also
> makes whitespace stripping much more difficult - another thing I forgot to
> mention.)
Then you'd have to wrap with internal indices. More complex.
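
A rough sketch of what such wrapping could look like, assuming xml.dom.minidom and using invented names (LogicalText, logical_children): adjacent text and CDATA nodes are presented as one logical text node, and the supplied tree is left untouched.

```python
from collections import namedtuple
from xml.dom import Node, minidom

# Hypothetical wrapper: the merged string plus the DOM nodes it covers.
LogicalText = namedtuple("LogicalText", ["data", "nodes"])

TEXT_TYPES = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def logical_children(element):
    """List the children of element with adjacent text/CDATA runs merged.

    Non-text children are returned as-is; each run of text-like nodes
    becomes one LogicalText. The DOM is only read, never mutated.
    """
    result = []
    run = []
    for child in element.childNodes:
        if child.nodeType in TEXT_TYPES:
            run.append(child)
            continue
        if run:
            result.append(LogicalText("".join(n.data for n in run), tuple(run)))
            run = []
        result.append(child)
    if run:
        result.append(LogicalText("".join(n.data for n in run), tuple(run)))
    return result


doc = minidom.parseString("<a/>")
a = doc.documentElement
# Build an unnormalized tree by hand: text, CDATA, text, element, text.
a.appendChild(doc.createTextNode("x"))
a.appendChild(doc.createCDATASection("y"))
a.appendChild(doc.createTextNode("z"))
b = a.appendChild(doc.createElement("b"))
a.appendChild(doc.createTextNode("w"))
items = logical_children(a)
```

Entity reference nodes and whitespace stripping are not handled here; they would need further cases in the same loop.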
> Incidentally, MSXML3 gets this wrong: using CDATA gives you multiple
> adjacent text nodes. I think that's evidence that it's not easy: and they
> have the advantage that they only work with their own DOM implementation.
No one says it's easy. It takes time and experimentation.
> > The rest, at least
> > as I've attacked it, is a matter of wrapping, again in the
> > same pass as doc-order indexing.
>
> I'm thinking of doing it (eventually) in Saxon by dynamic wrapping using
> flyweight objects, in the same way as the Saxon "tinytree" currently works.
> >
> > > (c) dealing with the multitude of ways that the DOM allows namespace
> > > nodes to be (or not be) represented.
> >
> > ??? Do you mean Level 1 vs. Level 2?
>
> That's part of the issue. Element and attribute names in the DOM can contain
> a namespace URI; the namespace URI may or may not be present in an xmlns:xxx
> Attr node. The set of namespace nodes, as far as I can see, is the union of
> namespaces that are used in element and attribute nodes plus namespaces that
> are declared in xmlns:xxx pseudo-attributes, in the current element or in
> any ancestor.
Oh. We sort this out on our scan pass. The algorithm is pretty simple,
actually.
But most of what we do takes advantage of dictionaries, which helps a lot.
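
For the curious, a sketch of the union described in the quoted paragraph, again assuming xml.dom.minidom (namespace-aware by default) and an invented function name, in_scope_namespaces. It is an illustration under those assumptions, not our actual scan pass: for an element and each of its ancestors it collects the namespaces used in element and attribute names plus those declared in xmlns pseudo-attributes, with nearer elements taking precedence.

```python
from xml.dom import minidom


def in_scope_namespaces(element):
    """Return {prefix: uri} for element; "" stands for the default namespace.

    Hypothetical helper. Undeclaration via xmlns="" and the implicit
    xml prefix are deliberately left out of this sketch.
    """
    scope = {}
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        found = {}
        # Namespace used by the element's own name.
        if node.namespaceURI:
            found[node.prefix or ""] = node.namespaceURI
        attrs = node.attributes
        for i in range(attrs.length):
            attr = attrs.item(i)
            if attr.name == "xmlns":
                found[""] = attr.value          # default namespace declaration
            elif attr.name.startswith("xmlns:"):
                found[attr.name[6:]] = attr.value  # prefixed declaration
            elif attr.namespaceURI and attr.prefix:
                found[attr.prefix] = attr.namespaceURI  # used by an attribute name
        # Bindings already seen on a nearer element win over ancestors.
        for prefix, uri in found.items():
            scope.setdefault(prefix, uri)
        node = node.parentNode
    return scope


doc = minidom.parseString(
    '<r xmlns="urn:d" xmlns:p="urn:p">'
    '<p:c xmlns:q="urn:q" xmlns:p="urn:p2"/></r>'
)
child = doc.documentElement.firstChild
namespaces = in_scope_namespaces(child)
```

An element created with createElementNS and no xmlns attribute still contributes its namespace, which covers the "used but not declared" case.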
--
Uche Ogbuji Principal Consultant
uche.ogbuji@fourthought.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list