This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Re: XPath grammar questions


Hi Sean,

> I've written the XPath parser three times already; this
> fourth time, I broke down and just implemented a lexer (more or less)
> conforming to the XPath grammar.  It works more or less properly, but
> I have a couple of places where it breaks down, and if there are any
> XPath gurus who can tell me how I'm misunderstanding the XPath spec,
> I'd appreciate the feedback.
> 
> The first case is in a path submitted by Tobias Reif, that 
> originated, as I recall, from someone on this list:
> 
>  *[* and not(*/node()) and not(*[not(@style)]) and not(*/@style != */@style)]
> 
> Specifically, it's the 'not(*/node())' that I'm having trouble with. 
> The XPath spec states that:
> 
>   not( boolean ) -> boolean
> 
> This would imply that '*/node()' evaluates to a boolean.  However, it
> also states that paths such as:
> 
>   ancestor::node()
> 
> evaluates to a set of matching nodes.  Further, I had assumed that 
> the path:
> 
>   */node()
> 
> by itself would also result in a set of nodes.
> 
> I have a group of theories about this, but I'm not quite grokking the
> intent of XPath.  I don't see how the same path should evaluate to 
> two different results.  In any case, there have been a number of 
> successful implementations of XPath, so I know I'm missing something.

From the spec:
http://www.w3.org/TR/xpath#section-Boolean-Functions

"The boolean function converts its argument to a boolean as follows:

a number is true if and only if it is neither positive or negative zero
nor NaN

a node-set is true if and only if it is non-empty

a string is true if and only if its length is non-zero

an object of a type other than the four basic types is converted to a
boolean in a way that is dependent on that type

Function: boolean not(boolean) 

The not function returns true if its argument is false, and false
otherwise."

What this means is if a node-set is passed as argument to not(), it is
first converted to boolean by using the rules for the boolean()
function above.

So:
  not(expression) = not(boolean(expression))

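In this specific case, not(node-set) is true only if the node-set is
empty. So not(*/node()) is true exactly when none of the child
elements has any child node. The path */node() still evaluates to a
node-set; it is the call to not() that converts that node-set to a
boolean.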
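
As a rough illustration of the conversion rules quoted above, here is a
minimal Python sketch. It is not taken from any XPath implementation:
the names xpath_boolean and xpath_not are made up for this example, and
a node-set is modelled as an ordinary Python list of nodes, which is an
assumption of the sketch.

```python
import math


def xpath_boolean(value):
    """Convert a value to a boolean following the boolean() rules.

    Assumption of this sketch: node-sets are Python lists, tuples or
    sets; numbers are int or float; strings are str.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # True unless the number is zero (either sign) or NaN.
        return not (value == 0 or math.isnan(value))
    if isinstance(value, str):
        # True if and only if the string is non-empty.
        return len(value) > 0
    if isinstance(value, (list, tuple, set, frozenset)):
        # A node-set is true if and only if it is non-empty.
        return len(value) > 0
    raise TypeError("unsupported value type: %r" % type(value))


def xpath_not(value):
    """not(expression) behaves like not(boolean(expression))."""
    return not xpath_boolean(value)
```

With this model, xpath_not([]) corresponds to not() applied to an empty
node-set, and xpath_not(["some node"]) to not() applied to a non-empty
one. Note that the string "false" is non-empty and therefore converts
to true.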

> The second (and at this point, more critical) problem I'm having is 
> with function names.  Take:
> 
>   [normalize-space(@name)='x']
> 
> If you follow the grammar, the evaluation is:
> 
>    Predicate->Expr->OrExpr->AndExpr->EqualityExpr->RelationalExpr->
>    AdditiveExpr
> 
> at which point it matches the rule:
> 
>   AdditiveExpr:: AdditiveExpr '-' MultiplicativeExpr
> 
> where you effectively have "normalize" "-" "space(@name)='x'".  What 
> my code does at this point is hang; 'normalize' gets caught in an 
> endless, recursive evaluation loop.  The only way I think I can solve
> this at this point is for checking for endless recursion.
> 

There's another rule -- for QName:

http://www.w3.org/TR/REC-xml-names/

and it uses NCName for the prefix and the local part.

The rule for NCName is:

[4]  NCName ::=  (Letter | '_') (NCNameChar)*   /* An XML Name, minus the ":" */
[5]  NCNameChar ::=  Letter | Digit | '.' | '-' | '_' | CombiningChar | Extender


So "-" is a legitimate character in an NCName (anywhere except the
first position), and therefore in any QName.

The rule for function names uses QName as well.

In order to work correctly, any lexical analyzer should match names
greedily -- that is, it should return the longest string that matches
a particular rule.

In your case, the lexer should return NCName for "normalize-space", and
not NCName "-" NCName.

It is a common mistake in XSLT/XPath to write an expression such as
$var-4 and then to complain that it was not parsed and evaluated as
$var - 4. Because names are matched greedily, $var-4 is a reference to
a variable named "var-4".
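
To make the greedy-matching point concrete, here is a small Python
sketch of a tokenizer. It is only an approximation written for this
message, not a conforming XPath lexer: the NCNAME pattern is limited to
ASCII letters and digits (the real NCName production allows many more
characters), the operator list is incomplete, and the names NCNAME,
QNAME, TOKEN_SPEC and tokenize are invented for the example.

```python
import re

# ASCII-only approximation of NCName: a letter or '_' followed by any
# number of name characters, which include '-' and '.'.
NCNAME = r"[A-Za-z_][A-Za-z0-9._-]*"
# QName: optional prefix, a colon, then the local part.
QNAME = rf"{NCNAME}(?::{NCNAME})?"

TOKEN_SPEC = [
    ("ws", r"\s+"),
    ("number", r"\d+(?:\.\d*)?|\.\d+"),
    ("literal", r"\"[^\"]*\"|'[^']*'"),
    ("var", rf"\${QNAME}"),
    ("name", QNAME),
    ("op", r"!=|<=|>=|//|::|\.\.|[-+*/|=<>()\[\],@.]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def tokenize(expr):
    """Split expr into (kind, text) pairs, skipping whitespace.

    The '*' quantifier in NCNAME is greedy, so a name swallows every
    following name character, including '-'.
    """
    tokens = []
    pos = 0
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character {expr[pos]!r} at offset {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With this sketch, tokenize("normalize-space(@name)='x'") yields a single
name token "normalize-space" followed by "(", "@", "name", ")", "=" and
the literal 'x'. Likewise, tokenize("$var-4") yields one variable
token, while tokenize("$var - 4") yields a variable, a "-" operator and
a number. The parser therefore never sees "normalize" "-" "space", and
the AdditiveExpr rule is not tried for the function name at all.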


Cheers,
Dimitre Novatchev.



