This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: Regular expression functions (Was: Re: comments on December F&O draft)


Hi Marc,

> assume we have some :z: == (c1)(:x:){2} then the selection of index
> x[2] would have no meaning, since there is only one x noted in the
> regex
>
> and in normal regex behavior the numbered index 2 (2nd parenthesis) will
> only hold the second occurrence of the :x: matching part of the string... it
> is like writing (c1):x:(:x:)

That's an interesting point. Assuming x matched 'c2', then that would
mean a structure of:

  <z>
    <rxp:match>c1</rxp:match>
    c2
    <x>c2</x>
  </z>
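
The behaviour Marc describes is easy to check in an existing regex
engine. Below is a minimal sketch using Python's re module as a
stand-in (an assumption; the thread does not name an engine), with
c\d standing in for :x: so that the two repetitions can be told apart.

```python
import re

# Assumption: :x: is represented by c\d so each repetition matches different text.
m = re.fullmatch(r'(c1)(c\d){2}', 'c1c2c3')

first = m.group(1)    # text captured by the first parenthesis
second = m.group(2)   # the repeated group keeps only its last iteration
span = m.span(2)      # where that last iteration sits in the input

print(first, second, span)
```

The first repetition ('c2') is consumed by the match but is not
available under any group number, which is why it would appear as
plain text rather than inside an x element.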

> this is how regexes are working I'm afraid... (other hand, the
> notations :z: == (c1)(:x:)(:x:) and/or :z: == (c1)((:x:){2}) would
> possibly tackle what you really need)

Yes - with the second of these, you would get something like:

  <z>
    <rxp:match>c1</rxp:match>
    <rxp:match>
      c2
      <x>c2</x>
    </rxp:match>
  </z>

which would at least allow you to get the result of the two xs
combined.
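
A minimal Python sketch of that second notation, again assuming
Python's re module as a stand-in engine and c\d as a stand-in for
:x:. The outer parenthesis captures both repetitions together, while
the inner one still holds only the last.

```python
import re

# (c1)((:x:){2}) with c\d assumed for :x:
m = re.fullmatch(r'(c1)((c\d){2})', 'c1c2c3')

combined = m.group(2)  # both repetitions of x, as one string
last_x = m.group(3)    # the inner group still keeps only the last repetition

print(combined, last_x)
```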

> oh and by the way, I started off this :subregex: notation, based on bad
> memory of long-past perl days
> just opened some doc again, and understand now that it used to be the
> [:name:] notation for the posix characters... with added possible stuff like
> [:^name:] and the like

Hmm... Perl uses that notation for named character classes. The
equivalent in the XML Schema regular expression language is roughly:

  \p{name}     (characters in the named class)
  \P{name}     (characters not in the named class)

That's a different kind of thing to what we're doing here (where the
named expressions are complete regular expressions rather than
character classes). I'd be tempted to introduce a different escape
character to do it, for example e (for expression):

  \e(name)     (the named subexpression)
  \E(name)     (not the named subexpression, if that's appropriate?)

So something like:

  \e(mantissa)\e(exponent)?
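
To make the idea concrete, here is a rough Python sketch of how
\e(name) references could be expanded into an ordinary regular
expression before matching. Everything in it is hypothetical: the
\e(name) syntax is only the proposal above, and the expand function,
the definitions table and the mantissa and exponent patterns are
invented for illustration.

```python
import re

# Hypothetical syntax: \e(name) refers to a named subexpression.
REFERENCE = re.compile(r'\\e\((\w+)\)')

def expand(pattern, definitions):
    """Replace each \\e(name) with its definition, wrapped in a non-capturing group."""
    def substitute(match):
        name = match.group(1)
        # Definitions may themselves contain references, so expand recursively.
        # (No cycle detection: a definition that refers to itself would recurse forever.)
        return '(?:' + expand(definitions[name], definitions) + ')'
    return REFERENCE.sub(substitute, pattern)

# Assumed definitions, not taken from the thread.
definitions = {
    'mantissa': r'[+-]?\d+(?:\.\d+)?',
    'exponent': r'[eE][+-]?\d+',
}

expanded = expand(r'\e(mantissa)\e(exponent)?', definitions)
number = re.compile(expanded)

print(expanded)
print(bool(number.fullmatch('1.5e10')), bool(number.fullmatch('e10')))
```

Wrapping each expansion in a group keeps a trailing quantifier such
as ? applying to the whole subexpression rather than to its last
character.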
  
> revoking my own introduction: maybe $name makes more sense in any
> case?

Using $name in the regular expression might be confusing - you'd need
to make sure you could detect the end of the name, so probably ($name)
would be better. (I think that if $ is introduced as matching the end
of the string, you could safely state that it does so only when it
appears at the end of the regular expression.)

So something like:

  ($mantissa)($exponent)
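
A small Python sketch of why the parentheses help, under the same
caveats as before: the ($name) syntax is only a suggestion, and
expand_refs, bindings and the two patterns are hypothetical. Because
a reference is recognised only in the exact form ($name), the closing
parenthesis marks the end of the name, and a bare $ at the end of the
expression is left alone as the end-of-string anchor.

```python
import re

# Hypothetical syntax: ($name) refers to a named subexpression.
REFERENCE = re.compile(r'\(\$(\w+)\)')

def expand_refs(pattern, bindings):
    """Replace each ($name) with a capturing group around its bound pattern."""
    return REFERENCE.sub(lambda m: '(' + bindings[m.group(1)] + ')', pattern)

# Assumed bindings, not taken from the thread.
bindings = {
    'mantissa': r'[+-]?\d+(?:\.\d+)?',
    'exponent': r'[eE][+-]?\d+',
}

# The trailing $ is not followed by a parenthesised name, so it stays an anchor.
expanded = expand_refs(r'($mantissa)($exponent)$', bindings)
m = re.match(expanded, '1.5e10')

print(expanded)
print(m.group(1), m.group(2))
```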

I'd suggest {$name}, but only if regular expression support wasn't
ever available through functions (because {$name} looks a lot like an
AVT, and would make people think that they could put AVTs in
attributes that held expressions).
  
If the references look like variable references then they should
probably be set with variable-binding elements (e.g. xsl:variable).

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

