This is the mail archive of the xsl-list@mulberrytech.com mailing list.


Re: Regular expression functions (Was: Re: comments on December F&O draft)

From: Jeni Tennison <jeni at jenitennison dot com>
To: xsl-list at lists dot mulberrytech dot com
Date: Mon, 14 Jan 2002 10:11:51 +0000
Subject: Re: Regular expression functions (Was: Re: [xsl] comments on December F&O draft)
Organization: Jeni Tennison Consulting Ltd
References: <000101c19cb1$d8f342d0$2100a8c0@swiftnet.tec>
Reply-to: xsl-list at lists dot mulberrytech dot com

Chris,

> I've been a bit tied up with one thing and another (and I think you
> might have discussed this before) but aren't regex matches just
> predicates on text nodes ala
> <xsl:template match="text()['\(.*\)']">
>         <x><xsl:apply-templates select=".[1]" /></x>
> </xsl:template>
> Which applies templates to whatever is not matched (child texts) (but
> which matches the template).

Not all strings that you might deal with are text nodes, so I think
that you need to provide something that allows you to match other
strings as well. Indeed, your example above demonstrates this: when
you do .[1], presumably you're applying templates to the matched
substring of the current text node, which is a string rather than a
node. I think that there are three possibilities:

  - assume that when you apply templates to a string, it's
    automatically converted to a text node, and apply templates to
    that
  - open up normal templates so that they can match things other than
    nodes
  - introduce specific regexp templates

> So that template on a text node
> "(a(b(c)d)e)" (assuming greedy) would produce
> <x>
>   a 
>   <x>
>     b
>     <x>
>      c
>     </x>
>     d
>   </x>
>   e
> </x>

Unfortunately, assuming greedy matching, the text node "(a)(b)" would
produce:

  <x>a)(b</x>

which is probably not what you want. This is why I suggested the
bracket-balancing tokenize() function. For example, you'd have:

  <xsl:apply-regexp-templates select="'(a(b(c)(d))e)'" />

and then:
  
  <xsl:regexp-template match="\((.*)\)">
    <x>
      <xsl:apply-regexp-templates
        select="tokenize(current-match()[1], '\(', '\)')" />
    </x>
  </xsl:regexp-template>

would give:

 <x>a<x>b<x>c</x><x>d</x></x>e</x>
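
The difference between the greedy match and the bracket-balancing
approach can be sketched outside XSLT. The following Python is only an
illustration under assumptions: balanced_tokenize() and
apply_templates() are hypothetical stand-ins for the proposed
tokenize() function and regexp templates, and the sketch tokenizes the
input once at the top level before matching, which the example above
does not spell out.

```python
import re

# Greedy matching: the single group swallows everything between the
# first "(" and the last ")".
greedy_group = re.fullmatch(r"\((.*)\)", "(a)(b)").group(1)

def balanced_tokenize(text, open_ch="(", close_ch=")"):
    """Hypothetical stand-in for the proposed tokenize(): split text into
    top-level tokens, each either a plain run or a balanced bracket group."""
    tokens, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                if i > start:
                    tokens.append(text[start:i])  # plain run before the group
                start = i
            depth += 1
        elif ch == close_ch and depth > 0:
            depth -= 1
            if depth == 0:
                tokens.append(text[start:i + 1])  # complete balanced group
                start = i + 1
    if start < len(text):
        tokens.append(text[start:])  # trailing run (or unbalanced remainder)
    return tokens

def apply_templates(text):
    """Hypothetical stand-in for apply-regexp-templates: wrap each bracketed
    token in <x> and recurse into its content; copy other tokens as text."""
    out = []
    for token in balanced_tokenize(text):
        m = re.fullmatch(r"\((.*)\)", token, re.DOTALL)
        if m:
            out.append("<x>" + apply_templates(m.group(1)) + "</x>")
        else:
            out.append(token)
    return "".join(out)

nested = apply_templates("(a(b(c)(d))e)")
siblings = apply_templates("(a)(b)")
```

Because each token handed to the pattern is already a balanced group,
the greedy .* can no longer run across sibling groups such as (a)(b).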

> Maybe it's rubbish but it doesn't look too alien to me. What other
> useful predicates can you put on a text node?

Commonly, I'd guess:

  text()[1]
  text()[normalize-space()]
  text()[starts-with(., 'foo')]
  text()[contains(., 'foo')]

The second one is the one that would clash with what you're suggesting
(where any string used as the predicate to a text node acts as an
implicit regexp test on the value of the text node).

But you could always have a test() function that does the test
explicitly instead:

  text()[test('\(.*\)')]

Or the other option is to have a special syntax to refer to a regular
expression, or even to make regular expressions first class objects.

> Surely it isn't going to clash with anything. There are nearly 1000
> pages of wd's to look at here so looking at it another way is there
> anything that says that . can't be a sequence and that I can't index
> into it with .[x]?

. is defined as being the context item (or a singleton sequence
containing the context item, depending on how you want to view it), so
logically .[2] should never return anything. Currently, as in XPath
1.0, . is an abbreviated step and cannot take any StepQualifiers
(which includes predicates).

The way I (and I think David) was thinking, you'd use current-match()
or some other function to get information about the subexpression
matches when you were inside the template. So perhaps:

  current-match()[x]

rather than .[x].
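
As a loose analogy in Python, the subexpression matches come from the
match itself rather than from indexing into the matched value. The
current_match() helper below is hypothetical and only mimics the idea;
it assumes the pattern must match the whole value.

```python
import re

def current_match(pattern, value):
    """Hypothetical helper: return the subexpression matches as a list,
    or an empty list if the pattern does not match the whole value."""
    m = re.fullmatch(pattern, value)
    return list(m.groups()) if m else []

groups = current_match(r"(\d+)-(\d+)", "10-20")

# XPath positions are 1-based, so groups[0] plays the role of
# current-match()[1] and groups[1] the role of current-match()[2].
first = groups[0]
second = groups[1]
no_groups = current_match(r"(\d+)-(\d+)", "abc")
```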

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
