This is the mail archive of the
xsl-list@mulberrytech.com
mailing list .
RE: Higher-Order Functions in XPath 2.0
- From: "Michael Kay" <michael dot h dot kay at ntlworld dot com>
- To: <xsl-list at lists dot mulberrytech dot com>
- Date: Wed, 16 Jan 2002 15:27:45 -0000
- Subject: RE: [xsl] Higher-Order Functions in XPath 2.0
- Reply-to: xsl-list at lists dot mulberrytech dot com
Great stuff, Dimitre, please lob it over to www-xpath-comments.
> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Dimitre
> Novatchev
> Sent: 16 January 2002 10:07
> To: xsl-list@lists.mulberrytech.com
> Subject: [xsl] Higher-Order Functions in XPath 2.0
>
>
> Hi,
>
> Bellow is a draft proposal for implementing higher-order functions in
> XPath 2.0.
>
> I'd greatly appreciate receiving any comments about the
> correctness and
> validity of this text, as well as for possible improvements.
>
> Cheers,
> Dimitre Novatchev.
> -------------------------------------------------------------------
>
> Higher-Order Functions in XPath 2.0
>
> Contents
> --------
> Part I. Problems with XPath 2.0
>
> 1. Using the "for" expression for incremental processing
> 2. Difficulties in returning aggregate (nodes) value
> 3. Combining two sequences in producing a sequence
> 4. Text processing beyond the limits of regular expressions
> 5. XPath language complexity has grown considerably
> 6. Inflexibility, where equality and comparison-returning
> functions are needed
> 7. Little or no reusability is possible for "for" expressions
>
> Part II. Higher-Order Functions Solutions
>
> III. Conclusion
>
> IV. Recommendation
>
>
>
>
> Part I. Problems with XPath 2.0
> -------------------------------
>
> A general problem with XPath 2.0, as specified by the current working
> draft, is that some tasks exist, which cannot be solved within the
> language, and another group of tasks can be accomplished in XPath 2.0,
> but in a rather inefficient way. In all of these cases programmers can
> solve the task by using recursive XSLT templates and this is
> a powerful
> method, but at the same writing and testing different recursive
> templates for every particular case is a time-consuming and
> error-prone
> process and often the result has low reusability.
>
> Listed bellow are some examples of such problems:
>
> 1. Using the "for" expression for incremental processing
> The "for" expression in XPath 2.0 cannot be used when the results
> of processing every item of a sequence depend on the processing of the
> previous items. Or in some cases it can be used, but in an obviously
> inefficient way:
>
> - Find the product of all numbers in a sequence.
> - Reverse a sequence
> - Concat all string items of a sequence.
> - Reverse a string or a sequence of items.
>
> A typical use-case, for which the XPath 2.0 solution is difficult and
> inefficient, is the following:
>
> Given a sequence of "book" nodes, calculate and display the amount of
> money received from the sales of each book (price * sales), but also
> obtain and display a running total, as each book node from
> the sequence
> is processed. To achieve this in XPath, one would write:
>
> for $i in (1 to count($items))
> return ($items[$i],
> sum( for $j in (sublist($items, 1, $i))
> return (@price * @sales) )
>
> In the above expression, if N is the number of items in $items, sum()
> is performed N * (N + 1) / 2 times.
>
> While the above may seem to be just a textbook example (and really can
> be found in Mike Kays book), there are real-world examples, where a
> running total must be calculated and even several results must be
> accumulated in parallel. I am deeply obliged to Mark Nahabedian
> (naha@ai.mit.edu ) for allowing me to quote his work which has to deal
> with exactly this problem -- a complete example can be found at:
>
> http://www.ai.mit.edu/people/naha/itrack/about.html
>
>
> 2. Difficulties in returning aggregate (nodes) value from a sequence,
> especially when returning those nodes depends in a non-trivial way on
> the other nodes of the sequence:
>
> - Obtain all nodes with "maximum value" from a sequence, especially
> in the case when the node "value" is computed by a very complex
> expression.
> - Obtain the nodes with "distinct values" from a sequence,
> especially in the case when the node "value" is computed by a very
> complex way.
> - There's no general way to "filter" elements of a
> sequence based on
> a predicate.
>
> The reason is that a predicate (function) cannot be passed as a
> parameter to a general "filter" function. As a result programmers will
> write multitude of similar filtering expressions, without
> being able to
> re-use them. Such repetitions are time-consuming, error-prone, and
> generally result in un-maintainable and non-reusable code.
>
> Examples of problems in this group:
>
> - Return the sum of squares of the numbers in a sequence.
> - Return all items in a sequence, for which f(item) has minimal
> value
> - For some function f() test whether all the values of f(item) on a
> sequence are equal (> 0, etc.)
> - For some function f() test whether all values f(item) on a
> sequence are in increasing order.
>
> Although a solution can be found in XPath, it will be difficult and
> inefficient.
>
> Also, for every different function f() another version of the same
> solution will have to be produced, because functions cannot be passed
> as parameters.
>
> 3. Combining two sequences in producing a sequence:
> Given (a1, a2, ..., aN) and (b1, b2, ..., bN) compute:
>
> (a1 + b1, a2 + b2, ..., aN + bN)
>
> (a1 * b1, a2 * b2, ..., aN * bN)
>
> (a1 and b1, a2 and b2, ..., aN and bN)
>
> (a1 or b1, a2 or b2, ..., aN or bN)
> etc.
>
>
> 4. Text processing beyond the limits of regular expressions is not
> possible.
>
> A real world problem was pointed out by David Carlisle -- he needs in
> his work to match strings, surrounded by (unknown in advance number
> of) balanced parenthesis.
>
> For any such problem, it would be nice to have a general, table-driven
> parser() function. However this is not possible, because the parser()
> function will need to be passed as parameter a lex() function that it
> must call for obtaining the terminal symbols from the input text.
>
> 5. XPath language complexity has grown considerably and the language
> cannot continue to expand indefinitely:
>
> Already there are hardly any spare characters left for operators.
> Often there are (more than one) different ways of performing similar
> tasks.
>
> In contrast, a language that supports higher-order functions can be
> kept simple, small and elegant, while at the same time providing
> powerful means to produce any necessary new functionality.
>
> Thus the "standard" language features (e.g. operators and functions)
> can be kept to a minimum, while the language makes possible
> desired new
> functionality to be easily produced and accumulated into a library of
> general and reusable useful functions.
>
> Without support for higher-order functions such libraries are very
> limited in scope
> and usefulness.
>
> 6. Inflexibility, where equality and comparison-returning
> functions are
> needed to be passed to:
> - sort
> - distinct-values
> - grouping, etc.
>
> 7. Little or no reusability is possible for "for" expressions
>
> In the expression bellow:
>
> for $i in (1 to count($items))
> return expression
>
> expression cannot be reused (simply copied and pasted) and
> will have to
> be modified every time it is used with differently named
> range variable
> so that references to the range variable are renamed.
>
> In contrast, with higher-order functions support one can have a map()
> function, so that in
>
> map(f, $sequence)
>
> the code of f() will never have to be modified.
>
>
> Part II. Higher-Order Functions Solutions
> -----------------------------------------
>
> Provided higher-order functions were available, the problems listed
> above have easy and natural solutions.
>
> To demonstrate the compactness and high degree of
> readability, the code
> bellow is written in Haskell. Haskell is used only for convenience, in
> no way should it be inferred that the same syntax is recommended for
> XPath 2.0. Some basic conventions from this language:
>
> f x y = x * y
>
> This defines a function f(x,y) = x * y
>
> [1, 2, 3]
>
> This is a list of elements 1, 2, and 3. The same (for our purposes) as
> (1, 2, 3) in XPath 2.0.
>
> []
>
> This denotes the empty list -- the same as () in XPath 2.0.
>
> x -- denotes the name of a single element/function.
>
> xs -- any name ending in 's' denotes a list of elements.
>
> Any operator can be used also as a function, when put in brackets.
> Thus:
>
> (+) 1 2 = 1 + 2 = 3
>
> The (:) operator is used to prepend an element to the start of a list:
>
> x : xs
>
> defines a list with head x and tail xs.
>
> The flip() function takes as argument any function with two arguments,
> and produces as result a the same function, which takes these two
> arguments in reverse order.
>
> flip f x y = f y x
>
> Primitive recursion over a list can be defined as follows:
>
> foldl f z [] = z
> foldl f z (x:xs) = foldl f (f z x) xs
>
> The function "foldl" takes 3 arguments -- a function f(),
> which takes 2
> arguments, a value z, and a list.
>
> This is one of the most general functions over lists. It traverses the
> list from left to write, applying f() on each element and the
> currently
> accumulated result.
>
> There is a dual function (foldr), which behaves in a similar way, but
> traverses the list from right to left:
>
> foldr f z [] = z
> foldr f z (x:xs) = f x (foldr f z xs)
>
>
> As can be easily seen:
>
> foldl (+) 0 xs
>
> is the sum of all elements in a list xs. Therefore we could write:
>
> sum xs = foldl (+) 0 xs
>
> Analogously:
>
> product xs = foldl (*) 1 xs
>
> And this one liner is the solution to one of the problems in Part I,
> section 1.
>
> We can ommit the last operand(s) from an equation, in case it is the
> same and we still get a valid function definition. Therefore,
> the above
> function definitions could be simplified even further and written as:
>
> sum = foldl (+) 0
>
> product = foldl (*) 1
>
>
> Reversing the elements of a list (solution of another problem in Part
> I, section 1.) is simply defined as:
>
> reverse = foldl (flip (:)) []
>
>
> Concatenating all elements of a list (solution of the next problem) is
> simply:
>
> concat = foldr (++) []
>
> where (++) is the concatenation operator for lists.
>
>
> Combining two lists with equal length into one can be performed using
> the zipWith() function:
>
> zipWith f (a:as) (b:bs) = f a b : zipWith f as bs
> zipWith _ _ _ = []
>
> The function f() is applied on every pair of elements at position N
> from the two lists, and the result forms the element at position N in
> the result list.
>
> ZipWith() solves directly all the problems from Part I, section 3:
>
> - (a1 + b1, a2 + b2, ..., aN + bN) is just:
>
> zipWith (+) as bs
>
> - (a1 * b1, a2 * b2, ..., aN * bN) is just:
>
> zipWith (*) as bs
>
> - (a1 and b1, a2 and b2, ..., aN and bN) is just:
>
> zipWith and as bs
>
> - (a1 or b1, a2 or b2, ..., aN or bN) is just:
>
> zipWith or as bs
>
>
>
> A very useful function is scanl:
>
> scanl f q xs = q : (case xs of
> [] -> []
> x:xs -> scanl f (f q x) xs)
>
> It is similar to foldl, but creates a list, every element of which
> contains the intermediate accumulated result. The first element of the
> result-list is q.
>
> In case the list is guaranteed to be non-empty, then the following
> function can be defined:
>
> scanl1 f (x:xs) = scanl f x xs
>
> It behaves like scanl(), but doesnt use a zero argument.
>
> As can be easily seen,
>
> scanl1 `op` xs
>
> produces a list of the intermediate accumulated results of performing
> op() on the list xs. For example:
>
> scanl1 (+) [1, 2, 3] = [1, 3, 6] = [1, 1+2, 3 + 3]
>
>
> scanl1() combined with zipWith() solve directly the problem of
> calculating the running total from Part I, section 1:
>
> scanl1 (+) (zipWith (*) [1,2,3] [2,2,2])
>
> returns:
> [2, 6, 12]
>
> A direct solution to the filtering problems defined in Part I, section
> 2, is provided by the function filter():
>
> filter p xs = [ x | x <- xs, p x ]
>
> it takes a function p() defined on the type of elements of its second
> argument - a list xs, and returns a list of those elements of xs, for
> which p(x) = true.
>
> Using it, we can write:
>
> - Return all items in a sequence, for which f(item) has minimal value
>
> minvals f xs = filter (= fmin) ys
>
> where fmin = minimum ys,
> ys = map f xs
>
>
> - For some function f() test whether all the values of f(item) on a
> sequence are equal (> 0, etc.)
>
> allFPositive f xs = foldl and ys
>
> where ys = map f xs
>
> - For some function f() test whether all values f(item) on a sequence
> are in increasing order.
>
> allFIncreasing f xs = foldl and ys
>
> where ys = zipWith (<) (init xs) (tail xs)
>
>
> In the last solution we used the init() function, which is the dual of
> tail() - from a list xs it produces another, containing all
> elements of
> the first list , but the last one:
>
> init (x:xs) = x : init xs
>
>
>
> Finally, heres an example how to keep a language small and simple:
>
> Whenever a programmer needs a mapping operator, she could produce it
> immediately herself, without having to ask a working group for
> including it in the standard language, as follows:
>
> 1. She defines the function map():
>
> map f xs = foldl ( (:) . f ) [ ] xs
>
> 2. Because she needs to apply the mapping operator repeatedly, for
> convenience she defines a multiMap() function:
>
> multiMap xs fs = foldr map xs fs
>
> multiMap [1, 2, 3] [(1+), (2*), (5+)]
>
> produces:
>
> [13,15,17] , that is [(1 + 5)*2 + 1, (2 + 5)*2 + 1, (3 +
> 5)*2 + 1]
>
> 3. The multiMap() function as defined above applies the functions
> starting from the last one in the list. The programmer wants them
> applied starting from the first function in the list. She re-defines
> multiMap by changing foldr to foldl as follows:
>
> multiMap xs fs = foldl (flip map) xs fs
>
>
> Now
> multiMap [1, 2, 3] [(1+), (2*), (5+)]
>
> produces:
>
> [9, 11, 13]
>
> that is: [(1 + 1)*2 + 5, (2 + 1)*2 + 5, (3 + 1)*2 + 5]
>
> 4. She is ready to use the new function. For example, she specifies a
> series of SVG coordinate transformations:
>
> multiMap ($coordinates,
> ((. *2),
> (if (position() mod 2) then . + 50 else .)
> ..
> )
> )
>
>
>
> III. Conclusion
> ---------------
>
> XPath 2.0 as specified in the current working draft has the
> problems described in Part I. A language with support for higher-order
> function is free of these problems.
>
> IV. Recommendation
> ------------------
>
> Based on the above conclusion, I recommend that higher-order
> functions support be implemented in Xpath 2.0.
>
>
>
> __________________________________________________
> Do You Yahoo!?
> Send FREE video emails in Yahoo! Mail!
> http://promo.yahoo.com/videomail/
>
> XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
>
>
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list