This is the mail archive of the xsl-list@mulberrytech.com mailing list.
What is XSLT? ( very long )
- To: xsl-list at mulberrytech dot com
- Subject: What is XSLT? ( very long )
- From: Paul Tchistopolskii <paul at qub dot com>
- Date: Tue, 08 Aug 2000 00:16:36 -0700
- Cc: Sebastian Rahtz <sebastian dot rahtz at computing-services dot oxford dot ac dot uk>
- Organization: The Qub Group
- Reply-To: xsl-list at mulberrytech dot com
<DISCLAIMER>
I apologize for the style of this letter. This
letter should be considered to be a loose translation
of some short story called "The weekend of YAXSLT hacker".
I apologize for being possibly off-topic. I'll no
longer write such letters to this list, but
will try to keep the 'official' style and spirit.
However, because I think that this letter is
almost about XSLT, it may be not really
wrong to post it here.
</DISCLAIMER>
<DISCLAIMER1>
The 'quotes' are not the actual quotes, but how some
words have been reflected in my head. For example, of
course Sebastian was never saying some words - this
is my 'internal' interpretation of his words.
This is kind of bad literature.
</DISCLAIMER1>
1. Sebastian says : "XT sucks - it has no key()".
I reflect it this way because the rest of XSLT
is almost entirely implemented by XT; things like encodings
are not a big deal, and I know Sebastian knows it.
2. I really don't use key() and never had to. I remember
James Clark saying ( again - it is my reflection )
"yeah, people are really using key() to my surprise".
I smell something is suspicious here.
I *have* to understand what happens with key().
I have started the battle, asking Sebastian to provide
a usecase.
3. Usecase is here. Yeah.... I can hardly read it ....
What does it really do? OK, from the output I understand
what it does.
4. Looks like I could split it into 2 steps.
This helped me when I was rendering reports.
( See below. This is actually important. )
5. Split. It would be nice if the first step rendered
it into
<flat1>
<year>1234</year>
<fnm>name </fnm>
<snm>name </snm>
<fnm>name </fnm>
<snm>name </snm>
<year>456</year>
...
</flat1>
just a bunch of 'lines to draw'. This means step 1 should not
generate the 'same' year more than once. What??? Why does this
'preceding' thing take so long? OK - let me 'filter' out
the 'same year' on step 2. Works. Now I have to enclose the
list into <ol> </ol>. WHAT ??? Hm. I can not generate a 'start-tag'
and an 'end-tag' separately. I know that - just forgot ... I actually know
that this thing should not be allowed, even though I saw some tools
already allowing that, and I myself was some day thinking
that 'it is handy to generate 'start-tag' and 'end-tag''.
But today I'm sure - XSLT is right. (a) I never hit the wall
because of this restriction (b) this restriction forces
better code - this means it is a good restriction. If I
encounter a situation where (a) or (b) is not true -
I'll of course change my mind. It is the only way to judge
a programming language, I think.
Whatever - I have been in such a situation before, so
I know how it should be worked around. I should put
those <ol> </ol> and then in the body I have to dump
out the list ... of 'what'? How can I understand
how many persons belong to the 'year' ?
I'm not passing the info about the year 'attached'
to the person. I should. I should pass the <person>
and the <year>. OK. Works.
6. Sebastian says 'you can do it in one stylesheet, in 2
stylesheets, or with key'. TIMTOWTDI. Right, right. But
some of the ways are ugly. Also, when I tried to put it into
one stylesheet *not*piped* - it was darn slow. But wait -
I can place those 2 transformations *piped* into one
stylesheet with no problem. I should tell it to
Sebastian later. ( See below ).
7. Wait ... what did I *really* do? Why was it possible for me
to go from 'flat' to 'hierarchy' ? Aha - because in the flat
file which I got after step 1 each record has a 'year' - so
it was easy to 'take part out of the flat list' and
'call yourself with the rest'. At this point I *should*
already have seen the truth, but as any human being I was
stupid enough to bypass it. ( The truth comes later ).
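A minimal sketch of such a step 2, to make the 'take part out of the
flat list and call yourself with the rest' idea concrete. This is not
the original test6 stylesheet. It assumes a hypothetical step 1 output
of the form <flat><person year="..."><fnm/><snm/></person>...</flat>,
already sorted by year; the element and attribute names are assumptions
made only for this illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: hand the whole flat list to the grouping template. -->
  <xsl:template match="/flat">
    <report>
      <xsl:call-template name="group">
        <xsl:with-param name="list" select="person"/>
      </xsl:call-template>
    </report>
  </xsl:template>

  <!-- Takes the leading run of records with the same year,
       prints it, then calls itself with the rest of the list. -->
  <xsl:template name="group">
    <xsl:param name="list"/>
    <xsl:if test="$list">
      <xsl:variable name="year" select="$list[1]/@year"/>
      <!-- count() rescans the whole remaining list; because the list
           is sorted, the matching records are exactly the first $n. -->
      <xsl:variable name="n" select="count($list[@year = $year])"/>
      <year value="{$year}">
        <ol>
          <xsl:for-each select="$list[position() &lt;= $n]">
            <li><xsl:value-of select="concat(fnm, ' ', snm)"/></li>
          </xsl:for-each>
        </ol>
      </year>
      <xsl:call-template name="group">
        <xsl:with-param name="list" select="$list[position() &gt; $n]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

Note that <ol> is written as a whole element around the run, so there
is never a need to emit a start-tag and an end-tag separately.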
8. OK. This was easy because the 'flat' list
on step 2 was of 'simple' structure. What if
I make it a *complex* structure, so that it will
not be easy to process it like I did? Here is the
'Flat puzzle'. Damn. I don't see an easy way to
get from this 'flat' to hierarchy ! I'm tired and
have to sleep.
9. Steve says : "we have tons of usecases". Great!
I ask him - he should already have got such a 'flat' thing
from somebody - looks very typical. Also I have some
strange feeling ... I smell the odor of the 'key()'
function here. The odor is so strong ... But why ???
I came to this 'Flat puzzle' just by 'making the flat file
more complex' ... Verrry strange ....
10. Dang. Steve responded and there is of course key().
I should do it without key().
11. Got some sleep. Looked at this once again. Gee - soo easy!
I *already* know how the 'flat' file *should* look
to make it 'hierarchical' ( test6 part 2 ). Why don't I just
convert this 'flat puzzle' file to *that* flat file!
Done. Works.
WAIT!!!
12. What is the structure of my 'flat file which
is easy to convert to hierarchy' ? It is the
ordered list of records where each record has
a *KEY*. O HOW STUPID I AM!!!! All I'm doing
is just serializing the HASHTABLE.
13. I implemented the key() functionality in
XSLT itself. And of course it is slower than the
built-in HASHTABLE AKA key(). ( The funny thing is
that my 'implementation' is not *that* much slower,
but I already know that there are hashtables of
hashtables behind the scenes. Yeah - James Clark
implemented his own Hashtable for XT - not using
the java Hashtable. Sure he is already using
hashing techniques here and there. If not -
my 'hand-made hashtable' should be *darn* slower
than key(). )
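For comparison, a sketch of what the built-in way could look like for
the same kind of grouping. The input is an assumption made for this
illustration only: <flat><person year="..."><fnm/><snm/></person>...</flat>.
Here the list does not need to be sorted, because the processor builds
the index. The key name 'by-year' is hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- The declaration half: index every person by its year. -->
  <xsl:key name="by-year" match="person" use="@year"/>

  <xsl:template match="/flat">
    <report>
      <!-- Visit only the first person of each year: a person is 'first'
           when it is the same node as the first node the key returns. -->
      <xsl:for-each select="person[generate-id() =
                            generate-id(key('by-year', @year)[1])]">
        <year value="{@year}">
          <ol>
            <!-- The lookup half: fetch all persons of this year. -->
            <xsl:for-each select="key('by-year', @year)">
              <li><xsl:value-of select="concat(fnm, ' ', snm)"/></li>
            </xsl:for-each>
          </ol>
        </year>
      </xsl:for-each>
    </report>
  </xsl:template>

</xsl:stylesheet>
```

To read the lookup you have to jump back to the <xsl:key declaration to
see what 'by-year' matches and uses.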
14. How could *my* hashtable be improved?
OK, for example - this typical 'count()'
thing in step2 is always doing some useless
things. *I* know that the list is sorted.
But this stupid 'count()' does not know this!
How can I tell count() that it should
'stop after the key has changed' and
not look over the entire list again
and again?
I CAN DO THAT WITH THE ROADSIGN -
but this means 'key()' again and again ....
And it is not readable construction,
that key()... and it sometimes will not
improve the things... and some
optimizations may require another
'logic'...
15. There should be no 'key()' function.
There should be only the <xsl:key element building
the tricky indexes, and when the engine encounters
some expression it should use those
indexes *if they are applicable for this
expression* ( could be signaled by new
syntax of <xsl:key ). This is a very hard
task, but the idea is like PRIMARY KEY in
SQL. Like the 'precompiled' option of a regular
expression in perl.
This means <xsl:key will become what it has to be -
a plain roadsign, forcing the building of particular
( maybe more complex than now ) indexes
to speed up some particular 'regular expressions'.
And it should be called <xsl:index, of course.
This is the way to go and I think XSLT
has missed it, masking the real problem
with that 'key()' hack, like they did
with document() of 2 parameters ( masking
the RTF / node-set conversion ).
16. Should I use the current key()?
Why is key() not readable?
key() is not readable because
in fact, even though there is *one* regular
expression, the *parts* of this
'regular expression' are placed
in different places of the stylesheet!
( not the case with my view on <xsl:index )
In the current key() some parts are located in
<xsl:key and some parts are in key().
This is the only place in XSLT where, to see
what happens, you should jump from one
place to another, composing the actual
expression which will be used, but you can
see this 'only in your head'.
17. OK, maybe I'm cheating myself here. I'll
test it next time something smells like key().
Thanks to Steve - I now understand how key()
could be used ( look at the pipe and it will show
what could become a key ;-), so if I get into any
trouble with my way - I can always test the key() way.
Maybe I'm still missing something ... ( I should
be honest. I don't think I'm missing something, but
there is always a chance. )
18. Well.... Now I should write to Sebastian that
pipe or 'one stylesheet' is a mythical distinction.
If I have 'a | b' I can always write a1.xsl with the
structure of
<xsl:variable name="step1">
<doc>
transformation 1
</doc>
</xsl:variable>
<xsl:apply-templates select="xt:node-set($step1)" mode="transformation2"/>
<xsl:template match="/doc" mode="transformation2">
...
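A fuller sketch of that structure as one complete stylesheet. Both
transformations are placeholders and the <item> input element is
hypothetical. It assumes XT, with xt:node-set() bound to the namespace
URI http://www.jclark.com/xt; with another processor you would
substitute its own node-set extension function and namespace.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xt="http://www.jclark.com/xt"
    exclude-result-prefixes="xt">

  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <!-- Step 1: the result of 'a' is captured in a variable
         instead of being written to a file. -->
    <xsl:variable name="step1">
      <doc>
        <xsl:apply-templates select="*" mode="transformation1"/>
      </doc>
    </xsl:variable>
    <!-- Step 2: the captured result becomes the input of 'b'. -->
    <xsl:apply-templates select="xt:node-set($step1)/doc"
                         mode="transformation2"/>
  </xsl:template>

  <!-- Placeholder for transformation 1: flatten items into lines. -->
  <xsl:template match="item" mode="transformation1">
    <line><xsl:value-of select="."/></line>
  </xsl:template>

  <!-- Placeholder for transformation 2: render the lines as a list. -->
  <xsl:template match="doc" mode="transformation2">
    <ol>
      <xsl:for-each select="line">
        <li><xsl:value-of select="."/></li>
      </xsl:for-each>
    </ol>
  </xsl:template>

</xsl:stylesheet>
```

The modes keep the templates of the two steps apart, the same way two
separate stylesheet files would.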
19. Well - in perl I can do the same. Create some hash,
pass it down the road... What is the *difference*?
The difference is that in perl ( and any other language )
I have access 'by value' and 'by pointer' ( or by reference,
but let me call it 'by pointer' ).
In XSLT I have only 'access by value'. Java tried hard
to kill the distinction between those 'access by pointer'
and 'access by value'. They ended with 'mostly access by
pointer, but not really'.
The way Java did it results in the situation that when you
see foo( bar, baz ) you can not tell whether bar or baz will
be modified in the code, because bar and baz could
be passed further down the road etc. ( With other languages
you at least can *guess* that if it is 'by pointer' it is
'to be modified'. Well - very hypothetical, because it is
also not true ;-) It at least gives some chance. Or you can
start utilizing some special notation - but this all is
very weak, I think. Anything based on 'notation' is weak,
a last chance try .... )
Hmmm... XSLT looks very strong here because ...
WITH XSLT WE HAVE NO SUCH PROBLEMS. XSLT's
'weakness' ( lack of updateable variables ) is actually a
*very* strong feature of XSLT. I was stupid not understanding
this for a *very* long time, but nobody told me the right
thing!
They were talking about 'side-effects', 'declarative
languages' etc.
But the point is that XSLT is the language which
has only ACCESS BY VALUE, but no 'access by pointer',
because updating the variable is just a simple case of
ACCESS BY POINTER.
Well, maybe there was somebody saying it in these words,
but I can not remember this. The Bible is messy on this
topic, explaining some mythical usecases ( even the Bible
is in fact trying to say NO ACCESS BY POINTER
( or by reference ) - just using other words... )
20. Why is it good *not* to have access by pointer then?
We are used to access by pointer, and was Niklaus Wirth
an idiot? ( The answer is - of course he was *not* )
Why are we used to access by pointer?
I think it is because of this 'efficiency' thing. It is a 'roadsign'
for 'more efficient' ( 'memory-saving' ) internal dataflows.
Consider the hypothetical situation when you are passing
the *entire* context of *entire* program to *every* function
in some special way. 'fast-searchable global variable',
but not many 'prepared' variables. 'database instead
of variables'. Like in SQL, for example - rows have no names.
;-)
Do you need the access by pointer, if each of your functions
has 'efficient way to 'search' for the knowledge'
which otherwise 'was accumulated in the appropriate variable' ?
The answer is : it looks like it is really possible in most
cases to 're-do' some things constantly.
Is it *less* efficient than accumulating knowledge in the
variables? Yes. For 'simple architecture' ( no parallelism )
XSLT ( access by value only ) is *by design* less efficient
than any language which allows access-by-pointer.
What the hell? Why should we use it if it is less efficient?
For the same reason we can live without key().
A bit ( even twice ) less efficient, but 'clean'.
XSLT semantics allows writing the cleanest possible code
( below there will be one more critical feature of XSLT
which allows that. )
This is really funny. XSLT says : "use key() roadsign
to hack for speed" and on another hand the same XSLT says :
"*don't* use 'access-by-pointer' to hack for speed."
Could you please explain it again?
Well...
At the moment I can show some things by example ( as I said,
the Bible is actually talking about similar stuff, it is just using
suspicious usecases - there should be better ones, but this
requires somebody with a nice hardware background. How good
parallelism will be for XSLT is questionable - this is another
long topic. I'm talking about the software part AKA clean code
and the 'current' hardware architecture. )
21. The example is again the 'flat' -> Hierarchical 'converter' step2.
count() is the 'fast recalculator'. "searcher of information" instead
of 'storer of information' ( updateable variable ).
In the language with updateable variables I could ( on step 1 )
iterate over the list and then store the number of members,
and then pass this information down the road. Because I can not
do that, I'm passing the 'entire content down the road' and
'when I need that number of members - I'm recalculating it'.
There will always be overhead of those recalculations in XSLT,
the question is 'is this overhead worth the clean code you get' ?
My answer is "yes". I'm rejecting the key() for the sake of clear
code, so of course I'm rejecting the evil of 'access by pointer',
no matter that 'access-by-pointer' gives me yet another
ability for manual tuning of efficiency. I don't need
malloc(), I'm OK with garbage collector. This is all about the same.
I'm betting on XSLT with clear understanding that XSLT will be
slower than any 'ordinary' language. I know that XSLT
will be *more clean* in return. And also I know that
XSLT is for pipes and pipes are for XSLT.
22. Why are pipes for XSLT ?
22.1 What are those 'pipes' ?
First - please do not forget that pipes have almost *nothing*
to do with the number of stylesheets. Having it in multiple
stylesheets is just 'a bit cleaner and easier'. ( For example,
to dump the intermediate dataflow I can just redirect it to a
file, not inserting <xsl:copy-of select, and it is a bit easier
to make per-node validation. Small thing here - small thing
there. Not rocket science. )
'pipe' is first of all a logical entity. 'Thinking pipes'
is very important UNIX skill which is rare in current world
of people with no math education.
To love pipes you should love math. Writing pipes
component after component is like first
making a lemma, then a theorem, then another ... and then
re-using a theorem. Soo cool. But this is also hard, yes.
Not everybody gets the beauty of math. Well - people
are all different. I for example don't get the beauty of
chemistry, and I remember some developers who were
very good at chemistry in the university - they had
a very special view on programming from my point of view ;-)
Actually some of them were very good developers, just
'different'. Education has a huge impact, actually.
Those who get no fun from math get no fun from pipes and
usually simply have no skills ( even if they think they do ).
Collecting complex statistics from log files is a nice
task to get those skills. Not every UNIX activity helps
you to 'get it'. This is the reality, sorry if this
sounds strange to some of you with, say, a VMS-only
background. ( Not a bad thing to remember that ugly
unreliable UNIX has crushed reliable and accurate
VMS. Because of pipes. )
Math is hard. Pipes are also hard. Not many people
get math. Not many people get pipes. Pipes are a
concept, and it is not an obvious concept.
22.2. OK, stop it, this is all simple - we know this -
show how 'thinking pipes' works with XSLT.
Hm. First - have you noticed the test6 and
'Flat puzzle' ?
But pipes can do more than just 'serializing hashtable'
and then 're-using the theorem'.
Consider some statistical report. Let's say "Batches
of checks processed on some box".
You get a number of complex records of unknown height
and you want to print a footer at the end of each page.
This is all of course plain ASCII. So in a 'normal'
language, when outputting each new line to the printer,
I'm just
NLines++;
if ( NLines == PAGE_WIDTH ) { print_footer() }
Not that simple with XSLT ;-)
I think this actually looks darn hard and almost
unsolvable 'clearly', if not 'thinking pipes'.
key() will not help here ;-)
However, the solution is very much similar to those
used before. Just assume that there is another
stylesheet 'down the road' out there and print
the 'flat list'
<line> content </line>
<line> content </line>
....
And then in the second stylesheet 'group' the number
of lines into 'page' exactly like test6 step 2 works -
just use select="list[ position() &gt; $PAGE_WIDTH ]"
See - the same grouping component again. And no 'keys'
to worry about at all. Maybe this grouping component
is a generic thing? Maybe there are more ? Yes and yes.
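A minimal sketch of such a second stylesheet. It assumes the first
stylesheet wrote its flat list as <report><line>...</line>...</report>;
the <report> wrapper, the default of 3 lines per page and the footer
text are assumptions made only for this illustration. PAGE_WIDTH keeps
the name used above, although it counts lines per page.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Lines per page; can be overridden from outside. -->
  <xsl:param name="PAGE_WIDTH" select="3"/>

  <xsl:template match="/report">
    <xsl:call-template name="page">
      <xsl:with-param name="list" select="line"/>
      <xsl:with-param name="number" select="1"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Prints one page worth of lines plus a footer,
       then calls itself with the rest of the list. -->
  <xsl:template name="page">
    <xsl:param name="list"/>
    <xsl:param name="number"/>
    <xsl:if test="$list">
      <xsl:for-each select="$list[position() &lt;= $PAGE_WIDTH]">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
      <!-- The footer: no line counter is ever updated. -->
      <xsl:text>--- page </xsl:text>
      <xsl:value-of select="$number"/>
      <xsl:text> ---&#10;</xsl:text>
      <xsl:call-template name="page">
        <xsl:with-param name="list"
                        select="$list[position() &gt; $PAGE_WIDTH]"/>
        <xsl:with-param name="number" select="$number + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

The last page gets a footer even when it is shorter than PAGE_WIDTH,
because the footer is attached to the page, not to a counter value.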
In the presence of the 'second stylesheet' you may realize
that you need some better balancing between the first
stylesheet and the second etc. etc. This will result
in a 'clean' dataflow between the nodes. I mean only things
which should be passed will be passed ( remember -
there is no 'access by pointer', no evil pointers are
passed down the road ;-)
Mind comparing this to what happens inside the typical
intranet written in Java / perl / C++ / whatever ?
XSLT really helps. 'Thinking pipes' really helps.
It is good for your code, it is good for your data,
it is good, because XSLT is 'closer' to UNIX pipes
than any other language. Why?
Because of no 'access by pointer', XSLT is
'pushing more and more content down the road, not
looking back'. Pardon - but this is the way a
UNIX pipe works! Both XSLT and UNIX pipes
are 'looking only ahead, never looking back'.
They are good for each other.
22.3. Why XSLT syntax is good for pipes.
In fact every time you write a <xsl:call-template
that is non-recursive and has 'simple parameters'
you should think twice.
WHAT?????
Yes, you should.
In the presence of the second stylesheet (transformation)
you can always write
<xsl:call-template name="foo" ... with param="bar">
in the form of
<FOO attr="bar"/>
and then provide the <xsl:template match="FOO"
in the second stylesheet. And look - which is
better to read? Worth thinking about every time,
actually.
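A sketch of what that second stylesheet could look like. What the
template does with the parameter ( here it just wraps the value in <b> )
is a placeholder for whatever the named template 'foo' would have done;
everything else coming from the first stylesheet is copied through
unchanged.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity rule: pass through whatever step 1 produced. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Plays the role of the named template 'foo':
       the 'parameter' arrives as an attribute of the data. -->
  <xsl:template match="FOO">
    <b><xsl:value-of select="@attr"/></b>
  </xsl:template>

</xsl:stylesheet>
```

In the first stylesheet the 'call' is then just the literal element
<FOO attr="bar"/> written where the output of foo should appear.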
23. Heck - I can do the same with perl. I can
pass only hashes by value etc. What is special
about XSLT ?
... Syntax ... You can not do 22.3, for example,
easily switching between 'this is data' and 'this is code'.
Yes, I know, I know - you think you *can* with
Text::Template and things like, say, XPathScript.
The truth is that you can not. Small thing here,
small thing there. XSLT is darn good. The problem
was the XMLish notation, but XSLScript notation
always solved this problem for me. Yes - no else
and no auto-recursion. This is not a big deal
with XSLScript or another preprocessor. The core
of XSLT is a 'templatish dataflow language
suited for piped transformations by design'.
It is very healthy.
24. Forget all the crap about those
access-by-pointer tricks. Think pipes and
dataflows. XSLT appears to be the first
language which forces clean dataflows -
even if it appears its designers did not understand
what they really invented. This happens
very often.
25. The challenge is still open. I'll be glad
to see something which really needs an
updateable variable. For a while I was
thinking that 22.2 is the case, but I think
now it is clear that it is not the case
and the
NLines++;
if ( NLines == PAGE_WIDTH ) { print_footer() }
is a hack, and the piped XSLT view is *better*.
26. There should be some problems!!!
Yes:
1. Extensions. If not 'thinking pipes'.
If 'thinking pipes' - it could be possible to
cooperate even with an 'event driven GUI'.
Check Plan 9 for how to write a GUI with
awk and do the same. Research required.
2. Speed. Without some smart
way to 'prebuild' some indexes, some
parts could be darn slow or become a
horrible mess of key() statements.
It could be partially solved with
XSLScript introducing some meta-construction(s)
for autogenerating a bunch of 'key()' calls -
but this is a mess. It is better to
have it in the core. But this is questionable.
Because nobody was thinking about
generating key() out of XPath expressions,
I doubt that the current design or even syntax
of key() will survive 'the good design'.
Research required.
Rgds.Paul.
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list