This is the mail archive of the xsl-list@mulberrytech.com mailing list .
What is XSLT? ( very long )

To: xsl-list at mulberrytech dot com
Subject: What is XSLT? ( very long )
From: Paul Tchistopolskii <paul at qub dot com>
Date: Tue, 08 Aug 2000 00:16:36 -0700
Cc: Sebastian Rahtz <sebastian dot rahtz at computing-services dot oxford dot ac dot uk>
Organization: The Qub Group
Reply-To: xsl-list at mulberrytech dot com



 <DISCLAIMER>

 I apologize for the style of this letter. This 
 letter should be considered to be a loose translation 
 of some short story called "The weekend of YAXSLT hacker". 
 I apologize for possibly being off-topic, and I'll no
 longer write such letters to this list, but
 will try to keep the 'official' style and spirit.

 However, because I think that this letter is 
 almost about XSLT, it may not be really
 wrong to post it here.

 </DISCLAIMER>

 <DISCLAIMER1>

 The 'quotes' are not actual quotes, but how some
 words were reflected in my head. For example, of
 course Sebastian never said some of these words - this
 is my 'internal' interpretation of his words.
 This is kind of bad literature.

 </DISCLAIMER1>

 1. Sebastian says : "XT sucks - it has no key()". 
 I reflect it this way because the rest of XSLT
 is almost entirely implemented by XT; things like encodings
 are not a big deal, and I know Sebastian knows it.

 2. I really don't use key() and never had to. I remember 
 James Clark saying ( again - it is my reflection ) 
 "yeah, people are really using key() to my surprise".  
 Something smells suspicious here.

 I  *have* to understand what happens with key(). 
 I have started the battle, asking Sebastian to provide 
 a usecase.

 3. Usecase is here. Yeah.... I can hardly read it ....
 What does it really do?  OK, from the output I understand
 what it does.

 4. It looks like I could split it into 2 steps.
 This helped me when I was rendering reports. 
 ( See below. This is actually important. ).
 
 5. Split. It would be nice if the first step rendered
 it into

<flat1>
 <year>1234</year>
 <fnm>name </fnm>
 <snm>name </snm>
 <fnm>name </fnm>
 <snm>name </snm>
 <year>456</year>
 ...

</flat1>

 just a bunch of 'lines to draw'.  This means step1 should not
 generate the 'same' year more than once. What??? Why does this
 'preceding' thing take so long? OK - let me 'filter' out
 the 'same year' in step 2. Works. Now I have to enclose the
 list in <ol> </ol>. WHAT ??? Hm. I cannot generate a 'start-tag'
 and an 'end-tag' separately. I know that - just forgot ... I actually know
 that this thing should not be allowed, even though I have seen some tools
 already allowing it, and I myself once thought
 that 'it is handy to generate 'start-tag' and 'end-tag''.
 But today I'm sure - XSLT is right.  (a) I never hit the wall
 because of this restriction (b) this restriction forces
 better code - this means it is a good restriction. If I
 encounter a situation where (a) or (b) is not true -
 I'll of course change my mind. It is the only way to judge
 a programming language, I think.

 Whatever - I have been in this situation before, so
 I know how to work around it. I should put
 those <ol> </ol> and then in the body I have to dump
 out the list ... of 'what'?  How can I tell
 how many persons belong to the 'year' ?
 I'm not passing the info about the year 'attached'
 to the person. I should. I should pass the <person>
 and the <year>. OK. Works.

 6. Sebastian says 'you can do it in one stylesheet, in 2
 stylesheets, or with key'. TIMTOWTDI. Right, right. But
 some of the ways are ugly. Also, when I tried to put it into
 one stylesheet *not*piped* - it was darn slow. But wait -
 I can place those 2 transformations *piped* into one
 stylesheet with no problem. I should tell
 Sebastian later. ( See below ).

 7. Wait ... what did I *really* do? Why was it possible for me
 to go from 'flat' to 'hierarchy' ? Aha - because in the flat
 file which I got after step1 each record has a 'year' - so
 it was easy to 'take a part out of the flat list' and
 'call yourself with the rest'. At this point I *should*
 already have seen the truth, but like any human being I was
 stupid enough to miss it. ( The truth comes later ).
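
 A minimal sketch of that 'take a part, call yourself with the
 rest' idea, in plain XSLT 1.0 with no key(). This is not the
 actual test6 code. It assumes a hypothetical step1 output: a
 <flat> element holding <rec year="..."> records already sorted
 by year, each with <fnm> and <snm> children. All of these
 names are made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: hand the whole flat list to the recursive template. -->
  <xsl:template match="/flat">
    <doc>
      <xsl:call-template name="group">
        <xsl:with-param name="rest" select="rec"/>
      </xsl:call-template>
    </doc>
  </xsl:template>

  <!-- Emit one year group, then recurse on the remaining records.
       Assumes every rec carries a year attribute and that the list
       is sorted by year, so one group is a contiguous run. -->
  <xsl:template name="group">
    <xsl:param name="rest"/>
    <xsl:if test="$rest">
      <xsl:variable name="year" select="$rest[1]/@year"/>
      <!-- count() rescans all of $rest; it cannot know the list is sorted. -->
      <xsl:variable name="n" select="count($rest[@year = $year])"/>
      <year><xsl:value-of select="$year"/></year>
      <ol>
        <xsl:for-each select="$rest[position() &lt;= $n]">
          <li>
            <xsl:value-of select="fnm"/>
            <xsl:text> </xsl:text>
            <xsl:value-of select="snm"/>
          </li>
        </xsl:for-each>
      </ol>
      <!-- 'call yourself with the rest' -->
      <xsl:call-template name="group">
        <xsl:with-param name="rest" select="$rest[position() &gt; $n]"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

 Since the <ol> is written as one literal element around the
 for-each, the sketch never needs a separate 'start-tag' and
 'end-tag'.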

 8. OK. This was easy because the 'flat' list
 in step2 had a 'simple' structure. What if
 I give it a *complex* structure, so that it is
 not easy to process it like I did? Here is the
 'Flat puzzle'.  Damn. I don't see an easy way to
 get from this 'flat' to a hierarchy ! I'm tired and
 have to sleep.

 9. Steve says : "we have tons of usecases". Great!
 I ask him - he should already have got such a 'flat' thing
 from somebody - it looks very typical. Also I have some
 strange feeling ... I smell the odor of the 'key()'
 function here. The odor is so strong ... But why ???
 I came to this 'Flat puzzle' just by 'making the flat file
 more complex' ... Verrry strange ....

 10. Dang. Steve responded and there is of course key().  
  I should do it without key().

 11. Got some sleep. Looked at this once again. Gee - soo easy!
 I *already* know how the 'flat' file *should* look
 to make it 'hierarchical' ( test6 part 2 ). Why don't I just
 convert this 'flat puzzle' file to *that* flat file!
 Done. Works.

 WAIT!!!

 12. What is the structure of my 'flat' file which
 is easy to convert to a hierarchy ? It is an
 ordered list of records where each record has
 a *KEY*. O HOW STUPID I AM!!!! All I'm doing
 is just serializing the HASHTABLE.

 13. I implemented the key() functionality in
 XSLT itself. And of course it is slower than the
 built-in HASHTABLE AKA key(). ( The funny thing is
 that my 'implementation' is not *that* much slower,
 but I already know that there are hashtables of
 hashtables behind the scenes. Yeah - James Clark
 implemented his own Hashtable for XT - not using
 the java Hashtable. Surely he is already using
 hashing techniques here and there. If not -
 my 'hand-made hashtable' should be *darn* slower
 than key(). )
 
 14. How could *my* hashtable be improved? 
 OK, for example - this typical 'count()' 
 thing in step2 is always doing some useless 
 things. *I*  know  that the list is sorted. 
 But this stupid 'count()' does not know this! 
 How can I tell count() that it should
 'stop after the key has changed' and
 not look over the entire list again
 and again?

 I CAN DO THAT WITH THE ROADSIGN -
 but this means 'key()' again and again ....
 And it is not a readable construction,
 that key()... and sometimes it will not
 improve things... and some
 optimizations may require another
 'logic'...

 15. There should be no 'key()' function.
 There should be only an <xsl:key element building
 the tricky indexes, and when the engine encounters
 some expression it should use those
 indexes *if they are applicable to this
 expression* ( could be signaled by a new
 syntax of <xsl:key ). This is a very hard
 task, but the idea is like PRIMARY KEY in
 SQL. Like the 'precompiled' option of regular
 expressions in perl.

 This means <xsl:key will become what it has to be -
 a plain roadsign, forcing the building of particular
 ( maybe more complex than now ) indexes
 to speed up some particular 'regular expressions'.

 And it should be called <xsl:index, of course.

 This is the way to go and I think XSLT
 has missed it, masking the real problem
 with that 'key()' hack, like they did
 with document() with 2 parameters ( masking
 RTF / node-set conversion ).
 
 16. Should I use current key()? 

 Why is key() not readable?

 key() is not readable because,
 even though there is in fact only *one* regular
 expression, the *parts* of this
 'regular expression' are placed
 in different places of the stylesheet!
 ( not the case with my view of <xsl:index )

 In the current key() some parts are located in
 <xsl:key and some parts are in key().
 This is the only place in XSLT where, to see
 what happens, you have to jump from one
 place to another, composing the actual
 expression which will be used, and you can
 see it 'only in your head'.
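
 To make the 'two places' complaint concrete, here is a rough
 sketch of key()-based grouping ( the well-known Muenchian
 method ) over a hypothetical <flat> list of <rec year="...">
 records with <fnm> and <snm> children. The input shape and the
 key name 'rec-by-year' are assumptions for illustration, not
 taken from Steve's or Sebastian's actual stylesheets.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Part one of the 'expression': WHAT is indexed and BY WHAT.
       It lives at the top level, far from where it is used. -->
  <xsl:key name="rec-by-year" match="rec" use="@year"/>

  <xsl:template match="/flat">
    <doc>
      <!-- Part two: the lookup. A rec starts a group when it is the
           first node that the key returns for its own year. -->
      <xsl:for-each
          select="rec[generate-id() = generate-id(key('rec-by-year', @year)[1])]">
        <year><xsl:value-of select="@year"/></year>
        <ol>
          <!-- All records sharing this year, fetched through the index. -->
          <xsl:for-each select="key('rec-by-year', @year)">
            <li>
              <xsl:value-of select="fnm"/>
              <xsl:text> </xsl:text>
              <xsl:value-of select="snm"/>
            </li>
          </xsl:for-each>
        </ol>
      </xsl:for-each>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

 To read either key('rec-by-year', @year) call you have to
 scroll up to the <xsl:key declaration and join match="rec"
 with use="@year" in your head.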
 
 17. OK, maybe I'm fooling myself here. I'll
 test it next time something smells like key().
 Thanks to Steve - I now understand how key()
 could be used ( look at the pipe and it will show
 what could become a key ;-), so if I get into any
 trouble with my way - I can always try the key() way.
 Maybe I'm still missing something ...  ( I should
 be honest. I don't think I'm missing something, but
 there is always a chance. )

 18. Well.... Now I should write to Sebastian that 
 pipe or 'one stylesheet' is a mythical distinction.
 If I have 'a | b' I can always write a1.xsl with the 
 structure of 

 <xsl:variable name="step1">
  <doc>
  transformation 1
  </doc>
 </xsl:variable>

 <xsl:apply-templates select="xt:node-set($step1)" mode="transformation2"/>

 <xsl:template match="/doc" mode="transformation2" >
 ...
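
 For reference, a complete and deliberately tiny version of
 such a piped single stylesheet might look like the sketch
 below. Two assumptions: it uses the EXSLT exsl:node-set()
 extension in place of xt:node-set(), so it needs a processor
 that supports EXSLT, and both transformations are placeholders
 ( step 1 just lists the element names of the input as <line>
 elements, step 2 wraps them and counts them ).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/">
    <!-- Step 1: run the first transformation and capture its
         output as a result tree fragment. -->
    <xsl:variable name="step1">
      <doc>
        <xsl:apply-templates select="*" mode="transformation1"/>
      </doc>
    </xsl:variable>

    <!-- Step 2: turn the fragment into a node-set and feed it to
         the second transformation, as if through a pipe. -->
    <xsl:apply-templates select="exsl:node-set($step1)/doc"
                         mode="transformation2"/>
  </xsl:template>

  <!-- Placeholder transformation 1: one <line> per input element. -->
  <xsl:template match="*" mode="transformation1">
    <line><xsl:value-of select="name()"/></line>
    <xsl:apply-templates select="*" mode="transformation1"/>
  </xsl:template>

  <!-- Placeholder transformation 2: sees only what step 1 emitted. -->
  <xsl:template match="doc" mode="transformation2">
    <result count="{count(line)}">
      <xsl:copy-of select="line"/>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

 Splitting it into two files only means moving the
 transformation2 templates out and dropping the modes.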

 19. Well - in perl I can do the same. Create some hash,
 pass it down the road... What is the *difference*?

 The difference is that in perl ( and any other language )
 I have access 'by value' and 'by pointer' ( or by reference, 
 but let me call it 'by pointer' ).

 In XSLT I have only 'access by value'. Java tried hard
 to kill the distinction between those 'access by pointer' 
 and 'access by value'. They ended up with 'mostly access by
 pointer, but not really'.

 The way Java did it results in the situation that when you
 see foo( bar, baz ) you cannot tell whether bar or baz will
 be modified in the code, because bar and baz could
 be passed down the road etc.  ( With other languages
 you can at least *guess* that if it is 'by pointer' it is
 'to be modified'. Well - very hypothetical, because it is
 also not true ;-) It at least gives some chance. Or you can
 start using a special notation - but this is all
 very weak, I think. Anything based on 'notation' is weak,
 a last-chance try .... )

 Hmmm... XSLT looks very strong here because ...

 WITH XSLT WE HAVE NO SUCH PROBLEMS. XSLT's
 'weakness' ( lack of updateable variables ) is actually
 a *very* strong feature of XSLT. I was stupid not to understand
 this for a *very* long time, but nobody told me the right
 thing!

 They were talking about 'side-effects', 'declarative
 languages' etc.

 But the point is that XSLT is a language which
 has only ACCESS BY VALUE, but no 'access by pointer',
 because updating a variable is just a simple case of
 ACCESS BY POINTER.

 Well, maybe somebody has said it in these words,
 but I cannot remember it. The Bible is messy on this
 topic, explaining some mythical usecases ( even the Bible
 is in fact trying to talk about NO ACCESS BY POINTER
 ( or by reference ) - just using other words... )

 20. Why is it good *not* to have access by pointer then?
 We are used to access by pointer - and was Niklaus Wirth
 an idiot? ( The answer is - of course he was *not* )

 Why are we used to access by pointer?

 I think it is because of this 'efficiency' thing. It is a 'roadsign' 
 for 'more efficient' ( 'memory-saving' ) internal dataflows.

 Consider the hypothetical situation when you are passing 
 the *entire* context of *entire* program to *every* function
 in some special way.  'fast-searchable global variable', 
 but not many 'prepared' variables. 'database instead 
 of variables'.  Like in SQL, for example - rows have no names.
 ;-)

 Do you need the access by pointer, if each of your functions
 has 'efficient way to 'search' for the knowledge' 
 which otherwise 'was accumulated in the appropriate variable' ?  
 
 The answer is : it looks like it really is possible in most
 cases to 're-do' some things constantly.

 Is it *less* efficient than accumulating knowledge in the 
 variables? Yes. For 'simple architecture' ( no parallelism ) 
 XSLT ( access by value only ) is *by design* less efficient 
 than any language which allows access-by-pointer.

 What the hell? Why should we use it if it is less efficient?

 For the same reason we can live without key().
 A bit ( even twice as ) inefficient, but 'clean'.

 XSLT semantics allows writing the cleanest possible code 
 ( below there will be one more critical feature of XSLT 
 which allows that. ) 

 This is really funny. XSLT says : "use the key() roadsign
 to hack for speed" and on the other hand the same XSLT says :
 "*don't* use 'access-by-pointer' to hack for speed."
  
 Could you please explain it again? 

 Well...

 At the moment I can show some things by example ( as I said,
 the Bible is actually talking about similar stuff, it is just using
 suspicious usecases - there should be better ones, but this
 requires somebody with a good hardware background.  How good
 parallelism will be for XSLT is questionable - this is another
 long topic. I'm talking about the software part AKA clean code
 and 'current' hardware architecture. )


 21. The example is again the 'flat' -> Hierarchical 'converter' step2.
 count() is the 'fast recalculator'. "searcher of information" instead 
 of 'storer of information' ( updateable variable ).

 In the language with updateable variables I could ( on step 1 )
 iterate over the list and then store the number of members, 
 and then pass this information down the road. Because I can not 
 do that, I'm passing the 'entire content down the road' and 
 'when I need that number of members - I'm recalculating it'.

 There will always be overhead of those recalculations in XSLT,
 the question is 'is this overhead worth the clean code you get' ?
 
 My answer is "yes". I'm rejecting the key() for the sake of clear 
 code, so of course I'm rejecting the evil of 'access by pointer', 
 no matter that 'access-by-pointer' gives me yet another 
 ability for manual tuning of efficiency. I don't need 
 malloc(), I'm OK with garbage collector. This is all about the same.
 
 I'm betting on XSLT with clear understanding that XSLT will be 
 slower than any 'ordinary' language.  I know that XSLT 
 will be *more clean* in return. And also I know that 
 XSLT is for pipes and pipes are for XSLT.
 
 22. Why are pipes for XSLT ?

 22.1 What are those 'pipes' ?

 First - please do not forget that pipes have almost *nothing*
 to do with the number of stylesheets. Having it in multiple
 stylesheets is just 'a bit cleaner and easier'. ( For example,
 to dump the intermediate dataflow I can just redirect it to a
 file ( not inserting <xsl:copy-of select ), and it is a bit easier
 to do per-node validation. Small thing here - small thing
 there. Not rocket science. )

 A 'pipe' is first of all a logical entity. 'Thinking pipes'
 is a very important UNIX skill which is rare in the current world
 of people with no math education.

 To love pipes you should love math. Writing pipes
 component - after - component is like first
 making a lemma, then a theorem, then another ... and then
 re-using a theorem. Soo cool. But this is also hard, yes.
 Not everybody gets the beauty of math. Well - people
 are all different. I for example don't get the beauty of
 chemistry, and I remember some developers who were
 very good at chemistry at university - they had
 a very special view of programming from my point of view ;-)
 Actually some of them were very good developers, just
 'different'. Education has a huge impact, actually.
 
 Those who get no fun from math get no fun from pipes and
 usually simply have no skills ( even if they think they do ).

 Collecting complex statistics from log files is a nice
 task for getting those skills. Not every UNIX activity helps
 you to 'get it'. This is the reality, sorry if this
 sounds strange to some of you with, say, a VMS-only
 background. ( Not a bad thing to remember that ugly
 unreliable UNIX has crushed reliable and accurate
 VMS. Because of pipes. )

 Math is hard. Pipes are also hard. Not many people 
 get math. Not many people get pipes. Pipes are a
 concept, and not an obvious concept.

 22.2. OK, stop it, this is all simple - we know this -
 show how 'thinking pipes' works with XSLT.

 Hm. First - have you noticed test6 and the
 'Flat puzzle' ?

 But pipes can do more than just 'serializing hashtable'
 and then 're-using the theorem'.

 Consider some statistical report. Let's say "Batches
 of checks processed on some box". 

 You get a number of complex records of unknown height
 and you want to print a footer at the end of each page.
 This is all of course plain ASCII. So in a 'normal'
 language, when outputting each new line to the printer,
 I just do

 NLines++;
 if ( NLines == PAGE_WIDTH ) { print_footer() }
 
 Not that simple with XSLT ;-)

 I think this actually looks darn hard and almost 
 unsolvable 'clearly',  if not 'thinking pipes'. 
 key() will not help here ;-)

 However, the solution is very much similar to those 
 used before. Just assume that there is another 
 stylesheet 'down the road' out there and print 
 the 'flat list'

 <line> content </line>
 <line> content </line>
 ....

 And then in the second stylesheet 'group' the number 
 of lines into 'page' exactly like test6 step 2 works - 
 just use select="list[ position() &gt; $PAGE_WIDTH ]"

 See - the same grouping component again. And no 'keys'
 to worry about at all. Maybe this grouping component
 is a generic thing? Maybe there are more ? Yes and yes.
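
 A rough sketch of such a second stylesheet, under stated
 assumptions: the first stylesheet wrapped its <line> elements
 in a hypothetical <list> root, PAGE_WIDTH is a stylesheet
 parameter ( default 3 here, just to keep test runs short ),
 and the <report>, <page> and <footer> names are invented for
 the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Number of lines per page; can be overridden by the caller. -->
  <xsl:param name="PAGE_WIDTH" select="3"/>

  <xsl:template match="/list">
    <report>
      <xsl:call-template name="paginate">
        <xsl:with-param name="rest" select="line"/>
      </xsl:call-template>
    </report>
  </xsl:template>

  <!-- Take one page worth of lines off the front, print the
       footer, then call yourself with the rest. -->
  <xsl:template name="paginate">
    <xsl:param name="rest"/>
    <xsl:param name="page" select="1"/>
    <xsl:if test="$rest">
      <page>
        <xsl:copy-of select="$rest[position() &lt;= $PAGE_WIDTH]"/>
        <footer>page <xsl:value-of select="$page"/></footer>
      </page>
      <xsl:call-template name="paginate">
        <xsl:with-param name="rest"
                        select="$rest[position() &gt; $PAGE_WIDTH]"/>
        <xsl:with-param name="page" select="$page + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

 The page number is not an updateable variable: each recursive
 call simply receives $page + 1 by value.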

 In the presence of a 'second stylesheet' you may realize
 that you need some better balancing between the first
 stylesheet and the second etc. etc. This will result
 in a 'clean' dataflow between the nodes. I mean only things
 which should be passed will be passed ( remember -
 there is no 'access by pointer', no evil pointers are
 passed down the road ;-)

 Care to compare this to what happens inside the typical
 intranet written in Java / perl / C++ / whatever  ?

 XSLT really helps. 'Thinking pipes' really helps. 
 It is good for your code, it is good for your data, 
 it is good, because XSLT is 'closer' to UNIX pipes 
 than any other language. Why?

 Because there is no 'access by pointer', XSLT is
 'pushing more and more content down the road, not
 looking back'.  Pardon - but this is how a
 UNIX pipe works! Both XSLT and UNIX pipes
 are 'looking only ahead, never back'.
 They are good for each other.
 
 22.3.  Why XSLT syntax is good for pipes.

 In fact every time you write a non-recursive
 <xsl:call-template with 'simple parameters'
 you should think twice.

 WHAT?????

 Yes, you should.

 In the presence of the second stylesheet (transformation) 
 you can always write 

 <xsl:call-template name="foo" ... with param="bar">

 in the form of

 <FOO attr="bar"/>

 And then provide the <xsl:template match="FOO"
 in the second stylesheet. And look - which is
 better to read? Worth thinking about every time,
 actually.
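
 As an illustration only, the second stylesheet for this trick
 could be as small as the sketch below. It assumes the first
 stylesheet wrote <FOO attr="bar"/> into its output wherever it
 would otherwise have called the named template, and the <b>
 wrapper stands in for whatever 'foo' really produced.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Identity template: pass everything from step 1 through untouched. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The former named template 'foo', now triggered by data:
       the attribute plays the role of the parameter. -->
  <xsl:template match="FOO">
    <b><xsl:value-of select="@attr"/></b>
  </xsl:template>

</xsl:stylesheet>
```

 The 'call' has become a piece of the dataflow, so it can be
 dumped to a file and inspected between the two steps.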

  23. Heck - I can do the same with perl. I can 
 pass only hashes by value e t.c. What is special 
 about XSLT ? 

 ... Syntax ... You cannot do 22.3, for example,
 easily switching between 'this is data' and 'this is code'.
 Yes, I know, I know - you think you *can* with
 Text::Template and things like, say, XPathScript.

 The truth is that you cannot. Small thing here,
 small thing there. XSLT is darn good. The problem
 was the XMLish notation, but XSLScript notation
 always solved this problem for me.  Yes - no else
 and no auto-recursion. This is not a big deal
 with XSLScript or another preprocessor. The core
 of XSLT is a 'templatish dataflow language
 suited for piped transformations by design'.
 It is very healthy.

 24. Forget all the crap about those 
 Access-by-pointer tricks. Think pipes and 
 dataflows. XSLT appears to be the first 
 language which forces clean dataflows -
 even though it appears they did not understand
 what they really invented. This happens
 very often.

 25. The challenge is still open. I'll be glad
 to see something which really needs an
 updateable variable. For a while I was
 thinking that 22.2 was such a case, but I think
 now it is clear that it is not,
 and the

 NLines++;

 if ( NLines == PAGE_WIDTH ) { print_footer() }

 is a hack, while the piped XSLT view is *better*.


 26. There should be some problems!!!

 Yes:

 1. Extensions. If not 'thinking pipes'. 
 If 'thinking pipes' - could be possible to 
 cooperate even with 'event driven GUI'. 
 Check Plan 9 for how to write GUI with 
 awk and do the same. Research required.
 
 2. Speed. Without some smart
 way to 'prebuild' some indexes - some
 parts could be darn slow or become
 a horrible mess of key() statements.
 It could be partially solved with
 XSLScript introducing some meta-construction(s)
 for autogenerating a bunch of 'key()' -
 but this is a mess. It is better to
 have it in the core. But this is questionable.
 Because nobody was thinking about
 generating key() out of XPath expressions,
 I doubt that the current design or even syntax
 of key() will survive 'the good design'.

 Research required.

Rgds.Paul.




 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list