This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: Converting non-pure trees to pure trees


> I have an XML file which I have automatically converted from 
> MS Word; the basic structure is:
> 
> <worddocument>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<pagebreak/>
> 	<p>2/1</p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<pagebreak/>
> 	<p>2/2</p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> 	<p>paragraph <b>hello</b> <i>world</i></p>
> </worddocument>

This is a grouping problem, of the kind I call "grouping by position".
Grouping problems in XSLT are not easy: for background, see
www.jenitennison.com.

All grouping problems require two nested loops. The outer loop selects a
representative element for each group, which in this case seems to be a <p>
element that is immediately preceded by a <pagebreak> element:

<xsl:for-each select="p[preceding-sibling::*[1][self::pagebreak]]">
<mongraph id="{.}">

...

</mongraph>
</xsl:for-each>
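As a quick sanity check outside an XSLT processor, the outer-loop predicate can be mirrored in Python. This is a hypothetical translation using the standard library's xml.etree.ElementTree, not part of the original answer: SAMPLE is an invented, trimmed input in the question's shape (closed with </worddocument>), and representatives() is an illustrative helper name. It picks every <p> whose immediately preceding sibling element is a <pagebreak>, which is what preceding-sibling::*[1][self::pagebreak] selects.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input in the question's shape, trimmed to one
# body paragraph per page.
SAMPLE = """<worddocument>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/1</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/2</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
</worddocument>"""


def representatives(root):
    """Return the <p> children whose immediately preceding sibling element
    is a <pagebreak>, mirroring p[preceding-sibling::*[1][self::pagebreak]]."""
    children = list(root)  # element children only, in document order
    return [
        el
        for prev, el in zip(children, children[1:])
        if el.tag == "p" and prev.tag == "pagebreak"
    ]


root = ET.fromstring(SAMPLE)
print([p.text for p in representatives(root)])  # ['2/1', '2/2']
```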

Inside this you need an inner loop that processes all the elements within
one group. In this case these are all the <p> elements that follow the
"representative" element, up to the next "representative" element. Or, to put
it another way, all following <p> elements whose first preceding
<pagebreak> is the same as the first preceding <pagebreak> of the current
element.

So the inner loop can be:

<xsl:for-each select="following-sibling::p[
                       generate-id(preceding-sibling::pagebreak[1]) =
                       generate-id(current()/preceding-sibling::pagebreak[1])]">
  <xsl:copy-of select="."/>
</xsl:for-each>
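To check the whole regrouping outside XSLT, here is a hypothetical Python sketch of the full transformation using xml.etree.ElementTree. It is a translation for illustration, not the answer's method: it walks the children in a single pass rather than two nested loops, and it assumes, per the question, that paragraphs before the first <pagebreak> form the <introduction> and that the first <p> after each <pagebreak> supplies the monograph id. SAMPLE and to_semantic() are invented names; element names, including mongraph, are copied from the question's desired output.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample input in the question's shape, trimmed to one
# body paragraph per page.
SAMPLE = """<worddocument>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/1</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
  <pagebreak/>
  <p>2/2</p>
  <p>paragraph <b>hello</b> <i>world</i></p>
</worddocument>"""


def to_semantic(root):
    """Regroup a flat <worddocument> into <semanticdocument>.

    Children are walked in document order, so each <p> lands in the group
    opened by its nearest preceding <pagebreak>: the same test the XSLT
    inner loop makes by comparing generate-id() values.
    """
    semantic = ET.Element("semanticdocument")
    intro = ET.SubElement(semantic, "introduction")
    mongraphs = ET.SubElement(semantic, "mongraphs")

    seen_break = False  # has any <pagebreak> occurred yet?
    group = None        # <mongraph> for the current page, once opened
    for el in root:
        if el.tag == "pagebreak":
            seen_break = True
            group = None  # the next <p> is this page's representative
        elif el.tag != "p":
            continue  # ignore anything that is not a paragraph
        elif not seen_break:
            intro.append(copy.deepcopy(el))  # first page: introduction
        elif group is None:
            # Representative <p>: its string value becomes the id,
            # like id="{.}" in the outer loop; it is not copied itself.
            group = ET.SubElement(mongraphs, "mongraph", id="".join(el.itertext()))
        else:
            group.append(copy.deepcopy(el))
    return semantic


out = to_semantic(ET.fromstring(SAMPLE))
print(len(out.find("introduction")))                # 1
print([m.get("id") for m in out.iter("mongraph")])  # ['2/1', '2/2']
```

The single pass is just the natural shape for an imperative language; it assigns each paragraph to the same group the nested XSLT loops would.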

In Saxon there is a simpler solution using the saxon:leading() extension
function.

Mike Kay
> 
> I wish to transform this tree using some knowledge I have 
> about the document:
> The first page is always the "introduction", whilst all 
> subsequent pages are "monographs"
> 
> <semanticdocument>
> 	<introduction>
> 		<p>paragraph <b>hello</b> <i>world</i></p>
> 		<p>paragraph <b>hello</b> <i>world</i></p>
> 		<p>paragraph <b>hello</b> <i>world</i></p>
> 	</introduction>
> 	<mongraphs>
> 		<mongraph id="2/1">
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 		</mongraph>
> 		<mongraph id="2/2">
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 			<p>paragraph <b>hello</b> <i>world</i></p>
> 		</mongraph>
> 	</mongraphs>
> </semanticdocument>
> 
> 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
