This is the mail archive of the xsl-list@mulberrytech.com mailing list.
RE: Converting non-pure trees to pure trees
- To: "'xsl-list at mulberrytech dot com'" <xsl-list at mulberrytech dot com>
- Subject: RE: Converting non-pure trees to pure trees
- From: Kay Michael <Michael dot Kay at icl dot com>
- Date: Tue, 21 Nov 2000 10:00:37 -0000
- Reply-To: xsl-list at mulberrytech dot com
> I have a XML file which I have automatically converted from
> msword, the basic structure is:
>
> <worddocument>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <pagebreak/>
> <p>2/1</p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <pagebreak/>
> <p>2/2</p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> </worddocument>
This is a grouping problem, of the kind I call "grouping by position".
Grouping problems in XSLT are not easy: for background, see
www.jenitennison.com.
All grouping problems require two nested loops. The outer loop selects a
representative element for each group, which in this case seems to be a <p>
element that is immediately preceded by a <pagebreak> element:
<xsl:for-each select="p[preceding-sibling::*[1][self::pagebreak]]">
  <mongraph id="{.}">
    ...
  </mongraph>
</xsl:for-each>
Inside this you need an inner loop that processes all the elements within
one group. In this case these are all the <p> elements that follow the
"representative" element, up to the next "representative" element. Or, to put
it another way, all following <p> elements whose nearest preceding
<pagebreak> is the same as the nearest preceding <pagebreak> of the current
element.
So the inner loop can be:
<xsl:for-each select="following-sibling::p[
    generate-id(preceding-sibling::pagebreak[1]) =
    generate-id(current()/preceding-sibling::pagebreak[1])]">
  <xsl:copy-of select="."/>
</xsl:for-each>
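For reference, here is a minimal sketch of how the two loops might fit together in a complete XSLT 1.0 stylesheet. The wrapper element names (semanticdocument, introduction, mongraphs, mongraph) are taken from the desired output in the question. The test used to pick out the introduction (every <p> with no <pagebreak> before it) and the xsl:output settings are assumptions of this sketch, not something stated above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: indented XML output, for readability only -->
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/worddocument">
    <semanticdocument>

      <!-- Assumption: the introduction is every <p> that has no
           <pagebreak> anywhere before it -->
      <introduction>
        <xsl:copy-of select="p[not(preceding-sibling::pagebreak)]"/>
      </introduction>

      <mongraphs>
        <!-- Outer loop: one representative per group, namely the <p>
             whose nearest preceding sibling element is a <pagebreak> -->
        <xsl:for-each select="p[preceding-sibling::*[1][self::pagebreak]]">
          <!-- The text of the representative <p> becomes the id -->
          <mongraph id="{.}">
            <!-- Inner loop: following <p> siblings that share the same
                 nearest preceding <pagebreak> as the representative.
                 current() is the node selected by the outer loop. -->
            <xsl:for-each select="following-sibling::p[
                generate-id(preceding-sibling::pagebreak[1]) =
                generate-id(current()/preceding-sibling::pagebreak[1])]">
              <xsl:copy-of select="."/>
            </xsl:for-each>
          </mongraph>
        </xsl:for-each>
      </mongraphs>

    </semanticdocument>
  </xsl:template>

</xsl:stylesheet>
```

Because the inner loop starts from following-sibling::p, the representative <p> (the one holding "2/1" or "2/2") is used only for the id attribute and is not copied into the group.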
In Saxon there is a simpler solution using the saxon:leading() extension
function.
Mike Kay
>
> I wish to transform this tree using some knowledge I have
> about the document:
> The first page is always the "introduction", whilst all
> subsequent pages are "monographs"
>
> <semanticdocument>
> <introduction>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> </introduction>
> <mongraphs>
> <mongraph id="2/1">
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> </mongraph>
> <mongraph id="2/2">
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> <p>paragraph <b>hello</b> <i>world</i></p>
> </mongraph>
> </mongraphs>
> </semanticdocument>
>
>
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list