This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Splitting an XML file based on size
- To: "XSL Mailing List (E-mail)" <xsl-list at lists dot mulberrytech dot com>
- Subject: [xsl] Splitting an XML file based on size
- From: Adam Van Den Hoven <Adam dot Hoven at bluezone dot net>
- Date: Tue, 3 Apr 2001 15:50:04 -0700
- Reply-To: xsl-list at lists dot mulberrytech dot com
Hey guys,
I'm processing an NITF file into HTML. NITF is very much like HTML in that
it has a body with paragraph elements that have mixed content. The HTML I am
creating from my transforms can quickly grow to several tens of KB.
Since I'm transferring it over a wireless modem to a PocketPC at a maximum
of 14.4 kbps, even a 15 KB HTML file is entirely too big.
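As a rough check of the transfer times involved (assuming 1 KB = 1024 bytes, 8 bits per byte on the wire, and ignoring protocol overhead and any modem compression):

```latex
\[
t_{15\,\mathrm{KB}} = \frac{15 \times 1024 \times 8\ \text{bits}}{14\,400\ \text{bits/s}} \approx 8.5\ \text{s},
\qquad
t_{2\,\mathrm{KB}} = \frac{2 \times 1024 \times 8\ \text{bits}}{14\,400\ \text{bits/s}} \approx 1.1\ \text{s}.
\]
```

So a 2 KB page would take on the order of a second at best, versus well over eight seconds for 15 KB.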
I need some way to keep track of the number of characters I've processed and
stop once I reach a specific size, finishing at the end of the current
paragraph. I understand that counting characters is not very precise, but I
am only interested in getting the transfer size below 2 KB or so.
As an example, I might have the following NITF code:
<nitf baselang="en.ca">
  <head><!-- Header Metadata here --></head>
  <body>
    <body.head><!-- Body head stuff here --></body.head>
    <body.content>
      <p>
        Lorem ipsum dolor sit amet,
        <em>consectetuer adipiscing elit, sed diem</em>
        nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam
        erat volutpat.
      </p>
      <p>
        Lorem ipsum
        <q>dolor sit amet, consectetuer adipiscing elit,</q>
        sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
        aliguam erat volutpat.
      </p>
      <p>
        Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
        diem
        <em>nonummy nibh euismod </em>
        tincidunt ut lacreet dolore magna aliguam erat volutpat.
      </p>
      <p>
        Lorem ipsum dolor sit amet,
        <em>consectetuer adipiscing elit, </em>
        sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
        aliguam erat volutpat.
      </p>
      <p>
        Lorem ipsum dolor sit amet,
        <q>consectetuer adipiscing elit,</q>
        sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
        aliguam erat volutpat.
      </p>
      <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
        diem nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam erat
        volutpat. </p>
    </body.content>
    <body.end><!-- tagline here --></body.end>
  </body>
</nitf>
The text there is roughly 860 characters. Let's say that my target
size is 375 characters. That should fall on the "o" in "euismod" in the third
<p> element. Normally I would create:
<html>
  <head><!-- Header Metadata here --></head>
  <body>
    <p>
      Lorem ipsum dolor sit amet,
      <em>consectetuer adipiscing elit, sed diem</em>
      nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam
      erat volutpat.
    </p>
    <p>
      Lorem ipsum
      <q>dolor sit amet, consectetuer adipiscing elit,</q>
      sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
      aliguam erat volutpat.
    </p>
    <p>
      Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
      diem
      <em>nonummy nibh euismod </em>
      tincidunt ut lacreet dolore magna aliguam erat volutpat.
    </p>
    <p>
      Lorem ipsum dolor sit amet,
      <em>consectetuer adipiscing elit, </em>
      sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
      aliguam erat volutpat.
    </p>
    <p>
      Lorem ipsum dolor sit amet,
      <q>consectetuer adipiscing elit,</q>
      sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
      aliguam erat volutpat.
    </p>
    <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
      diem nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam erat
      volutpat. </p>
  </body>
</html>
but what I want to create is:
<html>
  <head><!-- Header Metadata here --></head>
  <body>
    <p>
      Lorem ipsum dolor sit amet,
      <em>consectetuer adipiscing elit, sed diem</em>
      nonummy nibh euismod tincidunt ut lacreet dolore magna aliguam
      erat volutpat.
    </p>
    <p>
      Lorem ipsum
      <q>dolor sit amet, consectetuer adipiscing elit,</q>
      sed diem nonummy nibh euismod tincidunt ut lacreet dolore magna
      aliguam erat volutpat.
    </p>
    <p>
      Lorem ipsum dolor sit amet, consectetuer adipiscing elit, sed
      diem
      <em>nonummy nibh euismod </em>
      tincidunt ut lacreet dolore magna aliguam erat volutpat.
    </p>
    <p><a href="someURL">View Entire story</a></p>
  </body>
</html>
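The stopping rule in this example can be checked by hand. Assuming whitespace is normalized and markup is ignored, each sample paragraph is 144 characters long, so with a 375-character budget:

```latex
\[
144 + 144 = 288 < 375 \le 432 = 3 \times 144
\]
```

The running total first reaches the budget inside the third paragraph, so that paragraph is the last one kept before the "View Entire story" link.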
> I can't do something as coarse as counting paragraphs, since I might also
> have a table (essentially an HTML table), lists, or something else. Some
> paragraphs will be as short as a single sentence; others will be much longer.
>
> I also need to do some additional processing after I reach the end of the
> NITF text, but the size of that output is much more predictable, so I can
> simply subtract it from the target file size.
>
> I had thought about doing something approximately like:
>
> <xsl:template match="p" mode="block">
>   <xsl:param name="cursize" select="0"/>
>   <xsl:variable name="size" select="$cursize"/>
>   <p>
>     <xsl:apply-templates select="child::node()" mode="inline">
>       <xsl:with-param name="cursize" select="$size + 7"/>
>       <!-- +7 characters for the tags -->
>     </xsl:apply-templates>
>   </p>
>   <xsl:if test="$size &lt;= 400">
>     <xsl:apply-templates select="following-sibling::p[1]" mode="block">
>       <xsl:with-param name="cursize" select="$size"/>
>     </xsl:apply-templates>
>   </xsl:if>
> </xsl:template>
>
> but clearly that isn't going to work. I also assume that a global variable
> called $size wouldn't work either, since XSLT variables can't be reassigned.
>
> I am getting the feeling that this isn't strictly possible with XSL. I am
> using MSXML 3, so scripting might be a solution, but I am loath to use it
> unless I have to.
>
> Adam van den Hoven
> Internet Application Developer
> Blue Zone
> tel. 604.685.4310
> fax. 604.685.4391
> Blue Zone makes you interactive.(tm) http://www.bluezone.net/
>
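Because XSLT variables cannot be updated, the usual XSLT 1.0 workaround is the one the attempted template above is reaching for: walk the blocks one sibling at a time and pass the running total along as a template parameter. The following is a minimal sketch under stated assumptions, not a tested solution. The parameter names `max-chars`, `tag-cost` and `story-url` are hypothetical; a block's cost is taken to be its whitespace-normalized text length plus an assumed 7 characters per element (the same guess as the "+7" above); the block that crosses the budget is still output, as in the example; and only `p`, `em` and `q` get rendering templates, so tables and lists would need templates of their own. It should run under an XSLT 1.0 processor such as MSXML 3.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Hypothetical parameters; override them from the calling code. -->
  <!-- Budget, in estimated characters, for the story text. -->
  <xsl:param name="max-chars" select="375"/>
  <!-- Assumed cost of each element's tags (7 = length of "<p></p>"). -->
  <xsl:param name="tag-cost" select="7"/>
  <!-- Placeholder link to the full story. -->
  <xsl:param name="story-url" select="'someURL'"/>

  <xsl:template match="/nitf">
    <html>
      <head/>
      <body>
        <!-- Start with the first block; the running total defaults to 0. -->
        <xsl:apply-templates select="body/body.content/*[1]" mode="block"/>
      </body>
    </html>
  </xsl:template>

  <!-- Output one block (p, table, list, ...), then decide whether to go on. -->
  <xsl:template match="*" mode="block">
    <xsl:param name="used" select="0"/>
    <!-- Running total: text length plus an estimate for every element's tags. -->
    <xsl:variable name="total"
        select="$used + string-length(normalize-space(.))
                + $tag-cost * count(descendant-or-self::*)"/>
    <xsl:apply-templates select="."/>
    <xsl:choose>
      <xsl:when test="$total &lt; $max-chars">
        <!-- Under budget: pass the total on to the next sibling block. -->
        <xsl:apply-templates select="following-sibling::*[1]" mode="block">
          <xsl:with-param name="used" select="$total"/>
        </xsl:apply-templates>
      </xsl:when>
      <xsl:when test="following-sibling::*">
        <!-- Budget reached with blocks left over: link to the rest. -->
        <p><a href="{$story-url}">View Entire story</a></p>
      </xsl:when>
    </xsl:choose>
  </xsl:template>

  <!-- Default mode: copy the NITF inline elements that share HTML names. -->
  <xsl:template match="p | em | q">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

With the sample document and these defaults, each paragraph costs 144 + 2 × 7 = 158 (the `p` plus one `em` or `q`), so the running total is 158, 316 and then 474 after the third paragraph; the walk stops there and emits the link, which matches the body of the desired HTML above. To leave room for the extra material added after the story, pass a smaller `max-chars`.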
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list