This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Transforming HTML to NITF
- To: "XSL Mailing List (E-mail)" <xsl-list at lists dot mulberrytech dot com>
- Subject: [xsl] Transforming HTML to NITF
- From: Adam Van Den Hoven <Adam dot Hoven at bluezone dot net>
- Date: Fri, 16 Feb 2001 15:23:26 -0800
- Reply-To: xsl-list at lists dot mulberrytech dot com
Since the body of NITF (News Industry Text Format, a standard format for
news content) is a lot like HTML (in its simplest form), I'm allowing my
users to create NITF using an HTML parser. I then pass the HTML through HTML
Tidy to make it well-formed XML and then through an XSL stylesheet to make
it NITF. I have come across a problem that I don't know how to fix, and I
need the community's help.
NITF has a <content.body> tag which is equivalent to HTML's <body> tag.
However, its children are far more rigidly defined: it only allows
elements as children, never bare text. For my purposes, I'm allowed <p>,
<table>, <ul> and <ol> tags (there are others, but we don't use them yet).
After passing the HTML through HTML Tidy, I might get something like:
<body>
<p> this is some text</p>
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b><br /><br />
<p>This is a new paragraph</p>
</body>
This would occur if I started with:
<body>
<p> this is some text
<ul>
<li>item 1</li>
</ul>
this is <em>emphasis</em> some more <b>text</b></p>
<p>This is a new paragraph</p>
</body>
> I need to get the line:
this is <em>emphasis</em> some more <b>text</b><br /><br />
> to end up wrapped in <p> tags (preferably without the <br>s)
>
> For clarity, the children of the body are:
p
ul
| text()
| em
| text()
| b
| br
| br
p
> I need to work with those nodes that have the | beside them as a single
> block so that I can wrap the entire thing in a <p> tag. Since I don't know
> the placement, the order, or even the frequency of such situations (there
> is no reason why I couldn't have more blocks that need to be grouped
> together), the solution needs to be general.
>
> I really don't want to have to use scripting but if the best solution
> requires it, I'm running MSXML 3.
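
One scripting-free way to express this kind of grouping in XSLT 1.0 is to
key every stray (non-block) child of <body> on the nearest preceding block
sibling, then emit one <p> per key value. The stylesheet below is only a
minimal sketch under several assumptions: the Tidy output has its elements
in no namespace, there is a single <body> per document, and the block
elements are just the four named above. The key name "run" and the template
name "wrap-run" are made up for illustration, and xsl:copy-of stands in for
whatever real HTML-to-NITF mapping the children need.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="no"/>

  <!-- Index every stray (non-block) child of body by the generated id of
       the nearest preceding block sibling. Strays that come before the
       first block have no such sibling, so their key value is ''. -->
  <xsl:key name="run"
           match="body/node()[not(self::p or self::table or self::ul or self::ol)]"
           use="generate-id(preceding-sibling::*[self::p or self::table or self::ul or self::ol][1])"/>

  <!-- Start at body so the built-in rules do not emit head text.
       Assumes a single body element. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//body"/>
  </xsl:template>

  <xsl:template match="body">
    <content.body>
      <!-- Run of stray nodes before the first block, if any -->
      <xsl:call-template name="wrap-run">
        <xsl:with-param name="nodes" select="key('run', '')"/>
      </xsl:call-template>
      <!-- Each block is copied, followed by the run that trails it -->
      <xsl:for-each select="p | table | ul | ol">
        <xsl:copy-of select="."/>
        <xsl:call-template name="wrap-run">
          <xsl:with-param name="nodes" select="key('run', generate-id())"/>
        </xsl:call-template>
      </xsl:for-each>
    </content.body>
  </xsl:template>

  <!-- Hypothetical helper: wraps a run in <p>, dropping <br> elements.
       Runs made up only of whitespace and <br> produce no output. -->
  <xsl:template name="wrap-run">
    <xsl:param name="nodes" select="/.."/>
    <xsl:if test="$nodes[self::*[not(self::br)] or self::text()[normalize-space()]]">
      <p><xsl:copy-of select="$nodes[not(self::br)]"/></p>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
```

On the sample above this should give a <p>, the <ul>, a new <p> holding the
stray line without its <br>s, and the final <p>. It has not been run
against MSXML 3; xsl:key and generate-id() are both part of XSLT 1.0.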
>
>
> Adam van den Hoven
> Internet Application Developer
> Blue Zone
> tel. 604.685.4310
> fax. 604.685.4391
> Blue Zone makes you interactive.(tm) http://www.bluezone.net/
>
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list