This is the mail archive of the xsl-list@mulberrytech.com mailing list.



RE: Selective escaping of special characters


You could get a minor saving by taking out the test with three calls on
contains() and instead testing whether the string after translation is the
same as before.
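
A minimal sketch of that first suggestion, assuming the rest of the posted template stays as it is. The translate() call is hoisted into a variable so that one string comparison replaces the three contains() calls. The U+2408 marker is carried over from the original post, and the sketch assumes that character never occurs in the input.

```xslt
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="DUMP_TAG_STRING">
    <xsl:param name="str"/>
    <!-- Map <, > and & to a single marker character (U+2408), once per call.
         Assumption: U+2408 does not appear in the input itself. -->
    <xsl:variable name="escaped"
                  select="translate($str, '&lt;&gt;&amp;', '&#9224;&#9224;&#9224;')"/>
    <xsl:choose>
      <!-- Empty string: nothing to output -->
      <xsl:when test="not($str)"/>
      <!-- translate() changed nothing, so no markup characters remain -->
      <xsl:when test="$escaped = $str">
        <xsl:value-of select="$str"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- Position of the first markup character -->
        <xsl:variable name="cutPos"
                      select="1 + string-length(substring-before($escaped, '&#9224;'))"/>
        <!-- Text before it is escaped as usual -->
        <xsl:value-of select="substring($str, 1, $cutPos - 1)"/>
        <!-- The markup character itself is written unescaped -->
        <xsl:value-of select="substring($str, $cutPos, 1)"
                      disable-output-escaping="yes"/>
        <!-- Recurse on the remainder; the empty case is handled above -->
        <xsl:call-template name="DUMP_TAG_STRING">
          <xsl:with-param name="str" select="substring($str, $cutPos + 1)"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

This keeps the recursion unchanged, so the saving is only in the per-call test, which is why it is likely to be minor.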

Since you're using Saxon, you might get a benefit by using saxon:tokenize()
in place of the recursive call. Or just bite the bullet and write a Java
extension function that does the whole job.

Michael Kay
Software AG
home: Michael.H.Kay@ntlworld.com
work: Michael.Kay@softwareag.com

> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of
> Kyrre Wathne
> Sent: 12 March 2002 13:22
> To: XSL-List@lists.mulberrytech.com
> Subject: [xsl] Selective escaping of special characters
>
>
> My apologies if this question has been asked before; I haven't found posts
> that address this exact issue.
>
> My problem is that I want to transform junk HTML generated by Microsoft
> Word. This contains markup, of course, so my first instinct was to use
> disable-output-escaping. However, this also disables escaping of other
> special characters, like the special dash character –. These are then
> output in a format my browser (Internet Explorer) doesn't understand (I
> use "ISO-8859-1" as encoding in output).
>
> I did work out a fix (pasted below) using a recursive named template, but
> this is proving too slow for all but the smallest documents. (I use Saxon
> 6.5.1.)
>
> My question is then: is there a fast way to only disable escaping for "<",
> ">" and "&"? Alternatively, can the named template below be optimized
> significantly?
>
> Thanks for any help.
>
> Kyrre Wathne
>
>
>
> <!-- Named template to output markup while escaping special characters -->
>
> <xsl:template name="DUMP_TAG_STRING">
>   <xsl:param name="str"/>
>   <xsl:choose>
>     <xsl:when test="not($str)">
>       <!-- Empty String -->
>     </xsl:when>
>     <xsl:when test="not(contains($str, '&lt;')) and not(contains($str, '&gt;')) and not(contains($str, '&amp;'))">
>       <!-- My work is done -->
>       <xsl:value-of select="$str"/>
>     </xsl:when>
>     <xsl:otherwise>
>       <!-- Convert all XML markup characters temporarily to a marker
>            character (U+2408, SYMBOL FOR BACKSPACE) -->
>       <xsl:variable name="escaped" select="translate($str, '&lt;&gt;&amp;', '&#9224;&#9224;&#9224;')"/>
>       <!-- Position of the first markup character -->
>       <xsl:variable name="cutPos" select="1 + string-length(substring-before($escaped, '&#9224;'))"/>
>       <!-- Text before the first markup character -->
>       <xsl:variable name="before" select="substring($str, 1, $cutPos - 1)"/>
>       <!-- The markup character itself -->
>       <xsl:variable name="replace" select="substring($str, $cutPos, 1)"/>
>       <!-- Remainder of the string after the markup character -->
>       <xsl:variable name="after" select="substring($str, $cutPos + 1)"/>
>       <!-- Dump part before match -->
>       <xsl:value-of select="$before"/>
>       <!-- Dump &lt;, &gt; or &amp; as is, unescaped -->
>       <xsl:value-of select="$replace" disable-output-escaping="yes"/>
>       <xsl:if test="$after">
>         <!-- Recurse with remainder -->
>         <xsl:call-template name="DUMP_TAG_STRING">
>           <xsl:with-param name="str" select="$after"/>
>         </xsl:call-template>
>       </xsl:if>
>     </xsl:otherwise>
>   </xsl:choose>
> </xsl:template>
>
>
>  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
>
>

