This is the mail archive of the xsl-list@mulberrytech.com mailing list .



RE: Converting &, >, <, ", and other odd-ball characters...


Yep. I took the routine Mike posted and added a simple check: if indexOf()
finds any of the 4 characters in the string, proceed; otherwise just return
the string. The indexOf() routines are pretty efficient, I hear, as they
operate on the internal array directly. This way, it first checks whether
any of the 4 characters is present. If so, it creates a new string buffer
and then looks at every character. I think it could be even faster: if there
is at least one such character, then instead of looking at every character,
it simply locates the position of one of those characters, copies the
original string into the buffer up to that point, adds the converted
character, then locates the next position, etc. Do you think that would be
faster than doing a full loop through every character?
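
A minimal sketch of the idea described above, assuming modern Java
(StringBuilder instead of the StringBuffer used in this thread). The class
and method names are hypothetical, and this is not the Saxon XMLEmitter code.
It uses one charAt() scan rather than repeated indexOf() calls, allocates the
buffer only when the first special character is found, and copies the runs of
ordinary characters in bulk:

```java
final class XmlEscaper {
    private XmlEscaper() {}

    // Returns the escaped form of c, or null if c needs no escaping.
    private static String replacementFor(char c) {
        switch (c) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '"': return "&quot;";
            default:  return null;
        }
    }

    // Escapes &, <, > and " in value. Returns the original String object
    // unchanged (no buffer allocated) when nothing needs escaping.
    static String escape(String value) {
        if (value == null) {
            return null;
        }
        StringBuilder sb = null; // created lazily, on the first special character
        int runStart = 0;        // start of the pending run of ordinary characters
        for (int i = 0; i < value.length(); i++) {
            String repl = replacementFor(value.charAt(i));
            if (repl == null) {
                continue;
            }
            if (sb == null) {
                sb = new StringBuilder(value.length() + 16);
            }
            sb.append(value, runStart, i); // bulk-copy the run before the special character
            sb.append(repl);
            runStart = i + 1;
        }
        if (sb == null) {
            return value; // fast path: nothing to escape
        }
        sb.append(value, runStart, value.length()); // copy the trailing run
        return sb.toString();
    }
}
```

For example, XmlEscaper.escape("a < b") returns "a &lt; b", while
XmlEscaper.escape("plain") returns the very same String object it was given.
Whether this beats a plain character loop in practice would still need
measuring.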


> -----Original Message-----
> From: owner-xsl-list@lists.mulberrytech.com
> [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Michael Kay
> Sent: Thursday, February 15, 2001 11:29 AM
> To: xsl-list@lists.mulberrytech.com
> Subject: RE: [xsl] Converting &, >, <, ", and other odd-ball
> characters...
>
>
> Well, I'd say your code is still pretty inefficient. You create a new
> StringBuffer() on each call, even when it's not needed. Take a look at the
> code in Saxon's XMLEmitter.
>
> Mike Kay
>
> > -----Original Message-----
> > From: owner-xsl-list@lists.mulberrytech.com
> > [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Kevin Duffey
> > Sent: 15 February 2001 16:28
> > To: xsl-list@lists.mulberrytech.com
> > Subject: RE: [xsl] Converting &, >, <, ", and other odd-ball
> > characters...
> >
> >
> > Thanks Mike. I did something similar. Basically, I created a static final
> > method like so:
> >
> >
> >
> >   private static final String[] HTMLChars   = new String[]{"&", "\"", "<", ">"};
> >   private static final String[] HTMLRepl    = new String[]{"&amp;",
> > "&quot;", "&lt;", "&gt;"};
> > 
> >   public static final String decodeHtml(String value)
> >   {
> >     return decode(value, HTMLChars, HTMLRepl);
> >   }
> > 
> > 
> >   public static final String decode(String value, String[] chars, String[] repl)
> >   {
> >     // return null if the value, chars[], repl[] are null or the number
> >     // of elements of the chars[] and repl[] are not the same.
> >     if( value == null || chars == null || repl == null || chars.length != repl.length )
> >       return null;
> > 
> >     int sze = chars.length;
> >     StringBuffer sb = new StringBuffer(value);
> >     for( int cntr = 0; cntr < sze; cntr++ )
> >     {
> >       int curPos = 0;
> >       int oldPos = 0;
> > 
> >       while( (curPos = sb.toString().indexOf(chars[cntr], oldPos)) > -1 )
> >       {
> >         // found a match, so replace this occurrence of the string
> >         // with the same element in the repl[] array
> >         sb.replace(curPos, curPos + chars[cntr].length(), repl[cntr]);
> >         // resume the search just past the inserted replacement text
> >         oldPos = curPos + repl[cntr].length();
> >       }
> >     }
> > 
> >     return sb.toString();
> >   }
> > 
> > 
> > This method works, so long as the first element of chars[] is '&'. I am
> > not sure if this is as fast, though, so I think I am going to use yours
> > (another person sent me a similar routine). I get the feeling the
> > sb.toString() call on each iteration is slower than just looking at every
> > character one at a time. I thought sb.charAt() was slow? That is why I
> > opted for using the indexOf() search, as I read that it is very efficient.
> > I'll keep both routines and one day do a performance analysis of them.
> > This is definitely one method that needs to be all it can be, since it
> > will be needed on almost every page and every form.
> > 
> > Thanks.
> > 
> > > -----Original Message-----
> > > From: owner-xsl-list@lists.mulberrytech.com
> > > [mailto:owner-xsl-list@lists.mulberrytech.com]On Behalf Of Mike Brown
> > > Sent: Wednesday, February 14, 2001 10:29 PM
> > > To: xsl-list@lists.mulberrytech.com
> > > Subject: Re: [xsl] Converting &, >, <, ", and other odd-ball
> > > characters...
> > >
> > >
> > > Duffey, Kevin wrote:
> > > > I am about to write a java routine that is called by every single
> > > > field of every jsp page just to convert possible ", >, < and & as
> > > > well as check for some other characters and strip them (such as an
> > > > MS Word paste that uses bullets or the " " characters that use
> > > > special codes for them).
> > >
> > > I will infer from this that you are using your JSPs to make XML that
> > > contains strings obtained from HTML form data.
> > >
> > > > I am not sure which way to go though. Is there a way to automatically
> > > > have XML and/or XSL convert these characters for me?
> > >
> > > No, XSLT is only able to work with XML documents that made it through a
> > > parser. And you'll find that string substitution in XSLT is nearly as
> > > painful as it is in Java.
> > >
> > > You must always escape the attribute values. You can get around the need
> > > to escape character data content of an element by using CDATA sections,
> > > but I think you'll find that it's actually just as easy to escape
> > > everything. Entities aren't going to help you.
> > >
> > > Also note that you can put your Java method in your JSP.
> > > The following code is untested, but you get the general idea.
> > >
> > > <%!
> > >
> > >     // at times like these, perl would be beautiful
> > >     private String escape( String s ) {
> > >         StringBuffer sb = new StringBuffer();
> > >         for ( int i = 0; i < s.length(); i++ ) {
> > >             switch ( s.charAt(i) ) {
> > >                 case '&': sb.append("&amp;");
> > >                           break;
> > >                 case '<': sb.append("&lt;");
> > >                           break;
> > >                 case '>': sb.append("&gt;");
> > >                           break;
> > >                 default: sb.append( s.charAt(i) );
> > >             }
> > >         }
> > >         return sb.toString();
> > >     }
> > >
> > > %>
> > >
> > > ...
> > >
> > > <%
> > >    String somexml = new String( "<stuff>" +
> > > escape(request.getParameter("foo")) + "</stuff>" );
> > > %>
> > >
> > >    - Mike
> > > ____________________________________________________________________
> > > Mike J. Brown, software engineer at            My XML/XSL resources:
> > > webb.net in Denver, Colorado, USA              http://skew.org/xml/
> > >
> > >
> > >  XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

