This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: encoding and XSL Transformation
Chuck White wrote at 10 Sep 2002 07:19:37 -0700:
> Windows encodings within the range of 128-159 map out to a variety of
> control characters in Unicode, so your problem begins with your source
> document, not Xalan.
Don't automatically equate byte values with character numbers (i.e.,
code points).
Bytes in the range 128-159, when read as, say, ISO-8859-1, map to a
variety of control characters.
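
To make that concrete, here is a minimal Python sketch (Python and its
"iso-8859-1" codec are my choice for illustration, not anything from
the original thread). It decodes bytes 128-159 as ISO-8859-1 and
checks what comes out:

```python
import unicodedata

# Bytes 128-159 (0x80-0x9F), the range under discussion.
raw = bytes(range(0x80, 0xA0))

# Read them as ISO-8859-1: each byte becomes the code point of the same number.
text = raw.decode("iso-8859-1")
code_points = [ord(ch) for ch in text]

# Collect the Unicode general category of every resulting character.
categories = {unicodedata.category(ch) for ch in text}

print([hex(cp) for cp in code_points[:4]])
print(categories)
```

Under this codec every one of those bytes comes out in general
category "Cc" (control), which is the case where byte value and code
point happen to coincide.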
Data in ISO-8859-1, when read as UTF-8, maps to a lot of junk, usually
with a lot of illegal byte sequences. UTF-8 data read as UTF-16
undoubtedly reads as a lot of junk too.
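
A rough Python sketch of both mismatches, assuming a made-up sample
string and Python's standard codecs (neither comes from the original
thread):

```python
# Sample text with one non-ASCII character (e with acute, U+00E9).
latin1_bytes = "caf\u00e9 au lait".encode("iso-8859-1")

# Strict UTF-8 decoding rejects the lone 0xE9 byte as an illegal sequence.
try:
    latin1_bytes.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Lenient decoding substitutes U+FFFD for the byte it cannot interpret.
lenient = latin1_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as little-endian UTF-16: each pair of bytes becomes one
# unrelated character.
utf8_bytes = "abcd".encode("utf-8")
as_utf16 = utf8_bytes.decode("utf-16-le")

print(strict_error)
print(lenient)
print([hex(ord(ch)) for ch in as_utf16])
```

The strict decode fails outright, the lenient one leaves a replacement
character where the accented letter was, and the UTF-16 reading turns
four ASCII letters into two CJK ideographs.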
Data in a Windows code page, when read as that same code page (in an
XML context, when the encoding declaration specifies the right
encoding), reads as a variety of characters whose Unicode code points
do not have a 1:1 correspondence with the numeric values of the bytes
used to represent them.
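
As an illustration, here is a small Python sketch that assumes the
code page in question is windows-1252 (the original thread does not
say which one) and uses Python's "cp1252" codec. The five byte values
are ones I picked from the 128-159 range:

```python
# A few bytes from the 128-159 range that windows-1252 assigns to
# printable characters.
raw = bytes([0x80, 0x85, 0x93, 0x94, 0x99])

# Read them with the matching encoding.
text = raw.decode("cp1252")

# Pair each byte value with the code point of the character it decoded to.
mapping = [(byte, ord(ch)) for byte, ch in zip(raw, text)]

for byte, code_point in mapping:
    print(f"byte 0x{byte:02X} -> U+{code_point:04X}")
```

Byte 0x80 becomes U+20AC (euro sign), 0x85 becomes U+2026 (horizontal
ellipsis), 0x93 and 0x94 become U+201C and U+201D (curly double
quotes), and 0x99 becomes U+2122 (trade mark sign). None of the code
points equals the byte value it came from.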
Regards,
Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin mailto:tony.graham@sun.com
Sun Microsystems Ireland Ltd Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3 x(70)19708
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list