This is the mail archive of the xsl-list@mulberrytech.com mailing list.



Re: encoding and XSL Transformation


Chuck White wrote at 10 Sep 2002 07:19:37 -0700:
 > Windows encodings within the range of 128-159 map out to a variety of
 > control characters in Unicode, so your problem begins with your source
 > document, not Xalan.

Don't automatically equate byte values with character numbers (i.e.,
code points).

Bytes in the range 128-159, when read as, say, ISO-8859-1, map to a
variety of control characters.

Data in ISO-8859-1 when read as UTF-8 maps to a lot of junk, usually
with a lot of illegal byte sequences.  UTF-8 data read as UTF-16
undoubtedly reads as a lot of junk too.
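
To make the mismatch concrete, here is a minimal Python sketch. The
sample strings and the try_decode helper are my own illustrative
choices, not anything from Xalan or the original problem report. It
encodes text one way and deliberately decodes it the wrong way.

```python
# Sample text chosen purely for illustration.
text = "café"

# In ISO-8859-1 the "é" is the single byte 0xE9.
latin1_bytes = text.encode("iso-8859-1")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are illegal in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 0xE9 announces a three-byte UTF-8 sequence, but no continuation bytes
# follow, so a strict UTF-8 decoder rejects the data outright.
wrong_utf8 = try_decode(latin1_bytes, "utf-8")

# UTF-8 bytes read as UTF-16 pair up into unrelated 16-bit code units.
utf8_as_utf16 = "encoding".encode("utf-8").decode("utf-16-le")

print(latin1_bytes)
print(wrong_utf8)
print(utf8_as_utf16)
```

A lenient decoder (errors="replace", for instance) would substitute
replacement characters instead of failing, which is another way of
getting junk.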

Data in a Windows code page, when read as that same Windows code page
(in an XML context, when the encoding declaration specifies the right
encoding), reads as a variety of characters whose Unicode code points
do not have a 1:1 correspondence with the numeric values of the bytes
used to represent the characters.
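
As a rough illustration, the Python sketch below decodes the same three
bytes twice, once as windows-1252 and once as ISO-8859-1. The choice of
windows-1252 as the code page, the particular bytes, and the
code_points helper are all assumptions made for the example.

```python
# Three bytes from the 128-159 range, chosen for illustration.
raw = bytes([0x80, 0x93, 0x94])


def code_points(text):
    """Format each character of text as a U+XXXX code point string."""
    return [f"U+{ord(ch):04X}" for ch in text]


# Read as windows-1252: a euro sign and a pair of curly double quotes.
as_cp1252 = code_points(raw.decode("cp1252"))

# Read as ISO-8859-1: control characters whose code points equal the byte values.
as_latin1 = code_points(raw.decode("iso-8859-1"))

print(as_cp1252)  # ['U+20AC', 'U+201C', 'U+201D']
print(as_latin1)  # ['U+0080', 'U+0093', 'U+0094']
```

Only in the ISO-8859-1 reading do the code points happen to equal the
byte values. With the right encoding declaration, the parser reports
the first set of characters and no control characters appear.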

Regards,


Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin                mailto:tony.graham@sun.com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

