This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: encoding and XSL Transformation
----- Original Message -----
From: "Earl Bingham" <earl@righton.com>
>I'm using emacs and Internet Explorer 6.0 to view the output. Anyway, it is
>the decimal representation that I want as the output. I am working with XML
>that is being generated using C++ MFC and the Xerces C++ parser.
>What would you suggest I do to convert these characters to their decimal
>reference?
Based on your question, I have to assume your XML is being generated by a
Windows app using code page 1252 (Windows ANSI) as its encoding. So you
need to find a utility for converting the character references that map to
what I guess you could call Unicode unknowns: in Unicode, the numbers 128
through 159 are C1 control characters, not printable characters. This
means that if you are using, for example, &#146;, then unless you do your
transform using an output encoding of Windows-1252, the UTF-8 mapping of
&#146; will not be what you bargained for.
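To see why the number 146 is ambiguous, here is a minimal Python sketch (Python is my choice for illustration, not something from the original setup). It decodes the single byte 146 (0x92) under two legacy code pages and compares the result with Unicode code point 146 itself. The variable names are just for this example.

```python
import unicodedata

raw = bytes([146])  # the single byte 0x92

# The same byte read under two legacy code pages.
cp1252_char = raw.decode("cp1252")
mac_char = raw.decode("mac_roman")

# The Unicode code point with the same number, which is what
# a numeric character reference with value 146 actually names.
unicode_char = chr(146)

print("cp1252   :", hex(ord(cp1252_char)), unicodedata.name(cp1252_char))
print("MacRoman :", hex(ord(mac_char)), unicodedata.name(mac_char))
# Control characters have no name, so report the general category instead.
print("Unicode  :", hex(ord(unicode_char)), unicodedata.category(unicode_char))
```

The first two lines report U+2019 (RIGHT SINGLE QUOTATION MARK) and U+00ED (LATIN SMALL LETTER I WITH ACUTE); the third reports category Cc, a control character.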
&#146; maps to the right single quote mark in Win 1252, i acute in
MacRoman, and, for our purposes here, nothing in Unicode (actually, it's
U+0092, a "private use" control character).
I have to think MS has a conversion utility for fixing the source doc. Try
starting here and drilling down. I bet you'll find something:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_0a5v.asp
Failing that, you can look at this conversion table and try your own thing:
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
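If you do end up trying your own thing, a rough sketch in Python might look like the following. It assumes the source uses decimal references only (hex forms such as &#x92; are not handled) and relies on Python's built-in cp1252 codec rather than parsing the table above. The function name fix_cp1252_refs is hypothetical, just for this example.

```python
import re

# Matches decimal numeric character references such as &#146;
_DECIMAL_REF = re.compile(r"&#(\d+);")


def fix_cp1252_refs(text):
    """Rewrite references in the range 128-159 to the decimal Unicode
    code point that Windows-1252 assigns to that byte value."""

    def replace(match):
        number = int(match.group(1))
        if 128 <= number <= 159:
            try:
                char = bytes([number]).decode("cp1252")
            except UnicodeDecodeError:
                # 129, 141, 143, 144 and 157 are undefined in cp1252;
                # leave those references untouched.
                return match.group(0)
            return "&#%d;" % ord(char)
        # Everything outside the C1 range is already a valid reference.
        return match.group(0)

    return _DECIMAL_REF.sub(replace, text)


print(fix_cp1252_refs("It&#146;s a caf&#233;"))
```

This prints It&#8217;s a caf&#233; since 8217 is the decimal value of U+2019, and the reference for e acute is outside the affected range. Run it over the source document before the transform, not over the output.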
Cheers,
Charles White
The Tumeric Partnership
http://www.tumeric.net
chuck@tumeric.net
http://www.javertising.com
________________________________________
Author, Mastering XSLT, Sybex Books
Co-Author, Mastering XML, Premium Edition, Sybex Books
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list