This is the mail archive of the xsl-list@mulberrytech.com mailing list.
Re: How to read the encoding of an XML document
[James Garriss]
> At 08:30 PM 10/25/2001 +0100, David Carlisle wrote:
> > > If I no longer know what my original XML document was encoded as, how
> > > do I know the appropriate encoding set to specify for the output?
> >
> >every xml application is mandated to support at least the utf8 and utf16
> >encodings, so either of those is always appropriate (or at least
> >acceptable) whatever the original encoding of the file.
>
> Ok. If you recall, I started this discussion by mentioning that I am
> receiving XML documents from several European countries. So the pertinent
> question for me is "if UTF-8 and/or UTF-16 will be the output encoding set
> I must use, will they handle characters from the languages I care about?"
>
> I found this statement on unicode.org:
>
> "What Characters Does the Unicode Standard Include? The Unicode Standard
> defines codes for characters used in the major languages written today.
> Scripts include the European alphabetic scripts, Middle Eastern
> right-to-left scripts, and scripts of Asia."
>
> So it seems to me that I should be safe outputting my data to UTF-16. That
> make sense?
>
Yes. At the very least, any XML processor will be able to handle either
UTF-8 or UTF-16. What a browser or word processor can display, though (if
you transform the result into a displayable document), is another question.
UTF-8 might be the better choice, depending on what is going to consume it.
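
As a rough illustration of the round trip, here is a minimal sketch in
Python (not something used in this thread; the sample document and its
ISO-8859-1 encoding are made up for the example). It parses a document whose
encoding is only stated in its XML declaration, then re-serializes it as
UTF-8 and as UTF-16. In a stylesheet, the equivalent choice is the
`encoding` attribute on `xsl:output`.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a document that arrived encoded as ISO-8859-1.
source_bytes = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<greeting lang="de">Grüße aus München, ça va, señor?</greeting>'
).encode("iso-8859-1")

# The parser reads the encoding from the XML declaration, so the caller
# does not need to know how the original document was encoded.
root = ET.fromstring(source_bytes)

# Re-serialize in the two encodings every XML processor must support.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")

# Both outputs parse back to the same text, so no characters were lost.
assert ET.fromstring(utf8_bytes).text == root.text
assert ET.fromstring(utf16_bytes).text == root.text
```

Either output carries the accented characters intact; the difference is
only in the bytes written, which is why the choice comes down to what will
consume the file.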
Cheers,
Tom P
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list