This is the mail archive of the xsl-list@mulberrytech.com mailing list .



Re: International Characters in attributes



> That's partly why people still use encodings other than utf-8. And
> once you do, the same numeric character references will mean different
> things in different encodings,

No, in XML and in HTML (4+) a numeric character reference always refers
to the Unicode position. It does not refer to the position in the
current encoding.
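
A minimal sketch of the point, using Python's standard
xml.etree.ElementTree parser. The element name, attribute name and the
helper title_in() are made up for illustration. The same reference
&#233; comes back as U+00E9 whichever encoding the file declares, even
though byte 0xE9 itself means something else in, say, ISO-8859-5.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document; only the declared encoding varies.
TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><p title="caf&#233;"/>'

def title_in(enc):
    """Serialize the document in `enc`, parse it, return the attribute value."""
    data = TEMPLATE.format(enc=enc).encode(enc)
    return ET.fromstring(data).get("title")

titles = {enc: title_in(enc) for enc in ("us-ascii", "iso-8859-1", "utf-8")}

# For contrast: the raw byte 0xE9 depends on the encoding,
# the numeric reference &#233; does not.
byte_e9_in_latin1 = bytes([0xE9]).decode("iso-8859-1")    # U+00E9
byte_e9_in_cyrillic = bytes([0xE9]).decode("iso-8859-5")  # U+0449
```

Every value in titles ends in U+00E9, while the two byte_e9_* values
differ.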

This is why I say that if your markup only uses ASCII characters (as in
XHTML) then you can encode your file in any encoding (e.g. us-ascii)
without loss of information, as all characters are accessible via
numeric references.
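
As a rough illustration (not tied to any particular XSLT processor),
Python's "xmlcharrefreplace" error handler does exactly this kind of
substitution when encoding to us-ascii. The sample text is invented,
and the approach assumes, as above, that element and attribute names
are already ASCII, since references are only legal in character data
and attribute values.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: Latin-1 letter, a BMP symbol, and a plane 1 symbol.
original = "caf\u00e9 \u2211 \U0001d400"
document = "<p>" + original + "</p>"

# Anything outside ASCII becomes a decimal numeric character reference.
ascii_bytes = document.encode("us-ascii", errors="xmlcharrefreplace")

# Parsing the pure-ASCII file recovers the original characters.
round_tripped = ET.fromstring(ascii_bytes).text
```

Here ascii_bytes contains only 7-bit bytes, and round_tripped is equal
to original.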

The rules for XML are rather different from the rules for HTML (and the
rules for HTML changed over time as it migrated from Latin-1 to Unicode
as its character repertoire). But for XML at least, it is clear that any
system that claims to be XML conforming has to accept UTF-8.

(It isn't only the Asian languages you mention that use "long" UTF-8
byte sequences: Unicode 3.1 promises to add around a thousand
mathematical alphanumeric symbols in plane 1, and these will be used by
MathML systems.)
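
To make "long" concrete, here is a small sketch that counts UTF-8 bytes
for one character from each length class. The sample characters are my
own choice; U+1D400 (MATHEMATICAL BOLD CAPITAL A) stands in for the
plane 1 mathematical symbols.

```python
# One sample character per UTF-8 length class (labels are informal).
samples = {
    "ASCII letter A (U+0041)": "\u0041",
    "Latin-1 e-acute (U+00E9)": "\u00e9",
    "CJK ideograph (U+4E2D)": "\u4e2d",
    "plane 1 math bold A (U+1D400)": "\U0001d400",
}

# Number of bytes each character occupies once encoded as UTF-8.
utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}

# The plane 1 character as raw UTF-8 bytes.
plane1_bytes = samples["plane 1 math bold A (U+1D400)"].encode("utf-8")
```

The lengths come out as 1, 2, 3 and 4 bytes respectively, so plane 1
characters are the longest case.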

David

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

