This is the mail archive of the xsl-list@mulberrytech.com mailing list.


Re: CJK UTF-16 test

To: xsl-list at lists dot mulberrytech dot com
Subject: Re: [xsl] CJK UTF-16 test
From: David_N_Bertoni at lotus dot com
Date: Thu, 29 Mar 2001 11:29:48 -0500
Reply-To: xsl-list at lists dot mulberrytech dot com


> On Wed, 28 Mar 2001, David Carlisle wrote:
>
> >
> > > as I don't have any parser that will swallow UTF-16.
> >
> > utf-16 support is _mandated_ by the XML spec. If you have anything that
> > calls itself an XML parser it must be able to read utf-16.
>
> XML does NOT support UTF-16 since UTF-16 includes the surrogates - that is
> in fact what *distinguishes* it from UCS-2. That the XML 1.0 spec ('scuse
> me, 'Recommendation') *says* that it requires support for UTF-16 is in
> fact an error in the text since it explicitly forbids surrogates (aka
> UTF-16) in the allowed char range spec. It is like saying 'We require
> Japanese support, except you can't use *any* Japanese.' It's a nonsense
> statement.
>
>   "Character Range
>
>    [2]   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
>                   [#x10000-#x10FFFF]
>
>   /* any Unicode character, excluding the surrogate blocks, FFFE,
>   and FFFF. */
>
>          The mechanism for encoding character code points into bit
>   patterns may vary from entity to entity. All XML processors must accept
>   the UTF-8 and UTF-16 encodings of 10646;
>                    ^
>                    |
>            The Error. What it actually requires is a
>            specified subset of UTF-8 and UCS-2 encodings.
>

You're confusing Unicode characters with how those characters are encoded.
UTF-16 uses surrogate pairs: two 16-bit code units, neither of which is a
valid Unicode character on its own.  Taken together, however, the pair
represents a single valid Unicode character.  The character range in
production [2] refers to Unicode characters, not to the values used to
represent them in any given encoding.
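
To make the distinction concrete, here is a minimal Python sketch. It assumes
the standard UTF-16 surrogate arithmetic and the Char production quoted
above; the function names are hypothetical, chosen only for illustration.

```python
from typing import Tuple


def is_xml_char(code_point: int) -> bool:
    """Return True if code_point matches the XML 1.0 Char production [2]."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the character it encodes."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+20000 is a CJK ideograph outside the 16-bit range.
high, low = to_surrogate_pair(0x20000)
# The individual code units are not XML characters ...
units_are_chars = is_xml_char(high) or is_xml_char(low)
# ... but the character the pair encodes is.
pair_is_char = is_xml_char(from_surrogate_pair(high, low))
```

Here units_are_chars comes out False and pair_is_char comes out True: the
Char production rejects the surrogate values as characters while still
admitting the character that a UTF-16 surrogate pair stands for.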

> --
> Benjamin Franz
>
> "Real programmers can write assembly code in any language."
>                                  -- Larry Wall

Dave


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list

Follow-Ups:
- Re: CJK UTF-16 test
  - From: Michael Beddow
