This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Question about iconv, UTF 8/16/32 and error reporting due to UTF-16 surrogates.


On 12/03/2015 10:44 PM, Rich Felker wrote:

> The relevant term is "Unicode Scalar Values", and these are exactly
> the integers 0-0xd7ff and 0xe000-0x10ffff. UTFs assign a unique
> encoding (in terms of code units) to each Unicode Scalar Value,
> and are not defined for any other integers. Likewise, UCS (16 or 32)
> does not include values which are not Unicode Scalar Values.

The term Unicode Scalar Value did not exist when Unicode support was
added to glibc.  For example, all the references I have readily at hand
(I can't find the 10646 CD right now) imply that UCS-4 in ISO/IEC
10646:2000 still had 31 bits and not the range restriction you gave.
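
To make the difference concrete, here is a minimal sketch in C of the
two range checks being compared.  This is not glibc code: both function
names are made up for illustration, and the 31-bit check assumes the
reading of the older UCS-4 definition described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if C is a Unicode Scalar Value in the
   sense quoted above, i.e. 0..0xd7ff or 0xe000..0x10ffff.  The gap
   0xd800..0xdfff is the UTF-16 surrogate range.  */
static bool
is_unicode_scalar_value (uint32_t c)
{
  return c <= 0xd7ff || (c >= 0xe000 && c <= 0x10ffff);
}

/* Hypothetical helper: true if C fits the older 31-bit UCS-4 range
   (assumed here to be 0..0x7fffffff, with no surrogate exclusion).  */
static bool
is_ucs4_31bit (uint32_t c)
{
  return c <= 0x7fffffff;
}
```

Under these assumptions, values such as 0xd800 and 0x110000 pass the
31-bit check but are rejected by the scalar value check, which is
exactly where the two definitions disagree.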

The question is what glibc should do: implement historic definitions,
preserving the meaning of charset names for backwards compatibility, or
tweak the implementations as the definitions evolve.

Florian

