This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH V4][BZ #18441] fix sorting multibyte charsets with an improper locale


On Tue, 29 Mar 2016, Carlos O'Donell wrote:

> I believe this is technically inaccurate since it allows all 4-byte
> sequences, when in reality the limit is at U+10FFFF?

That glibc accepts UTF-8 according to the definition in the 2003 edition 
of ISO 10646 rather than the definition in the 2011 and later editions is 
a known issue.  I've filed bug 19883 for it since I couldn't find an 
existing bug report in Bugzilla.  I don't think it's particularly relevant 
to any patch not aiming to fix that bug, but:

> You need not fix it, but we should add a comment saying that for the
> sake of simpler code we're allowing those 4-byte sequences which are
> not normally accepted.

I'd think a reference to this code in bug 19883 might be more useful - or 
something in that bug giving a standard (greppable) wording for a comment 
identifying places needing updating for the current UTF-8 (or in some 
cases UCS-4) definition, with such a comment added in this code.
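
For illustration only, here is a minimal sketch of the stricter check under discussion. It is not glibc code, and the function name utf8_4byte_in_range is hypothetical. A 4-byte UTF-8 sequence carries 21 payload bits, so it can encode values up to 0x1FFFFF; the sketch decodes the sequence and accepts it only when the result lies in U+10000..U+10FFFF, assuming the caller has already made four bytes available.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from glibc: decode the 4-byte UTF-8
   sequence at S and report whether it encodes a code point in the
   range U+10000..U+10FFFF.  On success the code point is stored in
   *CP.  The caller must guarantee that S points to at least 4 bytes.  */
static bool
utf8_4byte_in_range (const unsigned char *s, uint32_t *cp)
{
  /* The lead byte must have the form 11110xxx.  */
  if ((s[0] & 0xf8) != 0xf0)
    return false;

  /* Each of the three continuation bytes must have the form 10xxxxxx.  */
  for (size_t i = 1; i < 4; i++)
    if ((s[i] & 0xc0) != 0x80)
      return false;

  /* Assemble the 21 payload bits: 3 from the lead byte, 6 from each
     continuation byte.  */
  uint32_t c = ((uint32_t) (s[0] & 0x07) << 18)
               | ((uint32_t) (s[1] & 0x3f) << 12)
               | ((uint32_t) (s[2] & 0x3f) << 6)
               | (uint32_t) (s[3] & 0x3f);

  /* Reject overlong forms (below U+10000) and values above U+10FFFF.
     Code that accepts every 4-byte sequence omits the upper bound.  */
  if (c < 0x10000 || c > 0x10ffff)
    return false;

  *cp = c;
  return true;
}
```

With this check, F4 8F BF BF (U+10FFFF) is accepted, while F4 90 80 80 (0x110000) and F7 BF BF BF (0x1FFFFF) are rejected even though they are structurally well-formed 4-byte sequences.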

-- 
Joseph S. Myers
joseph@codesourcery.com

