This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Should glibc provide a builtin C.UTF-8 locale?


On Thu, Oct 22, 2015 at 12:01:32PM -0400, Mike Frysinger wrote:
> On 22 Oct 2015 11:13, Rich Felker wrote:
> > On Wed, Oct 21, 2015 at 01:49:36PM -0400, Mike Frysinger wrote:
> > > i've created a C.UTF-8 page where i've tried to gather all the points
> > > people made in this thread:
> > > 	https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
> > 
> > On the wiki I see, under differences from C:
> > 
> > - LC_COLLATE: Sort using the Unicode codepoint 
> > 
> > But this does not seem to be a difference. Unicode codepoint order is
> > identical to UTF-8 code unit order as unsigned char, i.e. the same as
> > the C locale.
> 
> i was thinking of overlong encodings, but i guess those are technically
> invalid according to the spec.  i think it's still worth calling out in
> the doc, but we can include an aside that highlights things.
> -mike

Those are just one case of non-UTF-8 sequences. I would assume you'd
still want them to sort like they would in the C locale just to
preserve the total order. Having all illegal sequences compare equal
to each other, or having them compare equal to valid sequences, would
be problematic for users and more work to implement.
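
For illustration only, here is a minimal sketch of the point about ordering.
It is not glibc code: decode_utf8, cmp_codepoints, sign and orders_agree are
hypothetical helper names, and the decoder assumes well-formed input. It checks
that strcmp (which compares bytes as unsigned char) agrees with a
code-point-by-code-point comparison on valid UTF-8.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode one UTF-8 sequence at s and store its byte length in *len.
   Assumption: the input is well-formed; no validation is attempted. */
static uint32_t decode_utf8(const unsigned char *s, int *len)
{
    if (s[0] < 0x80) { *len = 1; return s[0]; }
    if (s[0] < 0xE0) {
        *len = 2;
        return ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    }
    if (s[0] < 0xF0) {
        *len = 3;
        return ((uint32_t)(s[0] & 0x0F) << 12)
             | ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    }
    *len = 4;
    return ((uint32_t)(s[0] & 0x07) << 18) | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
}

/* Compare two valid UTF-8 strings by code point: returns -1, 0 or 1. */
static int cmp_codepoints(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p && *q) {
        int lp, lq;
        uint32_t cp = decode_utf8(p, &lp);
        uint32_t cq = decode_utf8(q, &lq);
        if (cp != cq)
            return cp < cq ? -1 : 1;
        p += lp;
        q += lq;
    }
    /* The shorter string sorts first when one is a prefix of the other. */
    return (*p != 0) - (*q != 0);
}

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int x) { return (x > 0) - (x < 0); }

/* Return 1 if byte order (strcmp) and code point order agree. */
static int orders_agree(const char *a, const char *b)
{
    return sign(strcmp(a, b)) == sign(cmp_codepoints(a, b));
}
```

For sequences that are not valid UTF-8 (overlong forms included), plain
strcmp still gives a total order, and an overlong form such as "\xC0\xAF"
does not compare equal to the valid "/" it would otherwise alias.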

Rich

