This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Is it OK to write ASCII strings directly into locale source files?
* Carlos O'Donell:
> On 07/24/2017 01:05 PM, Florian Weimer wrote:
>> * Andreas Schwab:
>>
>>> On Jul 24 2017, Carlos O'Donell <carlos@redhat.com> wrote:
>>>
>>>> So let us start slowly and agree with 'ASCII - [<>]' where < denotes
>>>> the start of a code point and > the end of the code point.
>>>
>>> POSIX says "character in the portable character set" if you want to keep
>>> it portable.
>>
>> But our locales only have to be compatible with our localedef, right?
>
> Should developers be able to write tools to the POSIX locale spec and parse
> our source locale definitions? Supporting more than just GNU/Linux? Do the
> BSDs share our locale definitions?

No, they don't. For one thing, they have partially implemented %OB
(without fixing all the locales, creating inconsistencies).

> My only technical objection with writing straight UTF-8 is that it could
> lead to more mistakes, and Mike just found one in CLDR where an Arabic
> Farsi character was used incorrectly because it displayed the same glyph.
> It was caught when harmonizing with glibc where you have to write out the
> code points (Mike filed a bug upstream with CLDR).

Wasn't it caught by locale testing which revealed that the locale
wasn't compatible with ISO-8859-6? That sanity check would still
apply to locale definitions written in UTF-8.
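
A rough sketch of that kind of sanity check, for illustration only. It is
not the actual glibc locale test: the function name is made up, and the
two code points are just one example of a look-alike pair, not
necessarily the character from the CLDR report.

```python
def unencodable(text, charset="iso8859_6"):
    """Return the characters of text that charset cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# Illustrative look-alike pair (an assumption, not taken from the bug report).
ARABIC_YEH = "\u064a"  # ARABIC LETTER YEH, present in ISO-8859-6
FARSI_YEH = "\u06cc"   # ARABIC LETTER FARSI YEH, absent from ISO-8859-6
```

The check works on code points, so it gives the same answer whether the
locale source spells the string as <U…> sequences or as literal UTF-8.
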
If we are worried about this kind of problem, I think web browsers
have multi-script detection logic to deal with cross-script homographs
in IDNA labels. I don't know how hard it would be to extract that
logic from there and run it on locale strings, for cross-verification.
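
A much cruder stand-in for such logic, as a sketch: Python's standard
library does not expose the Unicode Script property, so this guesses the
script from the first word of each character name. The function name is
hypothetical and this is not what browsers actually implement.

```python
import unicodedata


def scripts_used(text):
    """Approximate the set of scripts in text from Unicode character names."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            # First word of the name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts
```

A string mixing Latin and Cyrillic letters yields two scripts and can be
flagged. Look-alikes within a single script all report the same script,
so a check like this would complement the charset compatibility test
rather than replace it.
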
> My preference would be to start small, start using the POSIX portable
> character set to its maximum extent for all Latin-based languages,

I would still prefer the <U…> encoding for control characters which
are in the portable character set. So I have to object to the
“maximum” part. :)
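
To make the combined rule concrete, a sketch as I read it from this
thread: printable ASCII other than '<' and '>' stays literal, and
everything else, control characters included, is written as a code
point. The function name is hypothetical. The sketch ignores quoting of
'"' and of the locale's escape character, and the eight-digit form for
code points above U+FFFF is an assumption about what localedef accepts.

```python
def to_locale_source(text):
    """Encode text for a locale source file under the rule sketched above."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x7F and ch not in "<>":
            out.append(ch)               # printable ASCII stays literal
        elif cp <= 0xFFFF:
            out.append("<U%04X>" % cp)   # controls, '<', '>', other BMP
        else:
            out.append("<U%08X>" % cp)   # assumed form beyond the BMP
    return "".join(out)
```

For example, a tab between two letters comes out as a<U0009>b.
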