This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: RFC: locale-source validation script


On Wed, Jul 26, 2017 at 9:35 AM, Mike FABIAN <mfabian@redhat.com> wrote:
> Andreas Schwab <schwab@suse.de> wrote:
>
>> On Jul 25 2017, Zack Weinberg <zackw@panix.com> wrote:
>>
>>> - There are quite a few strings that aren't NFC and I suspect it's
>>> going to take expert knowledge of the languages involved to tell if
>>> that's desirable.
>>
>> I don't think NFC or not has anything to do with the language.
>
> I think not all occurrences of non-NFC text are necessarily an error,
> for example de_DE contains:
>
>     LC_CTYPE
>     copy "i18n"
>
>     translit_start
>
>     include "translit_combining";""
>
>     % German umlauts.
>     % LATIN CAPITAL LETTER A WITH DIAERESIS.
>     <U00C4> "<U0041><U0308>";"<U0041><U0045>"
>              ^^^^^^^^^^^^^^ NFD but this is apparently on purpose

Right, this is the sort of thing I was thinking of, where we want to
make sure to treat NFC and NFD forms of the same construct the same
for classification or collation or whatever.  Another case is

localedata/locales/as_IN:140: string not normalized:
  source: 09B9 09DF
     nfc: 09B9 09AF 09BC

That's the 'yesstr'.  U+09DF is BENGALI LETTER YYA, which canonically
decomposes to U+09AF BENGALI LETTER YA plus U+09BC BENGALI SIGN NUKTA
but is a _composition exclusion_, so NFC never recomposes it.  The
composed and decomposed forms render the same
on _my_ terminal, but maybe they don't on the terminals that the
community of Assamese speakers tends to use, or the decomposed form
doesn't convert properly to whatever the legacy encoding for this
language is (there is no %Charset: annotation in this file).
Regardless, we can't change it without doing some research first.
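For illustration, here is a minimal Python sketch of the kind of check that produces the diagnostic above.  It is not the script from this RFC (its code isn't in this message); `decode_ucs`, `codepoints` and `check_nfc` are hypothetical names, and it assumes strings are written with the `<Uxxxx>` escapes seen in the de_DE excerpt.  It uses the standard library's `unicodedata.normalize`, which decomposes U+09DF and does not recompose it, so the as_IN string comes out exactly as reported.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical helper: matches the <Uxxxx> escapes used in glibc locale sources.
_UCS_ESCAPE = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_ucs(text: str) -> str:
    """Replace each <Uxxxx> escape with the character it names."""
    return _UCS_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def codepoints(s: str) -> str:
    """Render a string as space-separated hex code points, e.g. '09B9 09DF'."""
    return " ".join(f"{ord(c):04X}" for c in s)


def check_nfc(s: str) -> Optional[str]:
    """Return a diagnostic if s is not already in NFC, otherwise None."""
    nfc = unicodedata.normalize("NFC", s)
    if nfc == s:
        return None
    return ("string not normalized:\n"
            f"  source: {codepoints(s)}\n"
            f"     nfc: {codepoints(nfc)}")


if __name__ == "__main__":
    # as_IN yesstr, rebuilt from the code points in the diagnostic above.
    print(check_nfc(decode_ucs("<U09B9><U09DF>")))
    # de_DE transliteration source: deliberately NFD (A + COMBINING DIAERESIS).
    print(check_nfc(decode_ucs("<U0041><U0308>")))
    # The precomposed form is already NFC, so nothing is reported (prints None).
    print(check_nfc(decode_ucs("<U00C4>")))
```

The same check also flags the deliberately decomposed de_DE transliteration source, which is why its output calls for review by someone who knows the language rather than an automatic rewrite.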

zw

