This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Improved check-localedef script


Zack Weinberg <zackw@panix.com> wrote:

> On Thu, Aug 3, 2017 at 5:17 PM, Zack Weinberg <zackw@panix.com> wrote:
>> Here is an improved version of the check-localedef script I posted the
>> other week.
>
> Here is another revision which uses the SUPPORTED file to learn the
> legacy encodings for each locale, rather than looking at %Charset:
> annotations in the source files.  You run it like this now (from the
> top level of the source tree):
>
> $ ./scripts/check-localedef.py -p localedata/locales \
>     -f localedata/SUPPORTED localedata/locales/*
>
> The final "localedata/locales/*" part is not _required_; it only
> enables the script to tell you about any locales that are missing from
> the SUPPORTED file.
>
> (Also, still more bugs have been fixed; in particular the
> "inappropriate character" errors have been restored.  Doh.)
>
> It's possible that Python isn't going to work out as the
> implementation language for this script.  I used it because its
> standard library provides Unicode normalization and many codecs for
> legacy encodings, but it doesn't know all of the encodings mentioned
> in localedata/SUPPORTED (ARMSCII-8, GEORGIAN-PS, and EUC-TW are
> missing) and I don't think it knows how to do transliteration, either.
> And it's still a solid order of magnitude slower than it should be.

For example, the script reports:

    localedata/locales/uz_UZ:212: string not representable in iso8859-1:
        0073 006F 02BB 006D
    
That is “soʻm”, where the third character is U+02BB MODIFIER LETTER TURNED
COMMA.  In the Latin-1 version of the uz_UZ locale this character gets
transliterated into U+0027 APOSTROPHE:

    $ LC_ALL=uz_UZ.ISO-8859-1 locale -k currency_symbol
    currency_symbol="so'm"

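It looks like most of the “string not representable” warnings are false
positives of this kind: the string cannot be encoded directly in the
legacy charset, but the compiled locale still gets a usable
transliteration.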
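
To illustrate the difference, here is a minimal Python sketch, not taken
from check-localedef.py.  The representable() helper and the one-entry
TRANSLIT table are hypothetical; the table holds only the mapping shown
above, whereas the real transliteration rules come from the locale
sources.  A strict encode fails on U+02BB, but succeeds once that
fallback is applied first:

```python
import unicodedata

# Hypothetical fallback table with the single mapping discussed above.
# The real locale sources define many more transliteration rules.
TRANSLIT = {0x02BB: "'"}  # MODIFIER LETTER TURNED COMMA -> APOSTROPHE


def representable(text, encoding, translit=None):
    """Return True if text can be encoded, optionally after transliteration."""
    text = unicodedata.normalize("NFC", text)
    if translit:
        # str.translate accepts a dict keyed by code point.
        text = text.translate(translit)
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


som = "\u0073\u006f\u02bb\u006d"  # code points from the warning above
print(representable(som, "iso8859-1"))            # strict check
print(representable(som, "iso8859-1", TRANSLIT))  # fallback applied first
```

A check along these lines would only warn when a string is still
unrepresentable after the fallback has been applied.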

-- 
Mike FABIAN <mfabian@redhat.com>

