This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Mike FABIAN <mfabian at redhat dot com>, libc-alpha at sourceware dot org
- Date: Fri, 26 Jan 2018 10:14:59 -0800
- References: <s9d4ln8q4f0.fsf@taka.site>
On 01/26/2018 02:51 AM, Mike FABIAN wrote:
>
> This set of patches updates our
> glibc/localedata/locales/iso14651_t1_common file to the latest
> available version from ISO and adapts the collation rules of all
> locales using “copy "iso14651_t1"” to the changes in the new file.
>
> The ISO standard 14651:2016 is available here:
What about ISO/IEC 14651:2016/Amd.1:2017?
It looks like it updates things to Unicode 9.0?
In particular, ISO14651_2017_TABLE1_en.txt matches Amd.1:2017, and
*not* the 2016 version.
> ISO/IEC 14651:2016: https://www.iso.org/standard/68309.html
>
> And a POSIX style LC_COLLATE file is downloadable from:
>
> http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
> http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016.zip
>
> This .zip file contains an ISO14651_2017_TABLE1_en.txt, which is in a
> similar format to our current iso14651_t1_common and can be used as an
> update.
>
To be clear, the text file is not in the above zip; it is in the associated
"Electronic inserts" zip file, which is part of the published standard.
http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016_Electronic_inserts.zip
With this additional zip file you can review the tabular data to make
comparisons and review the patches.
> That file is unfortunately up-to-date only with Unicode 8.0.0,
> but that is already a huge improvement over what we have now.
This doesn't seem correct given the data in Amd.1:2017:
~~~
The current Common Template Table reflects the repertoire of characters of Unicode 9.0, included in
ISO/IEC 10646:2014 plus its Amendments 1 and 2, plus 273 new characters that will be included in the
fifth edition of ISO/IEC 10646.
~~~
> Also, that file contained some errors which needed to be fixed.
> It seems strange for a file released by ISO, but it really contained
> some errors.
>
> And as the names for most collation symbols have been changed, all the
> collation rules of locales using “copy "iso14651_t1"” needed to be
> updated.
>
> While doing that, I made the collation rules of all locales I touched
> agree with the CLDR collation rules. glibc has several locales which are
> not in CLDR; for these I just adapted the existing rules.
In summary:
* Can we get clarification of exactly which standard we are updating to?
  Is it just ISO/IEC 14651:2016, or ISO/IEC 14651:2016/Amd.1:2017?
--
Cheers,
Carlos.