This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651


On 01/26/2018 02:51 AM, Mike FABIAN wrote:
> 
> This set of patches updates our
> glibc/localedata/locales/iso14651_t1_common file to the latest
> available version from ISO and adapts the collation rules of all
> locales using “copy "iso14651_t1"” to the changes in the new file.
> 
> The ISO standard 14651:2016 is available here:

What about ISO/IEC 14651:2016/Amd.1:2017?

It looks like it updates things to Unicode 9.0?

In particular ISO14651_2017_TABLE1_en.txt matches Amd.1:2017, and
*not* the 2016 version.

> ISO/IEC 14651:2016: https://www.iso.org/standard/68309.html
> 
> And a POSIX style LC_COLLATE file is downloadable from:
> 
> http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html
> http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016.zip
> 
> This .zip file contains an ISO14651_2017_TABLE1_en.txt which is in a
> similar format to our current iso14651_t1_common and can be used as an
> update.
>

To be clear, the text file is not in the above zip; it is in the associated
"Electronic inserts" zip file, which is part of the published standard.

http://standards.iso.org/ittf/PubliclyAvailableStandards/c068309_ISO_IEC_14651_2016_Electronic_inserts.zip

With this additional zip file you can review the tabular data to make
comparisons and review the patches.
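
As a rough aid for that comparison, here is a minimal sketch that lists
which collating-symbol names appear in only one of two tables. It assumes
both files declare symbols on lines of the form "collating-symbol <NAME>";
the helper names (collating_symbols, compare_symbols) are hypothetical, and
it does not handle anything beyond those declaration lines.

```python
import re

# Matches POSIX locale declaration lines such as: collating-symbol <NAME>
# Comment lines (starting with '%') do not match and are skipped.
_SYMBOL_RE = re.compile(r"^\s*collating-symbol\s+<([^>]+)>")


def collating_symbols(text):
    """Return the set of names declared with collating-symbol in text."""
    names = set()
    for line in text.splitlines():
        match = _SYMBOL_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def compare_symbols(old_text, new_text):
    """Return (only_in_old, only_in_new) as two sorted lists of names."""
    old = collating_symbols(old_text)
    new = collating_symbols(new_text)
    return sorted(old - new), sorted(new - old)
```

To use it, read the current iso14651_t1_common and the extracted
ISO14651_2017_TABLE1_en.txt into strings (for example with
open(path, encoding="utf-8", errors="replace").read()) and pass both to
compare_symbols.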

> That file is unfortunately up-to-date only with Unicode 8.0.0,
> but that is already a huge improvement over what we have now.

This doesn't seem correct given the data in Amd.1:2017:
~~~
The current Common Template Table reflects the repertoire of characters of Unicode 9.0, included in
ISO/IEC 10646:2014 plus its Amendments 1 and 2, plus 273 new characters that will be included in the
fifth edition of ISO/IEC 10646.
~~~

> Also, that file contained some errors which needed to be fixed.
> Seems strange for a file released by ISO, but it really contained
> some errors.
> 
> And as the names for most collation symbols have been changed, all the
> collation rules of locales using “copy "iso14651_t1"” needed to be
> updated.
> 
> While doing that, I made the collation rules of all locales I touched
> agree with the CLDR collation rules. glibc has several locales which are
> not in CLDR, for these I just adapted the existing rules.

In summary:

* Can we get clarification of exactly which standard we are updating to?
  Is it just ISO/IEC 14651:2016 or ISO/IEC 14651:2016/Amd.1:2017?

-- 
Cheers,
Carlos.

