This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [Patch 0/13] [BZ #14095] update collation data from Unicode / ISO 14651


On 01/29/2018 08:03 AM, Joseph Myers wrote:
> On Sat, 27 Jan 2018, Carlos O'Donell wrote:
> 
>>> I’ll try again with ISO14651_2016_TABLE1_en.txt now, that
>>> seems to be the latest version.
>>
>> OK, good! The 2016_TABLE1 seems to be for the Amd.1:2017 which matches
>> what I would expect and lines up with Unicode 9.
>>
>> In which case we are only 1 Unicode revision behind.
> 
> Since the tables are apparently generated in an automated way from the 
> Unicode data (according to the comments on them), is the source for that 
> automation available somewhere so we could use it and work from the latest 
> Unicode data directly?
 
The source of this automation is not publicly available. We would have to track
down those working on the standard and work with them to get the scripts.

Even if we had the latest set of scripts, we could not use them to process the latest
Unicode data directly, because the result would not match any published version of the
ISO 14651 standard.

We could, however, have used the scripts to process Unicode 9 data to simplify our
own processes. We would then convert raw Unicode data directly into our own internal
formats rather than going indirectly through the ISO 14651 tables. All we would
need then is a further verification pass to ensure that the published ISO 14651 table
matches what we generated from the Unicode data.

So in summary:

Today we have:

* Automated glibc process to convert Unicode data into Unicode-based locale data.
* Manual glibc process to convert ISO 14651 data into locale data.

In the future it would be nice to:

* Get automation scripts from ISO 14651 group to process Unicode data into ISO 14651
  format data.
* Unify glibc process to turn Unicode data (at two possible revisions) into our normal
  Unicode-based locale data, and our ISO 14651-based locale data.
* Add a verification pass to ensure the published ISO 14651 data table matches what we
  generated for our ISO 14651-based locale data.
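
As a rough illustration of the verification pass in the last item, here is a
minimal Python sketch. It assumes a simplified entry shape of the form
"<Uxxxx> weights % comment", with "%" as the comment character, and it only
compares per-character weight strings; it does not check the ordering of
collating-symbol declarations. The function names parse_entries and
diff_tables are hypothetical and are not part of any existing glibc script.

```python
import re

# Assumed, simplified entry shape (not the full table grammar):
#   <U0061> <S0061>;<BASE>;<MIN>;<U0061> % LATIN SMALL LETTER A
ENTRY_RE = re.compile(r"^(<U[0-9A-Fa-f]{4,8}>)\s+(\S+)")


def parse_entries(text, comment_char="%"):
    """Map each <Uxxxx> symbol to its weight string, ignoring comments."""
    entries = {}
    for line in text.splitlines():
        # Drop everything after the comment character, then trim whitespace.
        line = line.split(comment_char, 1)[0].strip()
        match = ENTRY_RE.match(line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def diff_tables(published, generated):
    """Report symbols that are missing, extra, or weighted differently."""
    pub = parse_entries(published)
    gen = parse_entries(generated)
    return {
        "missing": sorted(pub.keys() - gen.keys()),
        "extra": sorted(gen.keys() - pub.keys()),
        "changed": sorted(k for k in pub.keys() & gen.keys()
                          if pub[k] != gen[k]),
    }


if __name__ == "__main__":
    published = (
        "<U0061> <S0061>;<BASE>;<MIN>;<U0061> % LATIN SMALL LETTER A\n"
        "<U0062> <S0062>;<BASE>;<MIN>;<U0062> % LATIN SMALL LETTER B\n"
    )
    generated = (
        "<U0061> <S0061>;<BASE>;<MIN>;<U0061>\n"
        "<U0062> <S0062>;<BASE>;<CAP>;<U0062>\n"
    )
    print(diff_tables(published, generated))
```

An empty result in all three lists would mean the generated data agrees with
the published table for every entry this simplified parser recognizes.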

Does that make sense?

-- 
Cheers,
Carlos.

