This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0

From: Pravin Satpute <psatpute at redhat dot com>
To: "Joseph S. Myers" <joseph at codesourcery dot com>
Cc: libc-alpha at sourceware dot org, "Carlos O'Donell" <carlos at redhat dot com>
Date: Mon, 23 Jun 2014 04:54:36 -0400 (EDT)
Subject: Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0
Authentication-results: sourceware.org; auth=none
References: <53A5DCA3 dot 4010108 at redhat dot com> <Pine dot LNX dot 4 dot 64 dot 1406212027560 dot 29257 at digraph dot polyomino dot org dot uk>


>----- Original Message -----
>From: "Joseph S. Myers" <joseph@codesourcery.com>
>To: "Pravin Satpute" <psatpute@redhat.com>
>Cc: libc-alpha@sourceware.org, "Carlos O'Donell" <carlos@redhat.com>
>Sent: Sunday, June 22, 2014 2:34:30 AM
>Subject: Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0


>>  A.  Process for updating locales/i18n ctype with new Unicode release is
>> documented @ [1], I think it should get added either in WIKI, or docs
>> folder of glibc.

>The process should ideally be running a single command - no manual editing 
>at all.  (That command might be a script that wraps some other commands.)  
>If tempted to write instructions for running a sequence of commands and 
>editing the result, writing a script to automate that is better.

Agreed. I will improve it with more automation.
I added six characters manually; I am still not sure why they are present in i18n or where they came from.
I will analyze this further and see whether we can simply get rid of them.


>>      Report/Analysis for backward compatibility is available AT
>> backward-compatibility5_1-to-7_0 [3]

>That report is a very useful starting point, but doesn't seem to explain 
>things at the human level.  What changes have there been to previously 
>supported characters, and why, in terms of Unicode character properties, 
>are those changes correct changes?  Maybe something more verbose that 
>names the characters individually and states what the old ctype 
>information was, and what the new information is, and what the relevant 
>Unicode properties are that explain the new information, would help.

This report is my own analysis of the output of check-backcompatibility.py.
Yes, I can provide more information there. I am sure the next ctype update
will not require such a long analysis :)
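
As a rough illustration of the more verbose, per-character report requested above, here is a minimal Python sketch. It is not the code in check-backcompatibility.py: the function name describe_changes and the input shape (a dict mapping a ctype class name to a set of code points) are assumptions made for this example. Character names and categories come from Python's unicodedata module, which reflects the Unicode version bundled with the interpreter, not necessarily 7.0.0.

```python
import unicodedata


def describe_changes(old, new):
    """Return one report line per character whose class membership changed.

    `old` and `new` are hypothetical inputs: dicts mapping a ctype class
    name (e.g. "alpha") to the set of code points in that class.
    """
    lines = []
    for cls in sorted(set(old) | set(new)):
        before = old.get(cls, set())
        after = new.get(cls, set())
        # The symmetric difference holds exactly the code points whose
        # membership in this class differs between the two versions.
        for cp in sorted(before ^ after):
            name = unicodedata.name(chr(cp), "<unnamed>")
            category = unicodedata.category(chr(cp))
            change = "added to" if cp in after else "removed from"
            lines.append(
                f"U+{cp:04X} {name} (category {category}): {change} {cls}")
    return lines


# Illustrative data only, not taken from the real locale files.
old_classes = {"alpha": {0x41}, "upper": {0x41}}
new_classes = {"alpha": {0x41, 0xAA}, "upper": {0x41}}
report = describe_changes(old_classes, new_classes)
for line in report:
    print(line)
```

A real report would also need to state which Unicode property explains each change; that part is left out of this sketch.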


>You're changing how upper/lower/alpha properties are generated.  Does that 
>fix bug 14010?  If so, you can include [BZ #14010] in your ChangeLog 
>entry.  

Yes, it fixes the bug 14010 issues as well. I will add this.

>Does it obsolete the special cases in 
>gen-unicode-ctype.c:is_alpha?  If so, you should remove the parts of 
>gen-unicode-ctype.c that are no longer used.  You should also confirm that 
>each of the special cases there is properly handled by the new logic - or 
>state explicitly that the handling of certain identified characters with 
>special cases is being deliberately changed, because the Unicode 
>properties for those characters are better than the special-case handling.

Yes. DerivedCoreProperties.txt handles the special cases better. Its Alphabetic property is derived from
"Uppercase + Lowercase + Lt + Lm + Lo + Nl + Other_Alphabetic".

Sure. I will modify gen-unicode-ctype.c so that it no longer generates the alpha, upper and lower classes.
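
To illustrate reading a derived property such as Alphabetic directly from DerivedCoreProperties.txt, here is a minimal Python sketch. It is an assumption-laden example, not the code from the patch: the helper name read_property and the inline SAMPLE lines are made up for illustration, and the sample only imitates the layout of the real file ("first..last ; Property # comment").

```python
import re

# Matches data lines such as "0041..005A    ; Alphabetic # L&  [26] ...".
# The second code point is optional (single-character entries).
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(\w+)")


def read_property(lines, wanted):
    """Return the set of code points that carry the property `wanted`."""
    code_points = set()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or match.group(3) != wanted:
            continue  # comment, blank line, or a different property
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        code_points.update(range(first, last + 1))
    return code_points


# A few lines in the layout of the real file, for illustration only.
SAMPLE = """\
# Derived Property: Alphabetic
0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
""".splitlines()

alpha = read_property(SAMPLE, "Alphabetic")
lower = read_property(SAMPLE, "Lowercase")
print(len(alpha), len(lower))
```

With the real file, the same helper could be called once each for Alphabetic, Uppercase and Lowercase, which is the reason the hand-written special cases become unnecessary.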



>> -#define __STDC_ISO_10646__		201103L
>> +   Unicode 6.0.
>> +   Unicode 7.0.0 Published on 2014 June 16   */
>> +#define __STDC_ISO_10646__		201406L


>Now, the most recent published amendment is amendment 1 from 2013-04-15 
>(Linear A, Palmyrene, Manichaean, Khojki, Khudawadi, Bassa Vah, Duployan, 
>and other characters).  WG2 N4566 states an intent for Unicode 7.0 to 
>synchronize with amendment 2 to the 2012 edition of ISO/IEC 10646.  
>However, I can't locate a proposed publication date for that amendment (or 
>for the 2014 edition of ISO/IEC 10646 - and work appears to be underway on 
>amendments 1 and 2 to the 2014 edition, even before it's published).  So 
>maybe put 201304L there until such an amendment is published.

Thank you for this. I could not find the proper date.
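
For reference, the value of __STDC_ISO_10646__ has the form yyyymmL, encoding the year and month of the ISO/IEC 10646 edition or amendment that is supported. A minimal Python sketch of deriving it from a publication date follows; the helper name stdc_iso_10646 is hypothetical, and the date used is the amendment 1 date quoted above.

```python
import datetime


def stdc_iso_10646(published):
    """Format a publication date as a yyyymmL macro value."""
    return f"{published.year:04d}{published.month:02d}L"


# Amendment 1 publication date, as quoted in the review above.
value = stdc_iso_10646(datetime.date(2013, 4, 15))
print(f"#define __STDC_ISO_10646__\t\t{value}")
```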

>> diff --git a/scripts/check-backcompatibility.py b/scripts/check-backcompatibility.py
>> +++ b/scripts/check-backcompatibility.py

>I think in scripts/ the name should be more specific about *what* is 
>having compatibility checked - scripts/ is for all of glibc, not just 
>locale data.

Maybe ctype-backcompatibility.py would be a good name.

>> +# Copyright (C) 2013-14, Pravin Satpute <psatpute@redhat.com>

>glibc contributions should be assigned to the FSF (and miscellaneous 
>programs would normally be GPLv2+ / LGPLv2.1+ unless there is some reason 
>to deviate from the norm for such programs in glibc).

I will update this.

Thank you for the analysis. I will submit an improved patch soon. :)

Regards,
Pravin Satpute

References:
- [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0
  - From: Pravin Satpute
- Re: [PATCH] [BZ 14094] Update locale data to Unicode 7.0.0
  - From: Joseph S. Myers
