This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/14094] Update locale data to Unicode 7.0.0


https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #26 from Pravin S <pravin.d.s at gmail dot com> ---
(In reply to Mike FABIAN from comment #18)
> (In reply to Pravin S from comment #14)
> > Created attachment 7715 [details]
> > Patch to update UTF-8 CHARMAP and WIDTH to unicode 7.0
> > 
> > Done with all the work on the UTF-8 file.
> > Added two scripts:
> > 1. utf8-gen.py, to generate the UTF-8 file
> > 2. utf8-compatibility.py, to check backward compatibility of the newly
> > generated UTF-8 file
> > 
> > A report on the backward compatibility of the new UTF-8 file is available at
> > https://raw.githubusercontent.com/pravins/glibc-i18n/master/report-utf8
> > 
> > Submitting to glibc-alpha; please help with a quick review and push to git.
> 
> I checked the scripts Pravin used and the resulting UTF-8 file.
> 
> I found only one minor problem:
> 
> In some cases, both UnicodeData.txt and EastAsianWidth.txt have information
> about width. For example, EastAsianWidth.txt has:
>     
>     302A..302D;W     # Mn     [4] IDEOGRAPHIC LEVEL TONE MARK..IDEOGRAPHIC ENTERING TONE MARK
>     
> which gives us width 2 for these 4 characters (because of "W") but
> UnicodeData.txt has:
>     
>     302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;
>     302B;IDEOGRAPHIC RISING TONE MARK;Mn;228;NSM;;;;;N;;;;;
>     302C;IDEOGRAPHIC DEPARTING TONE MARK;Mn;232;NSM;;;;;N;;;;;
>     302D;IDEOGRAPHIC ENTERING TONE MARK;Mn;222;NSM;;;;;N;;;;;
>     
> which would give width 0 (because of "NSM").
> 
> I changed Pravin's script a bit to prefer the information from
> EastAsianWidth.txt in case of conflicts.
> 
> Pravin has already merged my change into his git repository.

Thanks, Mike, for the review. This bug is presently tracking two changes: one
to the i18n file and the other to the UTF-8 file. Both changes are significant,
so for better tracking I created a new bug for the UTF-8 file:
https://sourceware.org/bugzilla/show_bug.cgi?id=17588
I will submit the respective patches there.

The i18n ctype update is still pending.
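
For reference, the conflict rule described in comment #18 (prefer
EastAsianWidth.txt over UnicodeData.txt when both carry width information) can
be illustrated with a minimal Python sketch. This is not the code from
utf8-gen.py. The function names parse_east_asian_width, parse_unicode_data and
char_width are hypothetical, and the default width of 1 for all other
characters is an assumption made only for this example.

```python
# Hypothetical sketch of the width-conflict rule; not taken from utf8-gen.py.

def parse_east_asian_width(lines):
    """Map code point -> East Asian Width property (e.g. 'W', 'F', 'N')."""
    widths = {}
    for line in lines:
        data = line.split('#', 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        code_range, prop = [field.strip() for field in data.split(';')]
        if '..' in code_range:
            start, end = code_range.split('..')
        else:
            start = end = code_range
        for cp in range(int(start, 16), int(end, 16) + 1):
            widths[cp] = prop
    return widths


def parse_unicode_data(lines):
    """Map code point -> (general category, bidi class)."""
    props = {}
    for line in lines:
        fields = line.strip().split(';')
        if len(fields) < 5:
            continue
        props[int(fields[0], 16)] = (fields[2], fields[4])
    return props


def char_width(cp, east_asian, unicode_data):
    """Return the column width, preferring EastAsianWidth.txt on conflict."""
    # 'W' (wide) or 'F' (fullwidth) wins, even for nonspacing marks.
    if east_asian.get(cp) in ('W', 'F'):
        return 2
    # Otherwise fall back to UnicodeData.txt: nonspacing marks get width 0.
    category, bidi = unicode_data.get(cp, (None, None))
    if bidi == 'NSM' or category in ('Mn', 'Me'):
        return 0
    # Assumption for this sketch: everything else is one column wide.
    return 1


east_asian = parse_east_asian_width([
    "302A..302D;W     # Mn     [4] IDEOGRAPHIC LEVEL TONE MARK.."
    "IDEOGRAPHIC ENTERING TONE MARK",
])
unicode_data = parse_unicode_data([
    "302A;IDEOGRAPHIC LEVEL TONE MARK;Mn;218;NSM;;;;;N;;;;;",
    "0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;",
])
print(char_width(0x302A, east_asian, unicode_data))  # 2 (EastAsianWidth wins)
print(char_width(0x0300, east_asian, unicode_data))  # 0 (NSM, no 'W' entry)
```

With this ordering, U+302A..U+302D come out with width 2, as in the updated
script, while a combining mark that has no "W" or "F" entry stays at width 0.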

-- 
You are receiving this mail because:
You are on the CC list for the bug.
