This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] [BZ 17588 13064] Update UTF-8 charmap and width to Unicode 7.0.0
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Alexandre Oliva <aoliva at redhat dot com>
- Cc: Pravin Satpute <psatpute at redhat dot com>, Siddhesh Poyarekar <siddhesh at redhat dot com>, Mike FABIAN <mfabian at redhat dot com>, libc-alpha at sourceware dot org, Jens Petersen <petersen at redhat dot com>
- Date: Fri, 20 Feb 2015 13:57:37 -0500
- Subject: Re: [PATCH] [BZ 17588 13064] Update UTF-8 charmap and width to Unicode 7.0.0
- References: <573624784 dot 8871393 dot 1416848051220 dot JavaMail dot zimbra at redhat dot com> <orzjb3o7yf dot fsf at free dot home> <s9dy4qir6fu dot fsf at ari dot site> <orfvce7y90 dot fsf at free dot home> <s9d388duu5r dot fsf at ari dot site> <orioh35mbq dot fsf at free dot home> <20141223111038 dot GA5172 at spoyarek dot pnq dot redhat dot com> <119234933 dot 5523688 dot 1422972847328 dot JavaMail dot zimbra at redhat dot com> <or7fvnlbeo dot fsf at livre dot home> <orwq3njuvc dot fsf at livre dot home> <54E23EC9 dot 5020400 at redhat dot com> <ortwyig5xa dot fsf at livre dot home>
On 02/18/2015 06:23 PM, Alexandre Oliva wrote:
> [BZ #17588]
> [BZ #13064]
> [BZ #14094]
> [BZ #17998]
> * unicode-gen/Makefile: New.
> * unicode-gen/unicode-license.txt: New, from Unicode.
> * unicode-gen/UnicodeData.txt: New, from Unicode.
> * unicode-gen/DerivedCoreProperties.txt: New, from Unicode.
> * unicode-gen/EastAsianWidth.txt: New, from Unicode.
> * unicode-gen/gen_unicode_ctype.py: New generator, from Mike
> FABIAN <mfabian@redhat.com>.
> * unicode-gen/ctype_compatibility.py: New verifier, from
> Pravin Satpute <psatpute@redhat.com> and Mike FABIAN.
> * unicode-gen/ctype_compatibility_test_cases.py: New verifier
> module, from Mike FABIAN.
> * unicode-gen/utf8_gen.py: New generator, from Pravin Satpute
> and Mike FABIAN.
> * unicode-gen/utf8_compatibility.py: New verifier, from Pravin
> Satpute and Mike FABIAN.
> * charmaps/UTF-8: Update.
> * locales/i18n: Update.
> * gen-unicode-ctype.c: Remove.
> * tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns
> true for ordinal indicators.
Looks good to me. Please feel free to commit.
One nit:
-% Character width according to Unicode 5.0.0.
+% Character width according to Unicode 7.0.0.
% - Default width is 1.
% - Double-width characters have width 2; generated from
% "grep '^[^;]*;[WF]' EastAsianWidth.txt"
-% and "grep '^[^;]*;[^WF]' EastAsianWidth.txt"
% - Non-spacing characters have width 0; generated from PropList.txt or
% "grep '^[^;]*;[^;]*;[^;]*;[^;]*;NSM;' UnicodeData.txt"
% - Format control characters have width 0; generated from
% "grep '^[^;]*;[^;]*;Cf;' UnicodeData.txt"
-% - Zero width characters have width 0; generated from
-% "grep '^[^;]*;ZERO WIDTH ' UnicodeData.txt"
Why mention the grep commands used to generate this data at all?
It should just say to use the scripts. Nobody should be misled into
thinking this data was actually generated this way, nor do I want
anyone doing it this way ever again.
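For reference, the following is a minimal sketch of what the grep recipes in the old comment select, written against parsed lines of EastAsianWidth.txt and UnicodeData.txt. It is illustrative only and is not the code in utf8_gen.py: the function names are hypothetical, UnicodeData.txt First/Last range entries are ignored, and letting the zero-width rules take precedence over double width is an assumption of this sketch.

```python
"""Sketch: the width rules quoted from the charmap comment, applied to
parsed Unicode data lines instead of grep. Illustrative only; not utf8_gen.py."""


def wide_code_points(east_asian_width_lines):
    """Code points whose East_Asian_Width is W or F, i.e. what
    grep '^[^;]*;[WF]' EastAsianWidth.txt selects."""
    wide = set()
    for line in east_asian_width_lines:
        data = line.split('#', 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        cps, prop = [field.strip() for field in data.split(';', 1)]
        if prop not in ('W', 'F'):
            continue
        if '..' in cps:  # a range such as 1100..115F
            start, end = (int(x, 16) for x in cps.split('..'))
        else:
            start = end = int(cps, 16)
        wide.update(range(start, end + 1))
    return wide


def zero_width_code_points(unicode_data_lines):
    """Code points matched by the three zero-width grep recipes:
    bidi class NSM (field 4), general category Cf (field 2), or a
    name starting with 'ZERO WIDTH '. First/Last ranges are ignored."""
    zero = set()
    for line in unicode_data_lines:
        fields = line.rstrip('\n').split(';')
        if len(fields) < 5:
            continue
        name, category, bidi = fields[1], fields[2], fields[4]
        if bidi == 'NSM' or category == 'Cf' or name.startswith('ZERO WIDTH '):
            zero.add(int(fields[0], 16))
    return zero


def char_width(cp, wide, zero):
    """Default width is 1; zero-width rules are checked first (an assumption)."""
    if cp in zero:
        return 0
    if cp in wide:
        return 2
    return 1


# Hypothetical excerpts in the Unicode 7.0.0 file formats.
EAW_SAMPLE = [
    '# EastAsianWidth.txt excerpt',
    '1100..115F;W # Lo [96] HANGUL CHOSEONG KIYEOK..HANGUL CHOSEONG FILLER',
    '3000;F # Zs IDEOGRAPHIC SPACE',
    '0041;Na # Lu LATIN CAPITAL LETTER A',
]
UNICODE_DATA_SAMPLE = [
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
    '0300;COMBINING GRAVE ACCENT;Mn;230;NSM;;;;;N;NON-SPACING GRAVE;;;;',
    '200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;',
    '200E;LEFT-TO-RIGHT MARK;Cf;0;L;;;;;N;;;;;',
]

wide = wide_code_points(EAW_SAMPLE)
zero = zero_width_code_points(UNICODE_DATA_SAMPLE)
for cp in (0x0041, 0x0300, 0x1100, 0x3000, 0x200B, 0x200E):
    print('U+{:04X} width {}'.format(cp, char_width(cp, wide, zero)))
```

Doing this in a script rather than with ad hoc grep is the point: the selection rules live in one place and can be rerun for each new Unicode release.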
So shouldn't `write_header_width` simply not output any of this
stuff? I understand we're trying to minimize the initial diff, but
in a follow-up cleanup we should remove all of it and just say:
"% Character width according to Unicode 7.0.0."
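A minimal sketch of the simplified header, assuming the generator writes to a file-like object and receives the Unicode version as a string. The real signature of `write_header_width` in utf8_gen.py may differ, and anything else that function emits after this comment line is not shown here.

```python
import io


def write_header_width(outfile, unicode_version):
    """Hypothetical simplified header for the WIDTH section: only the
    single version line, without the historical grep recipes.
    The (outfile, unicode_version) signature is an assumption."""
    outfile.write('% Character width according to Unicode {}.\n'
                  .format(unicode_version))


buf = io.StringIO()
write_header_width(buf, '7.0.0')
print(buf.getvalue(), end='')
```

Since the version is passed in, the same line stays correct when the data files are next updated, with no hand-edited comment to fall out of date.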
Thoughts?
Cheers,
Carlos.