This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] [BZ 17588 13064] Update UTF-8 charmap and width to Unicode 7.0.0


On 02/18/2015 06:23 PM, Alexandre Oliva wrote:
> 	[BZ #17588]
> 	[BZ #13064]
> 	[BZ #14094]
> 	[BZ #17998]
> 	* unicode-gen/Makefile: New.
> 	* unicode-gen/unicode-license.txt: New, from Unicode.
> 	* unicode-gen/UnicodeData.txt: New, from Unicode.
> 	* unicode-gen/DerivedCoreProperties.txt: New, from Unicode.
> 	* unicode-gen/EastAsianWidth.txt: New, from Unicode.
> 	* unicode-gen/gen_unicode_ctype.py: New generator, from Mike
> 	FABIAN <mfabian@redhat.com>.
> 	* unicode-gen/ctype_compatibility.py: New verifier, from
> 	Pravin Satpute <psatpute@redhat.com> and Mike FABIAN.
> 	* unicode-gen/ctype_compatibility_test_cases.py: New verifier
> 	module, from Mike FABIAN.
> 	* unicode-gen/utf8_gen.py: New generator, from Pravin Satpute
> 	and Mike FABIAN.
> 	* unicode-gen/utf8_compatibility.py: New verifier, from Pravin
> 	Satpute and Mike FABIAN.
> 	* charmaps/UTF-8: Update.
> 	* locales/i18n: Update.
> 	* gen-unicode-ctype.c: Remove.
> 	* tst-ctype-de_DE.ISO-8859-1.in: Adjust, islower now returns
> 	true for ordinal indicators.

Looks good to me. Please feel free to commit.

One nit:

-% Character width according to Unicode 5.0.0.
+% Character width according to Unicode 7.0.0.
 % - Default width is 1.
 % - Double-width characters have width 2; generated from
 %        "grep '^[^;]*;[WF]' EastAsianWidth.txt"
-%   and  "grep '^[^;]*;[^WF]' EastAsianWidth.txt"
 % - Non-spacing characters have width 0; generated from PropList.txt or
 %   "grep '^[^;]*;[^;]*;[^;]*;[^;]*;NSM;' UnicodeData.txt"
 % - Format control characters have width 0; generated from
 %   "grep '^[^;]*;[^;]*;Cf;' UnicodeData.txt"
-% - Zero width characters have width 0; generated from
-%   "grep '^[^;]*;ZERO WIDTH ' UnicodeData.txt"

Why mention the `grep` commands at all? The comment should just say
that the data comes from the scripts. Nobody should be misled into
thinking this data was actually generated with `grep`, and I don't want
anyone generating it that way ever again.
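
For comparison, the first `grep` above corresponds roughly to the
following. This is only a minimal sketch of the idea, not the code in
utf8_gen.py; the function name `parse_wide_ranges` is made up for
illustration, and the input is assumed to be lines in the usual
EastAsianWidth.txt layout (`code[..code];property # comment`).

```python
def parse_wide_ranges(lines):
    """Return (start, end) code point pairs whose East Asian Width is W or F.

    Hypothetical helper for illustration; not part of the patch.
    """
    ranges = []
    for line in lines:
        # Drop trailing comments and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] not in ("W", "F"):
            continue
        # A field is either a single code point or a "start..end" range.
        start, _, end = fields[0].partition("..")
        ranges.append((int(start, 16), int(end or start, 16)))
    return ranges
```

Keeping this kind of logic in a script means the header comment has no
shell recipe to keep in sync with what the generator really does.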

So shouldn't `write_header_width` simply stop emitting these comments?
I understand we're trying to minimize the initial diff, but in a
follow-up cleanup we should remove all of this and just say:

"% Character width according to Unicode 7.0.0."
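
Something along these lines, as a rough sketch only. I am assuming the
function receives the output file object; the `unicode_version`
parameter is hypothetical, and the real signature in utf8_gen.py may
differ.

```python
def write_header_width(outfile, unicode_version="7.0.0"):
    """Write only the one-line width comment, without the grep recipes.

    Assumed signature for illustration; anything else the real function
    emits after this comment would stay unchanged.
    """
    outfile.write(
        "% Character width according to Unicode {}.\n".format(unicode_version)
    )
```

Taking the version as a parameter is just one option; it would avoid
editing the string by hand at the next Unicode update.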

Thoughts?

Cheers,
Carlos.

