This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug localedata/14094] Update locale data to Unicode 7.0.0
- From: "maiku.fabian at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Wed, 03 Dec 2014 09:59:12 +0000
- Subject: [Bug localedata/14094] Update locale data to Unicode 7.0.0
- Auto-submitted: auto-generated
- References: <bug-14094-131 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=14094
--- Comment #34 from Mike FABIAN <maiku.fabian at gmail dot com> ---
When I generate a new glibc/localedata/locales/i18n file
using gen-unicode-ctype.py from comment #33, build
glibc with it, and then run the tests with "make check", I get
one failure:
FAIL: localedata/tst-ctype
Looking into why it fails, I find this in ./localedata/tst-ctype.out:
Locale-specific tests for `lower'
islower('ª' = '\xaa') is true
islower('º' = '\xba') is true
Locale-specific tests for `lower'
...
2 errors for `de_DE.ISO-8859-1' locale
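For reference, the two failing bytes can be mapped back to their Unicode code points with a short Python sketch. It is not part of the glibc test suite; it only uses Python's standard iso-8859-1 codec and the unicodedata module, and the helper name describe is made up for this illustration.

```python
import unicodedata

# The two bytes that tst-ctype reports for the de_DE.ISO-8859-1 locale.
failing_bytes = [0xAA, 0xBA]

def describe(byte_value):
    """Decode one ISO-8859-1 byte; return (code point label, Unicode name)."""
    char = bytes([byte_value]).decode("iso-8859-1")
    return (f"U+{ord(char):04X}", unicodedata.name(char))

for b in failing_bytes:
    print(hex(b), *describe(b))
```

ISO-8859-1 maps each byte to the code point with the same value, so 0xAA and 0xBA are exactly the two ordinal indicators discussed below.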
The new "lower" character class generated by gen-unicode-ctype.py
contains U+00AA ª FEMININE ORDINAL INDICATOR and U+00BA º MASCULINE
ORDINAL INDICATOR.
The tst-ctype test run by "make check" expects them *not* to be lower case.
DerivedCoreProperties.txt lists both as lower case though:
00AA ; Lowercase # Lo FEMININE ORDINAL INDICATOR
00BA ; Lowercase # Lo MASCULINE ORDINAL INDICATOR
That's why gen-unicode-ctype.py adds them to the "lower" character
class: it adds every character that DerivedCoreProperties.txt
marks as "Lowercase" to the character class "lower".
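The selection rule described above can be sketched roughly as follows. This is not the code of gen-unicode-ctype.py, only a minimal illustration under stated assumptions: the sample text stands in for DerivedCoreProperties.txt, and the names SAMPLE, LINE_RE and code_points_with_property are hypothetical.

```python
import re

# A few lines in the DerivedCoreProperties.txt format. The two 00AA/00BA
# "Lowercase" lines are the ones quoted in this comment; the range line and
# the "Alphabetic" line are included only to exercise the parser.
SAMPLE = """\
0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR
00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR
"""

# "XXXX" or "XXXX..YYYY", then ";", then the property name.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*(\w+)")

def code_points_with_property(text, prop):
    """Collect every code point that the text marks with the given property."""
    result = set()
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m or m.group(3) != prop:
            continue  # comment, blank line, or a different property
        start = int(m.group(1), 16)
        end = int(m.group(2) or m.group(1), 16)
        result.update(range(start, end + 1))
    return result

lower = code_points_with_property(SAMPLE, "Lowercase")
print(0xAA in lower, 0xBA in lower)
```

With a rule like this, anything the data file marks as Lowercase ends up in "lower", regardless of its general category (Lo for these two characters).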
I wonder what needs to be done here.
Is the test in glibc wrong?
If so, it could be fixed by a patch like this:
$ git show | iconv -f iso-8859-1 -t utf-8
commit 25c913674386011a44b6270579a894b2e8200d25
Author: Mike FABIAN <mfabian@redhat.com>
Date: Wed Dec 3 10:05:42 2014 +0100
Fix test case localedata/tst-ctype-de_DE.ISO-8859-1.in
DerivedCoreProperties.txt from Unicode 7.0.0 lists
the characters U+00AA (ª) and U+00BA (º) as lower case:
00AA ; Lowercase # Lo FEMININE ORDINAL INDICATOR
00BA ; Lowercase # Lo MASCULINE ORDINAL INDICATOR
diff --git a/localedata/tst-ctype-de_DE.ISO-8859-1.in b/localedata/tst-ctype-de_DE.ISO-8859-1.in
index f71d76c..e124a52 100644
--- a/localedata/tst-ctype-de_DE.ISO-8859-1.in
+++ b/localedata/tst-ctype-de_DE.ISO-8859-1.in
@@ -1,5 +1,5 @@
lower  ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
- 000000000000000000000100000000000000000000000000
+ 000000000010000000000100001000000000000000000000
lower ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
000000000000000111111111111111111111111011111111
upper  ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
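To see which bytes the changed flag line affects, the two 48-digit strings from the hunk can be decoded with a small Python sketch. It assumes that the first "lower" line covers the bytes 0xA0..0xCF of ISO-8859-1 in order (this is inferred from the flag positions, not stated in the test file); the names FIRST_BYTE and flagged_bytes are hypothetical.

```python
# Assumption: the 48 flags on the first "lower" line correspond, in order,
# to the ISO-8859-1 bytes 0xA0..0xCF.
FIRST_BYTE = 0xA0

# Flag lines copied from the "-" and "+" lines of the hunk.
old_flags = "000000000000000000000100000000000000000000000000"
new_flags = "000000000010000000000100001000000000000000000000"

def flagged_bytes(flags, first_byte=FIRST_BYTE):
    """Return the byte values whose flag is '1'."""
    return [first_byte + i for i, flag in enumerate(flags) if flag == "1"]

old_set = flagged_bytes(old_flags)
new_set = flagged_bytes(new_flags)
added = sorted(set(new_set) - set(old_set))

print("lower before:", [hex(b) for b in old_set])
print("newly lower: ", [hex(b) for b in added])
```

Under that assumption the only flag set before the patch is the one for 0xb5 (MICRO SIGN), and the patch additionally sets the flags for 0xaa and 0xba, matching the two errors in tst-ctype.out.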