This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.


[Bug localedata/14094] Update locale data to Unicode 7.0.0


https://sourceware.org/bugzilla/show_bug.cgi?id=14094

--- Comment #34 from Mike FABIAN <maiku.fabian at gmail dot com> ---
When I generate a new glibc/localedata/locales/i18n file
using gen-unicode-ctype.py from comment #33, build glibc
with it, and then run the tests with “make check”, I get
one failure:

    FAIL: localedata/tst-ctype

Looking into why it fails, I find this in ./localedata/tst-ctype.out:

    Locale-specific tests for `lower'
      islower('ª' = '\xaa') is true
      islower('º' = '\xba') is true
    Locale-specific tests for `lower'
    ...
    2 errors for `de_DE.ISO-8859-1' locale

The new “lower” character class generated by gen-unicode-ctype.py
contains U+00AA ª FEMININE ORDINAL INDICATOR and U+00BA º MASCULINE
ORDINAL INDICATOR.

The test tst-ctype run by “make check” wants them *not* to be lower case.

DerivedCoreProperties.txt lists both as lower case though:

    00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
    00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR

That’s why gen-unicode-ctype.py adds them to the “lower” character
class: it adds every character that DerivedCoreProperties.txt
marks as “Lowercase” to the character class “lower”.
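To illustrate that rule, here is a minimal sketch of collecting every code point marked “Lowercase” from DerivedCoreProperties.txt-style lines. It is not the code of gen-unicode-ctype.py; the function name `collect_property`, the regular expression, and the `SAMPLE` lines (the two lines quoted above plus two range lines added only to show the range syntax) are assumptions made for this example.

```python
import re

# Matches "00AA ; Lowercase # ..." as well as "0061..007A ; Lowercase # ...".
_LINE_RE = re.compile(
    r'^(?P<first>[0-9A-Fa-f]{4,6})'
    r'(?:\.\.(?P<last>[0-9A-Fa-f]{4,6}))?'
    r'\s*;\s*(?P<prop>\w+)'
)


def collect_property(lines, prop='Lowercase'):
    """Return the set of code points that `lines` assign to property `prop`."""
    code_points = set()
    for line in lines:
        match = _LINE_RE.match(line.strip())
        # Skip comments, blank lines, and lines for other properties.
        if not match or match.group('prop') != prop:
            continue
        first = int(match.group('first'), 16)
        # A single code point is treated as a range of length one.
        last = int(match.group('last') or match.group('first'), 16)
        code_points.update(range(first, last + 1))
    return code_points


# Illustrative input only; a real run would read the full file.
SAMPLE = """\
0061..007A    ; Lowercase # L&  [26] LATIN SMALL LETTER A..LATIN SMALL LETTER Z
00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR
0041..005A    ; Uppercase # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
""".splitlines()

lower = collect_property(SAMPLE)
print(0xAA in lower, 0xBA in lower)
```

With a rule like this, U+00AA and U+00BA end up in “lower” although their general category is Lo, because only the derived property is consulted.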

I wonder what needs to be done here.

Is the test in glibc wrong?

If so, it could be fixed by a patch like this:

$ git show | iconv -f iso-8859-1 -t utf-8
commit 25c913674386011a44b6270579a894b2e8200d25
Author: Mike FABIAN <mfabian@redhat.com>
Date:   Wed Dec 3 10:05:42 2014 +0100

    Fix test case localedata/tst-ctype-de_DE.ISO-8859-1.in

    DerivedCoreProperties.txt from Unicode 7.0.0 lists
    the characters U+00AA (ª) and U+00BA (º) as lower case:

    00AA          ; Lowercase # Lo       FEMININE ORDINAL INDICATOR
    00BA          ; Lowercase # Lo       MASCULINE ORDINAL INDICATOR

diff --git a/localedata/tst-ctype-de_DE.ISO-8859-1.in b/localedata/tst-ctype-de_DE.ISO-8859-1.in
index f71d76c..e124a52 100644
--- a/localedata/tst-ctype-de_DE.ISO-8859-1.in
+++ b/localedata/tst-ctype-de_DE.ISO-8859-1.in
@@ -1,5 +1,5 @@
 lower    ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
-        000000000000000000000100000000000000000000000000
+        000000000010000000000100001000000000000000000000
 lower   ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ
         000000000000000111111111111111111111111011111111
 upper    ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
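
The 0/1 rows in this test input can be read as one digit per character in the row above them. The following sketch recomputes the old and new “lower” masks under the assumption that the first row covers the 48 bytes 0xA0 through 0xCF of ISO-8859-1; that range is inferred from the row length and from the positions of the changed digits, not stated in the patch. The helper `class_mask` and the sets `OLD_LOWER` and `NEW_LOWER` are hypothetical names for illustration.

```python
def class_mask(first_byte, last_byte, members):
    """Return a '0'/'1' string, one digit per byte in [first_byte, last_byte]."""
    # In ISO-8859-1 each byte value equals the Unicode code point it encodes,
    # so byte values can be compared with code points directly.
    return ''.join(
        '1' if byte in members else '0'
        for byte in range(first_byte, last_byte + 1)
    )


# Assumption: before the change only U+00B5 (MICRO SIGN) in this range counts
# as lower case; this is read off the single '1' in the old mask.
OLD_LOWER = {0xB5}
# The regenerated locale data additionally marks U+00AA and U+00BA.
NEW_LOWER = OLD_LOWER | {0xAA, 0xBA}

old_mask = class_mask(0xA0, 0xCF, OLD_LOWER)
new_mask = class_mask(0xA0, 0xCF, NEW_LOWER)

# Bytes whose expected islower() result differs between the two masks.
changed = [
    hex(0xA0 + index)
    for index, (old, new) in enumerate(zip(old_mask, new_mask))
    if old != new
]

print('-', old_mask)
print('+', new_mask)
print(changed)
```

Under that assumption the two printed masks match the removed and added lines of the patch, and the only bytes that differ are 0xaa and 0xba, the two reported by tst-ctype.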
