This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug locale/22898] Some Chinese characters cannot be sorted by adding sorting rules to LC_COLLATE


https://sourceware.org/bugzilla/show_bug.cgi?id=22898

--- Comment #1 from Mike FABIAN <maiku.fabian at gmail dot com> ---
    diff --git a/localedata/en_GB.UTF-8.in b/localedata/en_GB.UTF-8.in
    new file mode 100644
    index 0000000000..b365767bac
    --- /dev/null
    +++ b/localedata/en_GB.UTF-8.in
    @@ -0,0 +1,10 @@
    +a
    +A
    +ĉ
    +Ĉ
    +𠮞 ; <U00020B9E>
    +𫡅 ; <U0002B845>

So the test file expects U+2B845 to be sorted at this position.

    +b
    +B
    +c
    +C
    diff --git a/localedata/locales/en_GB b/localedata/locales/en_GB
    index 5b895574ac..e114a3a440 100644
    --- a/localedata/locales/en_GB
    +++ b/localedata/locales/en_GB
    @@ -60,6 +60,19 @@ END LC_CTYPE
     LC_COLLATE
     % Copy the template from ISO/IEC 14651
     copy "iso14651_t1"
    +
    +collating-symbol <ccirc>
    +
    +reorder-after <AFTER-A>
    +<ccirc>
    +
    +<U0108> <ccirc>;<BASE>;<CAP>;<U0108>
    +<U0109> <ccirc>;<BASE>;<MIN>;<U0109>
    +<U00020B9E> <ccirc>;<BASE>;<CAP>;<U00020B9E>
    +<U0002B845> <ccirc>;<BASE>;<CAP>;<U0002B845>

Here we have a rule to sort U+2B845 like the collating symbol <ccirc>, which is
reordered after the Latin letter a.

    +
    +reorder-end
    +
     END LC_COLLATE

     LC_MONETARY
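To make the expected and observed orders concrete, here is a toy model in C of the primary weights implied by the rules above. It is a sketch under stated assumptions. The weight values, the function `toy_primary_weight` and its `apply_2b845_rule` switch are hypothetical and do not reflect how localedef stores collation tables. The low weight given to characters without an effective rule is also an assumption, chosen only to match the behaviour reported in this bug.

```c
#include <stdint.h>

/* Illustrative primary weights (invented numbers, not glibc's tables). */
enum toy_weight {
    W_NO_RULE = 0,   /* assumption: "no effective rule" sorts before 'a' */
    W_A       = 10,
    W_CCIRC   = 11,  /* <ccirc>, placed by reorder-after <AFTER-A> */
    W_B       = 20,
    W_C       = 30
};

/* Primary weight of code point CP in this toy model.  If APPLY_2B845_RULE
   is 0, U+2B845 is treated as if its LC_COLLATE line were ignored, which
   is what the sort-test output in this report suggests. */
static int toy_primary_weight(uint32_t cp, int apply_2b845_rule)
{
    switch (cp) {
    case 0x0061: case 0x0041:        /* a, A */
        return W_A;
    case 0x0108: case 0x0109:        /* C/c with circumflex */
    case 0x20B9E:                    /* CJK character mapped to <ccirc> */
        return W_CCIRC;
    case 0x2B845:                    /* CJK character that misbehaves */
        return apply_2b845_rule ? W_CCIRC : W_NO_RULE;
    case 0x0062: case 0x0042:        /* b, B */
        return W_B;
    case 0x0063: case 0x0043:        /* c, C */
        return W_C;
    default:                         /* anything else: no rule */
        return W_NO_RULE;
    }
}
```

With the rule applied, U+2B845 gets the same primary weight as U+20B9E and the circumflexed c, so it lands between a and b as the test file expects. Without it, it falls below a. The model deliberately ignores the later levels (<BASE>, <CAP>/<MIN>, and the code point itself) that the real rules use to break ties.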

But when running "make check" one gets:

    $ grep ^FAIL tests.sum 
    FAIL: localedata/sort-test

And the test output contains:

en_GB.UTF-8 collate-test FAIL
  --- en_GB.UTF-8.in    2018-02-26 10:53:50.810558237 +0100
  +++ /local/mfabian/src/glibc-build/localedata/en_GB.UTF-8.out 2018-02-26 13:36:16.922398151 +0100
  @@ -1,9 +1,9 @@
  +𫡅 ; <U0002B845>
   a
   A
   ĉ
   Ĉ
   𠮞 ; <U00020B9E>
  -𫡅 ; <U0002B845>
   b
   B
   c

So U+20B9E is sorted as expected, but U+2B845 is not. U+2B845 is sorted as if
there were no rules at all for this character, and therefore it ends up before a.
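The failing check boils down to sorting the lines of en_GB.UTF-8.in with the locale's collation and comparing the result with the file's own order. Below is a minimal C sketch of that idea. It relies only on the standard setlocale(), strcoll() and qsort(). The helpers sort_lines_in_locale and first_mismatch are hypothetical and are not the code of glibc's collate-test.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: compare two C strings with strcoll(), i.e. using the
   LC_COLLATE category of the current locale. */
static int cmp_strcoll(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcoll(*sa, *sb);
}

/* Hypothetical helper: sort LINES in place by LOCALE's collation order.
   Returns 0 on success, -1 if the locale cannot be loaded. */
static int sort_lines_in_locale(const char *locale, char **lines, size_t n)
{
    if (setlocale(LC_COLLATE, locale) == NULL)
        return -1;
    qsort(lines, n, sizeof lines[0], cmp_strcoll);
    return 0;
}

/* Hypothetical helper: index of the first position where ACTUAL differs
   from EXPECTED, or N if both orders agree. */
static size_t first_mismatch(char *const *expected, char *const *actual,
                             size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(expected[i], actual[i]) != 0)
            return i;
    return n;
}
```

For example, one could load the ten lines of en_GB.UTF-8.in into an array, sort a copy with sort_lines_in_locale("en_GB.UTF-8", ...), and compare the two arrays. If the rules were honoured, first_mismatch() would return 10. With the behaviour reported here it would return 0, because the U+2B845 line moves to the front.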
