This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/19922] New: [PATCH] iso14651_t1_common: Define collation for Malayalam chillu characters


https://sourceware.org/bugzilla/show_bug.cgi?id=19922

            Bug ID: 19922
           Summary: [PATCH] iso14651_t1_common: Define collation for
                    Malayalam chillu characters
           Product: glibc
           Version: 2.25
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: santhosh.thottingal at gmail dot com
                CC: libc-locales at sourceware dot org
  Target Milestone: ---

Created attachment 9164
  --> https://sourceware.org/bugzilla/attachment.cgi?id=9164&action=edit
iso14651_t1_common: define collation for Malayalam chillu characters

The Malayalam chillu characters, which were added in Unicode 5.1, are not
considered in the collation rules for Malayalam. These 6 characters are
U+0D7A to U+0D7F.

Unicode defines them as alternate representations of the ZWJ-based chillus
(Consonant + Virama + ZWJ). The ZWJ-based chillus are already represented in
the collation rules.

So U+0D7A to U+0D7F should have a primary collation weight equal to that of
the ZWJ-based chillus. Note that ZWJ has a collation weight of 0 (it is
ignorable in collation). So:

U+0D7A (ൺ) and U+0D23 (ണ) + U+0D4D (്) have the same primary weight and differ
in their secondary-level weight.
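
As an illustration of the equivalence described above, here is a minimal
Python sketch that lists each atomic chillu next to its ZWJ-based sequence.
The pairing table is an assumption based on the usual consonant for each
chillu (in particular, RA as the base of CHILLU RR); it is not taken from the
attached patch, and the names CHILLU_BASE, zwj_form and describe are
hypothetical.

```python
import unicodedata

VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Atomic chillu -> base consonant. Assumed pairing, for illustration only.
CHILLU_BASE = {
    "\u0d7a": "\u0d23",  # CHILLU NN <- NNA
    "\u0d7b": "\u0d28",  # CHILLU N  <- NA
    "\u0d7c": "\u0d30",  # CHILLU RR <- RA (assumption)
    "\u0d7d": "\u0d32",  # CHILLU L  <- LA
    "\u0d7e": "\u0d33",  # CHILLU LL <- LLA
    "\u0d7f": "\u0d15",  # CHILLU K  <- KA
}


def zwj_form(chillu):
    """Return the Consonant + Virama + ZWJ sequence for an atomic chillu."""
    return CHILLU_BASE[chillu] + VIRAMA + ZWJ


def describe():
    """Return one line per chillu: code point, name, equivalent sequence."""
    rows = []
    for chillu in sorted(CHILLU_BASE):
        seq = " ".join(f"U+{ord(c):04X}" for c in zwj_form(chillu))
        rows.append(f"U+{ord(chillu):04X} {unicodedata.name(chillu)} = {seq}")
    return rows


if __name__ == "__main__":
    print("\n".join(describe()))
```

Writing the sequences with escapes keeps the invisible ZWJ explicit, which is
easy to lose when copying the characters from a mail client.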

Unicode CLDR collation follows exactly the same logic. See
http://unicode.org/cldr/trac/browser/trunk/common/collation/ml.xml

 [...]
 #  Pre-5.1 Chillus secondary equal to 5.1 chillus.
 #  Chillus primary equal to their consonant_dead form.
 &ണ്<<ണ്\u200D<<<ൺ
 &ന്<<ന്\u200D<<<ൻ
 &ര്<<ര്\u200D<<<ർ
 &ല്<<ല്\u200D<<<ൽ
 &ള്<<ള്\u200D<<<ൾ
 &ക്<<ക്\u200D<<<ൿ
 [...]


The attached patch implements this.

To test, create a text file with the following content:
ണ്‍
ണ്
ൺ

$ LANG=ml_IN.UTF-8 sort ~/sort.txt
ണ്
ണ്‍
ൺ

The same input can be tested at
http://demo.icu-project.org/icu-bin/collation.html to verify that the output
is the same as the output above.
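
Because ZWJ is invisible, the test file is easy to get wrong when it is typed
or pasted. The following is a rough Python sketch that writes the three lines
with explicit escapes and, if the ml_IN.UTF-8 locale happens to be installed,
sorts them through the C library with locale.strxfrm. The helper names and the
temporary file location are hypothetical, and the sorted result depends on the
installed glibc locale data.

```python
import locale
import tempfile
from pathlib import Path

# The three test strings, with the invisible ZWJ written as an escape.
LINES = [
    "\u0d23\u0d4d\u200d",  # NNA + virama + ZWJ (ZWJ-based chillu)
    "\u0d23\u0d4d",        # NNA + virama (dead consonant form)
    "\u0d7a",              # atomic CHILLU NN
]


def write_test_file(path):
    """Write the test strings, one per line, as UTF-8."""
    Path(path).write_text("\n".join(LINES) + "\n", encoding="utf-8")


def sort_with_locale(lines, name="ml_IN.UTF-8"):
    """Sort lines with the named locale; return None if it is unavailable."""
    old = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(lines, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, old)


if __name__ == "__main__":
    path = Path(tempfile.gettempdir()) / "sort.txt"
    write_test_file(path)
    print(sort_with_locale(LINES))
```

The file written this way can also be passed to the sort command shown above.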

Explanation of output:

1. ണ\u0D4D - This is ണ + ് (virama), the dead consonant form.
2. ണ\u0D4D\u200D - This is ണ + ് + ZWJ, the ZWJ-based chillu. It sorts after
the ZWJ-less dead form of ണ.
3. ൺ - This is the atomic chillu ൺ (U+0D7A), whose secondary-level collation
weight differs from that of the ZWJ-based chillu above.
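
The ordering explained above can be modelled with a toy two-level sort key.
This is only a sketch: the weights are made up, the contraction table covers
just the three test strings, and it is not how glibc or the attached patch
assigns weights. It only shows that equal primary weights combined with
distinct secondary weights yield the order listed.

```python
ZWJ = "\u200d"
NNA_VIRAMA = "\u0d23\u0d4d"  # NNA + virama
CHILLU_NN = "\u0d7a"         # atomic chillu

# Hypothetical (primary, secondary) weights; only their relative order matters.
CONTRACTIONS = {
    NNA_VIRAMA: (1, 0),        # dead consonant form
    NNA_VIRAMA + ZWJ: (1, 1),  # ZWJ-based chillu
    CHILLU_NN: (1, 2),         # atomic chillu
}


def sort_key(text):
    """Build a (primaries, secondaries) key using longest-match contractions."""
    primary, secondary = [], []
    i = 0
    while i < len(text):
        for length in (3, 2, 1):
            chunk = text[i:i + length]
            if chunk in CONTRACTIONS:
                p, s = CONTRACTIONS[chunk]
                primary.append(p)
                secondary.append(s)
                i += len(chunk)
                break
        else:
            i += 1  # characters outside the toy table are ignored
    return (primary, secondary)


if __name__ == "__main__":
    test_input = [NNA_VIRAMA + ZWJ, NNA_VIRAMA, CHILLU_NN]
    for line in sorted(test_input, key=sort_key):
        print(" ".join(f"U+{ord(c):04X}" for c in line))
```

Comparing the primary list first and the secondary list only on ties mirrors
the level-by-level comparison that the explanation relies on.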

