This is the mail archive of the
libc-locales@sourceware.org
mailing list for the GNU libc locales project.
[Bug localedata/19922] New: [PATCH] iso14651_t1_common: Define collation for Malayalam chillu characters
- From: "santhosh.thottingal at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sourceware dot org
- Date: Fri, 08 Apr 2016 00:01:03 +0000
- Subject: [Bug localedata/19922] New: [PATCH] iso14651_t1_common: Define collation for Malayalam chillu characters
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=19922
Bug ID: 19922
Summary: [PATCH] iso14651_t1_common: Define collation for
Malayalam chillu characters
Product: glibc
Version: 2.25
Status: NEW
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: santhosh.thottingal at gmail dot com
CC: libc-locales at sourceware dot org
Target Milestone: ---
Created attachment 9164
--> https://sourceware.org/bugzilla/attachment.cgi?id=9164&action=edit
iso14651_t1_common: define collation for Malayalam chillu characters
The Malayalam chillu characters that were added in Unicode 5.1 are not considered
in the collation rules for Malayalam. These 6 characters are
U+0D7A to U+0D7F.
Unicode defines them as an alternate representation of the ZWJ based chillus
(Consonant+Virama+ZWJ). ZWJ based chillus are represented in the collation
rules already.
So U+0D7A to U+0D7F should have primary collation weights equal to those of the
ZWJ based chillus. Note that ZWJ has 0 collation weight (ignorable in collation).
So:
U+0D7A (ൺ) and U+0D23 (ണ) + U+0D4D (്) have the same primary weight and differ
in secondary level weight.
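
For reference, the pairing between the atomic chillus and their ZWJ based sequences can be listed with a short Python sketch. Only the first pairing (U+0D7A with U+0D23) is stated in this report; the other base consonants in the table are my assumption, and the names `ATOMIC_CHILLUS`, `BASE_CONSONANTS` and `zwj_form` are made up for illustration.

```python
import unicodedata

VIRAMA = "\u0D4D"
ZWJ = "\u200D"

# The six atomic chillus added in Unicode 5.1, in code point order.
ATOMIC_CHILLUS = [chr(cp) for cp in range(0x0D7A, 0x0D80)]

# Base consonants assumed to correspond to each chillu, in the same order.
# Only U+0D23 <-> U+0D7A is taken from the report; the rest are assumptions.
BASE_CONSONANTS = ["\u0D23", "\u0D28", "\u0D30", "\u0D32", "\u0D33", "\u0D15"]


def zwj_form(consonant):
    """Build the pre-5.1 chillu sequence: consonant + virama + ZWJ."""
    return consonant + VIRAMA + ZWJ


def code_points(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for chillu, base in zip(ATOMIC_CHILLUS, BASE_CONSONANTS):
    print(f"{code_points(chillu)} {unicodedata.name(chillu)}"
          f" <-> {code_points(zwj_form(base))}")
```

Note that normalization does not turn one form into the other, which is why the collation table has to relate them explicitly.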
Unicode CLDR collation also follows exactly the same logic. See
http://unicode.org/cldr/trac/browser/trunk/common/collation/ml.xml
[...]
# Pre-5.1 Chillus secondary equal to 5.1 chillus.
# Chillus primary equal to their consonant_dead form.
&ണ്<<ണ്\u200D<<<ൺ
&ന്<<ന്\u200D<<<ൻ
&ര്<<ര്\u200D<<<ർ
&ല്<<ല്\u200D<<<ൽ
&ള്<<ള്\u200D<<<ൾ
&ക്<<ക്\u200D<<<ൿ
[...]
The attached patch implements this.
To test, create a text file (~/sort.txt here) with the following content:
ണ്‍
ണ്
ൺ
$ LANG=ml_IN.UTF-8 sort ~/sort.txt
ണ്
ണ്‍
ൺ
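
The same check can be sketched in Python through the C library's collation, using escapes so the invisible ZWJ is explicit. This assumes the ml_IN.UTF-8 locale is installed; the helper names `sort_in_locale` and `code_points` are made up for illustration, and the result depends on the glibc locale data in use.

```python
import locale

# The three test lines, written with escapes so the ZWJ is visible.
LINES = [
    "\u0D23\u0D4D\u200D",  # consonant + virama + ZWJ (ZWJ based chillu)
    "\u0D23\u0D4D",        # consonant + virama (dead form)
    "\u0D7A",              # atomic chillu
]


def sort_in_locale(lines, name="ml_IN.UTF-8"):
    """Sort lines with the named locale's LC_COLLATE rules.

    Returns None when the locale is not available on this system.
    """
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(lines, key=locale.strxfrm)
    finally:
        # Restore whatever collation locale was active before.
        locale.setlocale(locale.LC_COLLATE, previous)


def code_points(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


result = sort_in_locale(LINES)
if result is None:
    print("ml_IN.UTF-8 is not installed; nothing to compare")
else:
    for line in result:
        print(code_points(line))
```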
The same input can be tested with
http://demo.icu-project.org/icu-bin/collation.html to verify that its output is
the same as the output above.
Explanation of output:
1. ണ\u0D4D - This is ണ + ്
2. ണ\u0D4D\u200D - This is ണ + ് + ZWJ - ZWJ based chillu. Sorts after the
ZWJ-less dead form of ണ.
3. ൺ - This is the atomic chillu ൺ (U+0D7A), with secondary level collation
weight differing from the above ZWJ based chillu.
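
The ordering above can be illustrated with a toy two-level sort key in Python. The numeric weights are invented for this sketch and are not the weights used by the patch or by CLDR; the point is only that equal primary weights leave the lower level to decide the order.

```python
DEAD_FORM = "\u0D23\u0D4D"          # consonant + virama
ZWJ_CHILLU = "\u0D23\u0D4D\u200D"   # consonant + virama + ZWJ
ATOMIC_CHILLU = "\u0D7A"            # atomic chillu

# Hypothetical (primary, secondary) weights: the primary weight is shared,
# so only the secondary weight separates the three forms.
TOY_KEYS = {
    DEAD_FORM: (1, 0),
    ZWJ_CHILLU: (1, 1),
    ATOMIC_CHILLU: (1, 2),
}


def toy_sort(lines):
    """Sort by the toy key; tuples compare primary first, then secondary."""
    return sorted(lines, key=TOY_KEYS.__getitem__)


ordered = toy_sort([ZWJ_CHILLU, DEAD_FORM, ATOMIC_CHILLU])
print([" ".join(f"U+{ord(ch):04X}" for ch in s) for s in ordered])
```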
--
You are receiving this mail because:
You are on the CC list for the bug.