This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug localedata/21547] Tibetan script collation broken (Dzongkha and Tibetan)


https://sourceware.org/bugzilla/show_bug.cgi?id=21547

--- Comment #18 from Elie Roux <elie.roux@telecom-bretagne.eu> ---
Well, things are supposed to be sorted exactly as in the sorted list attached
to this bug report.

Now, I agree there is some magic going on here, and it's not totally obvious to
me how this works, but it works.

Even though it's not clear what line 7 does, it clearly does something, because
if you remove it, the tests in the tibetan-collation GitHub repo fail with:

expected [གངས་ལྷགས།, གཉྫིར།, གད།]
got      [གངས་ལྷགས།, གད།, གཉྫིར།]

The test corresponds to page 347 of the tshig mdzod chen mo:

https://www.tbrc.org/browser/ImageService?work=W29329&igroup=I1KG15042&image=379&first=1&last=1058&fetchimg=yes

So line 7 has a purpose and doesn't get completely overwritten, although I
agree the magic that takes place is a bit above my head... I suppose that
it somehow indicates that གཉྫ should be sorted after the initial value of གཉ,
and that this gets recorded somehow, even though གཉ then takes another value
afterwards.

I guess it may become less confusing with a bit of an understanding of Tibetan:
in Tibetan གཉ absolutely never exists on its own, as it would be main letter ག
followed by suffix ཉ, and this simply can never happen (ཉ cannot be a suffix).
What may happen are two cases starting with གཉ:

1. གཉྫིར is transliterated Sanskrit, and sort of exceptionally (and quite
erratically) behaves as if ཉ was a suffix, and is thus sorted with the main
letter ག, and that's what line 7 is trying to sort. What I believe happens is
that at the time line 7 is parsed, གཉ is still sorted with the main letter ག,
as it would be in the root collation. So this sorts གཉྫ with the main letter ག.
Note that if you put the rule at the end of the file, the result is not the
same, so I think it's more or less what's happening...

2. གཉར is prefix ག, then main letter ཉ then suffix ར, which is sorted in a
totally different way, with the main letter ཉ, as stated by the rule of line
30, far after main letter ག. So this sorts གཉ with the main letter ཉ.
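
As a rough illustration of the two cases above, here is a toy model in Python.
It is only a sketch and not how glibc or the locale compiler actually evaluates
the rules: plain code point comparison stands in for real collation weights,
and the names (sort_key, sanskrit_exception, the word constants) are made up
for this example. The longer prefix གཉྫ is checked first, which plays the role
that line 7 appears to play.

```python
# Toy model only: NOT glibc's collation algorithm. Code point order stands in
# for real collation weights, and every name here is hypothetical.
GA, NYA = "\u0f42", "\u0f49"   # ག and ཉ
SUB_DZA = "\u0fab"             # subjoined ྫ

GANGS_LHAGS = "\u0f42\u0f44\u0f66\u0f0b\u0f63\u0fb7\u0f42\u0f66\u0f0d"  # གངས་ལྷགས།
GNYDZIR = "\u0f42\u0f49\u0fab\u0f72\u0f62\u0f0d"                        # གཉྫིར།
GAD = "\u0f42\u0f51\u0f0d"                                              # གད།
GNYAR = "\u0f42\u0f49\u0f62"                                            # གཉར


def sort_key(word, sanskrit_exception=True):
    """Return (main letter, word) so that words group by main letter first."""
    if sanskrit_exception and word.startswith(GA + NYA + SUB_DZA):
        main = GA    # case 1: གཉྫ stays with main letter ག (the role of line 7)
    elif word.startswith(GA + NYA):
        main = NYA   # case 2: prefix ག + main letter ཉ (the role of line 30)
    else:
        main = word[0]  # simplification: first letter is the main letter
    return (main, word)


words = [GAD, GNYAR, GNYDZIR, GANGS_LHAGS]

# With the exception in place.
with_exception = sorted(words, key=sort_key)

# Without it, as when line 7 is removed.
without_exception = sorted(
    words, key=lambda w: sort_key(w, sanskrit_exception=False)
)
```

In this model, with_exception puts གཉྫིར། between གངས་ལྷགས། and གད།, as in the
expected test output, while without_exception moves it after གད།, as in the
failing output. In both cases གཉར sorts with the main letter ཉ.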

That's my understanding of the situation, and I still think the rules are
correct... I'm not sure I have made things clearer; if you want more details,
don't hesitate to ask!

