This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/17750] wrong collation order of diacritics in most locales


https://sourceware.org/bugzilla/show_bug.cgi?id=17750

--- Comment #2 from keld at keldix dot com <keld at keldix dot com> ---
On Tue, Dec 23, 2014 at 04:25:27AM +0000, aoliva at sourceware dot org wrote:
> https://sourceware.org/bugzilla/show_bug.cgi?id=17750
> 
>             Bug ID: 17750
>            Summary: wrong collation order of diacritics in most locales
>            Product: glibc
>            Version: unspecified
>             Status: NEW
>           Severity: normal
>           Priority: P2
>          Component: localedata
>           Assignee: unassigned at sourceware dot org
>           Reporter: aoliva at sourceware dot org
>                 CC: libc-locales at sourceware dot org
> 
> http://www.unicode.org/reports/tr10/tr10-30.html states:
> 
> <quote>
> Normally, all differences in sorting are assessed from the start to the end of
> the string. If all of the base letters are the same, the first accent
> difference determines the final order. In row 1 of Table 5, the first accent
> difference is on the o, so that is what determines the order. In some French
> dictionary ordering traditions, however, it is the last accent difference that
> determines the order, as shown in row 2.
> </quote>
> 
> Table 5 says:
> 
> <pre>
> Normal Accent Ordering      cote < coté < côte < côté
> Backward Accent Ordering    cote < côte < coté < côté
> </pre>
> 
> However, glibc implements backward accent ordering for all locales except de_DE
> and lb_LU.  
> 
> Unicode CLDR 26 confirms this is wrong: the only file in
> http://unicode.org/cldr/trac/browser/tags/release-26/common/collation/ that has
> settings backwards="on" is fr_CA.xml.

This was probably done because if a string contains more than one accented
letter, the word or name is probably French, and then the French rules should
be followed. This would mean that CLDR is wrong.
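
For reference, the two orderings in Table 5 can be reproduced with a small
sketch. This is only an illustration under simplifying assumptions, not how
glibc or CLDR implement collation: the helper name accent_key is hypothetical,
accents are weighted by their combining-mark code points, and an unaccented
letter sorts before any accented one.

```python
import unicodedata


def accent_key(word, backward=False):
    """Two-level sort key: base letters first, then per-letter accents.

    Hypothetical helper for illustration only; real collation uses
    weight tables rather than raw code points.
    """
    base = []      # primary level: the letters without their accents
    accents = []   # secondary level: one tuple of marks per base letter
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Attach the combining mark to the preceding base letter.
            if accents:
                accents[-1] += (ord(ch),)
        else:
            base.append(ch)
            accents.append(())
    if backward:
        # Backward accent ordering: the last accent difference decides.
        accents.reverse()
    return (base, accents)


words = ["côté", "coté", "cote", "côte"]
normal = sorted(words, key=accent_key)
backward = sorted(words, key=lambda w: accent_key(w, backward=True))

print(normal)    # ['cote', 'coté', 'côte', 'côté']
print(backward)  # ['cote', 'côte', 'coté', 'côté']
```

The only difference between the two keys is that the accent level is compared
from the end of the string when backward is set, which is what swaps coté and
côte.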

Best regards
Keld

-- 
You are receiving this mail because:
You are on the CC list for the bug.
