This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] cs_CZ locale: fix collation [BZ #22336]
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Mike FABIAN <mfabian at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 27 Nov 2017 10:47:48 -0800
- Subject: Re: [PATCH] cs_CZ locale: fix collation [BZ #22336]
- Authentication-results: sourceware.org; auth=none
- References: <s9dy3mvrj8c.fsf@taka.site>
On 11/24/2017 03:47 AM, Mike FABIAN wrote:
>
> [BZ #22336]
> * localedata/locales/cs_CZ (LC_COLLATE): Use “copy "iso14651_t1"”
> and implement the collation rules for cs from CLDR on top of that.
> * Makefile: Add cs_CZ.UTF-8 to test-input and to the list
> of locales to be built for testing.
> * cs_CZ.UTF-8.in: New file with test data to test the Czech sorting.
>
> Difference in sorting of the added test file (added in the patch)
> before and after applying the sorting changes of the patch:
>
> $ diff -u cs_CZ.UTF-8.in.old cs_CZ.UTF-8.in
> --- cs_CZ.UTF-8.in.old 2017-11-24 16:47:52.688348458 +0530
> +++ cs_CZ.UTF-8.in 2017-11-24 16:42:08.364221944 +0530
> @@ -1,7 +1,3 @@
> -ȥ
> -Ȥ
> -ʒ
> -Ʒ
> a
> a
> a
> @@ -65,7 +61,6 @@
> cenných
> cenným
> cenou
> -cH
> cvrček
> cz
> cZ
> @@ -94,8 +89,9 @@
> H
> hruška
> ch
> -CH
> +cH
> Ch
> +CH
> chřestýšům
> Chřestýšům
> chřipka
> @@ -188,6 +184,8 @@
> Z
> ź
> Ź
> +ȥ
> +Ȥ
> za
> Za
> źa
> @@ -209,6 +207,8 @@
> Žb
> žluva
> Žluva
> +ʒ
> +Ʒ
> 0
> 1
> 1
>
> I think "cH" was sorted completely wrong before and "CH" slightly
> wrong as well. So this patch not only bases the Czech LC_COLLATE
> implementation on the iso14651_t1 file, as requested in this bug,
> but also improves the sorting of the uppercase/lowercase variants
> of the ch digraph.
>
> And of course it improves the sorting of some non-Czech characters
> like ʒ and ȥ because these were not handled at all in the old
> Czech LC_COLLATE implementation.
>
Looks good to me, and yes, it fixes what looks like wrong sorting of the
CH digraph.
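
For readers who want to see the intended digraph ordering outside of
glibc, here is a minimal Python sketch. It is not how the locale file or
iso14651_t1 implements collation; the names PRIMARY_ORDER, base_letter
and czech_key are hypothetical, and the alphabet is simplified (only
č, ř, š, ž and ch are treated as separate primary letters; other accents
are a secondary difference, and case is the last tie-breaker with
lowercase first). It reproduces the post-patch order ch, cH, Ch, CH
shown in the diff above.

```python
import unicodedata

# Simplified Czech primary alphabet (an assumption for this sketch):
# "ch" is a single collation element that sorts between "h" and "i".
PRIMARY_ORDER = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k",
    "l", "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v",
    "w", "x", "y", "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(PRIMARY_ORDER)}


def base_letter(ch):
    """Return ch itself if it is a primary letter, else its unaccented base."""
    if ch in RANK:
        return ch
    return unicodedata.normalize("NFD", ch)[0]


def czech_key(word):
    """Build a three-level sort key: letters, then accents, then case."""
    word = unicodedata.normalize("NFC", word)
    primary, secondary, tertiary = [], [], []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair.lower() == "ch":
            # Any case combination of c+h counts as the digraph.
            raw, element = pair, "ch"
            i += 2
        else:
            raw = word[i]
            element = base_letter(raw.lower())
            i += 1
        # Characters outside the table sort after all known letters.
        primary.append(RANK.get(element, len(PRIMARY_ORDER)))
        # True when an accent was stripped to find the primary letter.
        secondary.append(raw.lower() != element)
        # Lowercase (False) sorts before uppercase (True), per character.
        tertiary.append(tuple(c.isupper() for c in raw))
    return (primary, secondary, tertiary)


words = ["CH", "Ch", "cH", "ch", "hruška", "H", "chřipka", "cvrček", "cenou", "i"]
print(sorted(words, key=czech_key))
```

Plain sorted(words) without the key orders by code point, which puts
the uppercase forms first and splits the ch variants apart; the key
keeps all four variants together after h and before i.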
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
--
Cheers,
Carlos.