This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [Patch v3 6/14] [BZ #14095] update collation data from Unicode / ISO 14651
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Mike FABIAN <mfabian at redhat dot com>, libc-alpha at sourceware dot org
- Cc: "Dmitry V. Levin" <ldv at altlinux dot org>
- Date: Fri, 23 Feb 2018 21:59:44 -0800
- Subject: Re: [Patch v3 6/14] [BZ #14095] update collation data from Unicode / ISO 14651
- Authentication-results: sourceware.org; auth=none
- References: <s9dvaeoc8gt.fsf@taka.site>
On 02/23/2018 02:21 AM, Mike FABIAN wrote:
> From 759aedd5ec485d9f792022e2432262ebaf4f74d8 Mon Sep 17 00:00:00 2001
> From: Mike FABIAN <mfabian@redhat.com>
> Date: Wed, 31 Jan 2018 06:18:47 +0100
> Subject: [PATCH 06/14] iso14651_t1_common: make the fourth level the codepoint
> for characters which are ignorable on all 4 levels
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> Entries for characters which have “IGNORE” on all 4 levels like:
>
> <U0001> IGNORE;IGNORE;IGNORE;IGNORE % START OF HEADING (in ISO 6429)
>
> are changed into:
>
> <U0001> IGNORE;IGNORE;IGNORE;<U0001> % START OF HEADING (in ISO 6429)
>
> i.e. putting the code point of the character into the fourth level
> instead of “IGNORE”. Without that change, all such characters
> would compare equal which would make a wcscoll test case fail.
> It is better to have a clearly defined sort order even for characters
> like this so it is good to use the code point as a tie-break.
>
> * localedata/locales/iso14651_t1_common: Use the code point of a character
> in the fourth collation level instead of IGNORE for all entries which
> have IGNORE on all 4 levels.
> ---
> localedata/locales/iso14651_t1_common | 914 +++++++++++++++++-----------------
> 1 file changed, 457 insertions(+), 457 deletions(-)
LGTM.
I agree completely, the code point should be a tie-break, and I'm working the
same thing into the C.UTF-8 locale. I'll get back to that after this work and
hopefully you can review that work for me :-)
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
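
To illustrate the tie-break idea discussed above, here is a minimal sketch in C.
It is not glibc's collation code: the struct coll_entry type, the IGNORE value
of 0, and the helper names are all hypothetical, chosen only to model one weight
per level. It assumes non-zero code points, since 0 doubles as IGNORE here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEVELS 4
#define IGNORE 0u /* hypothetical encoding: weight 0 means "ignored at this level" */

/* Hypothetical, simplified collation entry: one weight per level. */
struct coll_entry {
    uint32_t codepoint;
    uint32_t weight[LEVELS];
};

/* Models the old entries: IGNORE;IGNORE;IGNORE;IGNORE */
struct coll_entry all_ignore(uint32_t cp)
{
    struct coll_entry e = { cp, { IGNORE, IGNORE, IGNORE, IGNORE } };
    return e;
}

/* Models the new entries: IGNORE;IGNORE;IGNORE;<Uxxxx> */
struct coll_entry codepoint_tiebreak(uint32_t cp)
{
    struct coll_entry e = { cp, { IGNORE, IGNORE, IGNORE, cp } };
    return e;
}

/* Compare level by level; the first differing weight decides the order. */
int compare_entries(const struct coll_entry *a, const struct coll_entry *b)
{
    assert(a != NULL && b != NULL);
    for (size_t level = 0; level < LEVELS; level++) {
        if (a->weight[level] != b->weight[level])
            return a->weight[level] < b->weight[level] ? -1 : 1;
    }
    return 0; /* identical weights on all levels: the entries compare equal */
}
```

In this model, all_ignore(0x0001) and all_ignore(0x0002) compare equal, while
codepoint_tiebreak(0x0001) sorts before codepoint_tiebreak(0x0002), so the order
is fully defined.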
--
Cheers,
Carlos.