This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Consistency between strxfrm and strcoll?

From: Carlos O'Donell <carlos at redhat dot com>
To: GNU C Library <libc-alpha at sourceware dot org>
Cc: Mike Fabian <mfabian at redhat dot com>
Date: Thu, 24 Mar 2016 01:35:39 -0400
Subject: Consistency between strxfrm and strcoll?

POSIX requires that strxfrm and strcoll produce consistent results.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/strxfrm.html
~~~
The transformation is such that if strcmp() is applied to two transformed
strings, it shall return a value greater than, equal to, or less than 0,
corresponding to the result of strcoll() or strcoll_l(), respectively,
applied to the same two original strings with the same locale.
~~~
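As a rough illustration of what that requirement means for a single pair
of strings, here is a minimal C sketch. The helper names xfrm_dup and
xfrm_coll_consistent are hypothetical (they are not glibc APIs). The sketch
relies only on the standard behaviour that strxfrm (NULL, s, 0) stores
nothing and returns the length of the transformed string, and it assumes
the caller has already selected the locale under test with setlocale().

```c
#include <locale.h>   /* setlocale, used by the caller to pick the locale */
#include <stdlib.h>
#include <string.h>

/* Return -1, 0 or 1 according to the sign of V.  */
static int sign_of(int v)
{
  return (v > 0) - (v < 0);
}

/* Transform S with strxfrm into a freshly malloc'd buffer.
   Returns NULL on allocation failure.  */
static char *xfrm_dup(const char *s)
{
  /* With n == 0, strxfrm writes nothing and returns the length the
     transformed string needs, excluding the terminating NUL.  */
  size_t len = strxfrm(NULL, s, 0);
  char *buf = malloc(len + 1);
  if (buf == NULL)
    return NULL;
  strxfrm(buf, s, len + 1);
  return buf;
}

/* Return 1 if strcmp on the transformed strings agrees in sign with
   strcoll on the original strings (the POSIX requirement above),
   0 if they disagree, and -1 on allocation failure.  */
static int xfrm_coll_consistent(const char *a, const char *b)
{
  char *ta = xfrm_dup(a);
  char *tb = xfrm_dup(b);
  int result = -1;
  if (ta != NULL && tb != NULL)
    result = sign_of(strcmp(ta, tb)) == sign_of(strcoll(a, b));
  free(ta);
  free(tb);
  return result;
}
```

A test driver would call xfrm_coll_consistent on many string pairs under
each locale of interest and report every pair for which it returns 0.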

However, the program attached to this upstream Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1320356

shows that, for some locales and some randomly generated UTF-8 strings
made of 2-byte sequences (11-bit code points, U+0080 to U+07FF),
strxfrm and strcoll produce inconsistent sort orders.
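The reproducer itself is not included here, but inputs of the kind it
describes can be generated roughly as follows. This is a sketch under
assumptions: the function names are hypothetical, and a fixed-seed
xorshift32 generator is swapped in for whatever randomness the original
program uses, so that any failing string can be regenerated from its
seed (which also keeps such tests deterministic).

```c
#include <stddef.h>
#include <stdint.h>

/* Encode code point CP (which must lie in U+0080..U+07FF) as a 2-byte
   UTF-8 sequence, 110xxxxx 10xxxxxx, writing two bytes to OUT.  */
static void utf8_encode_2byte(uint32_t cp, unsigned char out[2])
{
  out[0] = (unsigned char) (0xC0 | (cp >> 6));
  out[1] = (unsigned char) (0x80 | (cp & 0x3F));
}

/* xorshift32 PRNG with explicit state, so a fixed seed always yields
   the same strings (unlike rand (), whose sequence differs between C
   libraries).  *STATE must be nonzero.  */
static uint32_t xorshift32(uint32_t *state)
{
  uint32_t x = *state;
  x ^= x << 13;
  x ^= x >> 17;
  x ^= x << 5;
  *state = x;
  return x;
}

/* Fill BUF with NCHARS random code points from U+0080..U+07FF, UTF-8
   encoded and NUL-terminated.  BUF must hold 2 * NCHARS + 1 bytes.  */
static void random_2byte_string(uint32_t *state, char *buf, size_t nchars)
{
  for (size_t i = 0; i < nchars; i++)
    {
      /* 0x800 - 0x80 = 0x780 code points in the range; the slight
         modulo bias does not matter for test input.  */
      uint32_t cp = 0x80 + xorshift32(state) % 0x780;
      utf8_encode_2byte(cp, (unsigned char *) buf + 2 * i);
    }
  buf[2 * nchars] = '\0';
}
```

Pairs of such strings, generated per locale from a logged seed, could then
be fed to a strxfrm/strcoll consistency check.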

Would it be beneficial to make our testing more robust and cover a
broader, more deterministic set of sorting tests?

Our current scripts/sort-test.sh tests are fairly limited, both in the
languages they cover and in their character-set coverage for sorting.

Then we'd have to determine why strxfrm and strcoll return different answers.
That is not entirely surprising, given the algorithmic differences between
the two implementations.

Thoughts?

-- 
Cheers,
Carlos.
