This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Consistency between strxfrm and strcoll?
- From: Carlos O'Donell <carlos at redhat dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Cc: Mike Fabian <mfabian at redhat dot com>
- Date: Thu, 24 Mar 2016 01:35:39 -0400
- Subject: Consistency between strxfrm and strcoll?
- Authentication-results: sourceware.org; auth=none
POSIX requires that strxfrm and strcoll produce consistent results.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/strxfrm.html
~~~
The transformation is such that if strcmp() is applied to two transformed
strings, it shall return a value greater than, equal to, or less than 0,
corresponding to the result of strcoll() [CX] [Option Start] or
strcoll_l(), [Option End] respectively, applied to the same two original
strings [CX] [Option Start] with the same locale. [Option End]
~~~
However, the program attached to this Red Hat bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1320356
shows that, in some locales, randomly generated UTF-8 strings drawn from
the 2-byte (11-bit) range, U+0080 to U+07FF, sort inconsistently: ordering
by strcoll and ordering by strcmp on the strxfrm output disagree.
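
To make the property concrete, here is a minimal sketch of a per-pair
check. It is not the reproducer attached to the bug; the helper names
(sign_of, xfrm_dup, collation_consistent) are hypothetical, and the
caller is assumed to have selected the locale under test with
setlocale() beforehand.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1; POSIX only constrains the sign. */
static int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/* Return a malloc'd strxfrm() transformation of s in the current locale,
   or NULL if allocation fails.  (Hypothetical helper.) */
static char *xfrm_dup(const char *s)
{
    /* With n == 0 the destination may be NULL; the return value is the
       length needed, excluding the terminating NUL. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *buf = malloc(need);
    if (buf != NULL)
        strxfrm(buf, s, need);
    return buf;
}

/* Return 1 if strcoll(a, b) and strcmp() on the transformed strings agree
   in sign, 0 if they disagree, -1 on allocation failure. */
static int collation_consistent(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    char *xa = xfrm_dup(a);
    char *xb = xfrm_dup(b);
    int result = -1;
    if (xa != NULL && xb != NULL)
        result = sign_of(strcoll(a, b)) == sign_of(strcmp(xa, xb));
    free(xa);
    free(xb);
    return result;
}
```

A test driver would call setlocale(LC_COLLATE, ...) for each locale of
interest and then run collation_consistent() over every pair of input
strings, reporting any pair for which it returns 0.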
Would it be beneficial to make our testing more robust, covering a
broader and more deterministic set of sorting tests?
Our current scripts/sort-test.sh tests are pretty limited in both the
languages and the character-set range they cover.
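
One way to make such coverage deterministic, sketched here only as an
assumption about what a broader test could look like, is to enumerate
every code point in U+0080..U+07FF rather than sampling at random. The
function names below are hypothetical and not part of glibc.

```c
#include <assert.h>
#include <stddef.h>

/* Encode a code point in U+0080..U+07FF as a 2-byte UTF-8 sequence
   followed by a NUL.  Returns 0 on success, -1 if cp is out of range. */
static int encode_utf8_2byte(unsigned int cp, char out[3])
{
    if (cp < 0x80 || cp > 0x7FF)
        return -1;
    out[0] = (char)(0xC0 | (cp >> 6));    /* 110xxxxx: top 5 bits */
    out[1] = (char)(0x80 | (cp & 0x3F));  /* 10xxxxxx: low 6 bits */
    out[2] = '\0';
    return 0;
}

/* Visit every single-character string in the 2-byte range, in code point
   order.  visit may be NULL.  Returns the number of strings generated. */
static size_t for_each_2byte_string(void (*visit)(const char *))
{
    size_t count = 0;
    for (unsigned int cp = 0x80; cp <= 0x7FF; cp++) {
        char buf[3];
        int rc = encode_utf8_2byte(cp, buf);
        assert(rc == 0);
        (void)rc;  /* keep builds with NDEBUG warning-free */
        if (visit != NULL)
            visit(buf);
        count++;
    }
    return count;
}
```

Feeding these strings, or fixed concatenations of them, to a pairwise
strcoll/strxfrm comparison would give the same inputs on every run.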
Then we'd have to determine why strxfrm and strcoll return different answers.
It's not entirely surprising given the algorithmic differences.
Thoughts?
--
Cheers,
Carlos.