This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.
[Bug localedata/16527] New: strxfrm & strcoll broken with Hangul & en_US.UTF-8
- From: "ju.orth+sourceware at gmail dot com" <sourceware-bugzilla at sourceware dot org>
- To: libc-locales at sourceware dot org
- Date: Tue, 04 Feb 2014 21:13:45 +0000
- Subject: [Bug localedata/16527] New: strxfrm & strcoll broken with Hangul & en_US.UTF-8
- Auto-submitted: auto-generated
https://sourceware.org/bugzilla/show_bug.cgi?id=16527
Bug ID: 16527
Summary: strxfrm & strcoll broken with Hangul & en_US.UTF-8
Product: glibc
Version: 2.18
Status: NEW
Severity: normal
Priority: P2
Component: localedata
Assignee: unassigned at sourceware dot org
Reporter: ju.orth+sourceware at gmail dot com
CC: libc-locales at sourceware dot org
Consider this program:
=============
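#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <stdlib.h>

/* Print the strxfrm() sort key of a byte by byte, including the final NUL. */
void ps(const char *a)
{
    size_t s, i;
    unsigned char *b;

    s = strxfrm(NULL, a, 0);    /* key length, not counting the NUL */
    b = malloc(s + 1);
    if (b == NULL)
        return;
    strxfrm((char *)b, a, s + 1);
    for (i = 0; i <= s; i++)
        printf("%u ", (unsigned)b[i]);
    printf("\n");
    free(b);
}

int main(void)
{
    ps("\xED\x8D\xBC");         /* 퍼 (U+D37C), still in the "C" locale */
    ps("\xED\x9D\x90");         /* 흐 (U+D750), still in the "C" locale */
    setlocale(LC_COLLATE, "");
    ps("\xED\x8D\xBC");         /* 퍼 (U+D37C), locale from the environment */
    ps("\xED\x9D\x90");         /* 흐 (U+D750), locale from the environment */
    return 0;
}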
=============
On systems with LANG=en_US.UTF-8 the output is
=============
237 141 188 0
237 157 144 0
1 1 1 1 194 182 1 194 182 1 194 182 0
1 1 1 1 194 182 1 194 182 1 194 182 0
=============
The output after setlocale(LC_COLLATE, "") is nonsensical: the two distinct
syllables yield identical sort keys and therefore cannot be ordered. Similarly
useless output is generated with the locales de_DE.UTF-8, ru_RU.UTF-8, and
ja_JP.UTF-8. ko_KR.UTF-8 seems to be the only working locale.
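The failure visible in the output above is that two different strings transform to the same key. A minimal sketch of a check for that condition follows. The helper names xfrm_key and keys_collide are hypothetical and not part of glibc; only strxfrm, strcmp, malloc, and free are real library calls. The result depends on whichever LC_COLLATE locale is active when the helpers are called.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd strxfrm() key for s, or NULL
   if allocation fails. The caller frees the result. */
char *xfrm_key(const char *s)
{
    size_t n = strxfrm(NULL, s, 0);   /* key length, not counting the NUL */
    char *key = malloc(n + 1);

    if (key != NULL)
        strxfrm(key, s, n + 1);
    return key;
}

/* Hypothetical helper: 1 if a and b differ as byte strings but receive
   the same key in the current LC_COLLATE locale, 0 if they do not,
   -1 if a key could not be allocated. */
int keys_collide(const char *a, const char *b)
{
    char *ka = xfrm_key(a);
    char *kb = xfrm_key(b);
    int r = -1;

    if (ka != NULL && kb != NULL)
        r = strcmp(a, b) != 0 && strcmp(ka, kb) == 0;
    free(ka);
    free(kb);
    return r;
}
```

Calling keys_collide("\xED\x8D\xBC", "\xED\x9D\x90") after setlocale(LC_COLLATE, "") would be expected to return 1 on a system that shows the behaviour reported here, and 0 in the default "C" locale.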
This can be circumvented by adding the following code to iso14651_t1:
=============
script <HANGUL>
order_start <HANGUL>;forward;forward;forward;forward,position
<UAC00> <UAC00>;IGNORE;IGNORE;IGNORE
.. ..;IGNORE;IGNORE;IGNORE
<UD7A3> <UD7A3>;IGNORE;IGNORE;IGNORE
#
order_end
#
=============
Right below a very similar workaround...
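The range in the workaround above, <UAC00> to <UD7A3>, is the Unicode block of precomposed Hangul syllables. As a rough illustration of why the two test strings fall under it, the sketch below decodes the byte triples printed in the "C" locale (237 141 188 and 237 157 144) into code points and tests them against that range. The function names decode_utf8_3 and in_workaround_range are made up for this example, and the decoder only handles well-formed three-byte sequences.

```c
/* Decode one three-byte UTF-8 sequence (lead byte 0xE0..0xEF).
   Returns the code point, or 0 if the bytes do not have that shape.
   Overlong forms and surrogates are not rejected. */
unsigned long decode_utf8_3(const unsigned char *p)
{
    if ((p[0] & 0xF0) != 0xE0 ||
        (p[1] & 0xC0) != 0x80 ||
        (p[2] & 0xC0) != 0x80)
        return 0;
    return ((unsigned long)(p[0] & 0x0F) << 12)
         | ((unsigned long)(p[1] & 0x3F) << 6)
         |  (unsigned long)(p[2] & 0x3F);
}

/* 1 if cp lies in the range listed in the workaround, <UAC00>..<UD7A3>. */
int in_workaround_range(unsigned long cp)
{
    return cp >= 0xAC00UL && cp <= 0xD7A3UL;
}
```

With these helpers, the bytes 237 141 188 decode to U+D37C and 237 157 144 to U+D750, and both lie inside the range.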