This is the mail archive of the libc-locales@sourceware.org mailing list for the GNU libc locales project.



[Bug localedata/16527] New: strxfrm & strcoll broken with Hangul & en_US.UTF-8


https://sourceware.org/bugzilla/show_bug.cgi?id=16527

            Bug ID: 16527
           Summary: strxfrm & strcoll broken with Hangul & en_US.UTF-8
           Product: glibc
           Version: 2.18
            Status: NEW
          Severity: normal
          Priority: P2
         Component: localedata
          Assignee: unassigned at sourceware dot org
          Reporter: ju.orth+sourceware at gmail dot com
                CC: libc-locales at sourceware dot org

Consider this program:

=============
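#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>

/* Print the strxfrm() key of a, byte by byte, including the trailing 0. */
void ps(const char *a)
{
    size_t s, i;
    unsigned char *b;

    s = strxfrm(NULL, a, 0);   /* key length, not counting the terminator */
    b = malloc(s + 1);
    if (b == NULL)
        return;
    strxfrm((char *)b, a, s + 1);
    for (i = 0; i <= s; i++)
        printf("%u ", (unsigned)b[i]);
    printf("\n");
    free(b);
}

int main(void)
{
    /* Default "C" locale: the keys are the raw UTF-8 bytes. */
    ps("퍼");   /* U+D37C */
    ps("흐");   /* U+D750 */

    setlocale(LC_COLLATE, "");

    ps("퍼");
    ps("흐");
    return 0;
}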
=============

On systems with LANG=en_US.UTF-8 the output is

=============
237 141 188 0 
237 157 144 0 
1 1 1 1 194 182 1 194 182 1 194 182 0 
1 1 1 1 194 182 1 194 182 1 194 182 0 
=============

The output after setlocale(LC_COLLATE, "") is completely nonsensical: the two
distinct Hangul syllables get identical keys, so they compare as equal. Similar
useless output is generated with the locales de_DE.UTF-8, ru_RU.UTF-8, and
ja_JP.UTF-8. ko_KR.UTF-8 seems to be the only working locale.
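
One way to check a locale for this problem without reading the byte dumps is
to compare the strxfrm keys of two strings directly. The following is only a
sketch under that assumption; same_collation_key is a hypothetical helper name,
not a glibc function, and it returns -1 if memory cannot be allocated.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a and b have byte-identical strxfrm()
 * keys under the current LC_COLLATE setting, 0 if the keys differ, and
 * -1 if memory for the keys could not be allocated. */
int same_collation_key(const char *a, const char *b)
{
    size_t la, lb;
    char *ka, *kb;
    int same;

    assert(a != NULL && b != NULL);

    /* With a size of 0, strxfrm() only reports the required key length. */
    la = strxfrm(NULL, a, 0);
    lb = strxfrm(NULL, b, 0);

    ka = malloc(la + 1);
    kb = malloc(lb + 1);
    if (ka == NULL || kb == NULL) {
        free(ka);
        free(kb);
        return -1;
    }

    strxfrm(ka, a, la + 1);
    strxfrm(kb, b, lb + 1);
    same = (strcmp(ka, kb) == 0);

    free(ka);
    free(kb);
    return same;
}
```

Called after setlocale(LC_COLLATE, "") with the two syllables from the program
above, a return value of 1 would indicate that the locale collapses them into
a single key. In the C locale the helper should return 0 for them.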

This can be circumvented by adding the following code to iso14651_t1:

=============
script <HANGUL>

order_start <HANGUL>;forward;forward;forward;forward,position
<UAC00> <UAC00>;IGNORE;IGNORE;IGNORE
.. ..;IGNORE;IGNORE;IGNORE
<UD7A3> <UD7A3>;IGNORE;IGNORE;IGNORE
#
order_end
#
=============

Right below a very similar workaround...
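
The range <UAC00>..<UD7A3> in the workaround is the block of precomposed
Hangul syllables, and the bytes printed before the setlocale call (237 141 188
and 237 157 144) decode to code points inside it. A minimal sketch of that
check follows; decode_utf8_3 and is_hangul_syllable are hypothetical helper
names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: decode one 3-byte UTF-8 sequence starting at p.
 * Returns the code point, or 0 if p does not start with a lead byte
 * 1110xxxx followed by two continuation bytes 10xxxxxx. This sketch does
 * not reject overlong or surrogate encodings. */
unsigned long decode_utf8_3(const unsigned char *p)
{
    assert(p != NULL);

    /* Short-circuit evaluation stops at the first failing byte, so a
     * string that ends early is never read past its terminator. */
    if ((p[0] & 0xF0) != 0xE0 ||
        (p[1] & 0xC0) != 0x80 ||
        (p[2] & 0xC0) != 0x80)
        return 0;

    return ((unsigned long)(p[0] & 0x0F) << 12) |
           ((unsigned long)(p[1] & 0x3F) << 6) |
            (unsigned long)(p[2] & 0x3F);
}

/* Hypothetical helper: true for the range covered by the workaround. */
int is_hangul_syllable(unsigned long cp)
{
    return cp >= 0xAC00 && cp <= 0xD7A3;
}
```

For the two byte sequences above this gives U+D37C and U+D750, both of which
fall inside the range that the added order_start section covers.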

