This is the mail archive of the libc-alpha@cygnus.com mailing list for the glibc project.
Re: Crash (and crude workaround) for glibc2.x wcsxfrm()
- To: Geoff Keating <geoffk@ozemail.com.au>
- Subject: Re: Crash (and crude workaround) for glibc2.x wcsxfrm()
- From: neideck@qkal.sap-ag.de
- Date: Thu, 22 Apr 1999 12:22:23 +0200
- Cc: libc-alpha@cygnus.com
> > From: neideck@qkal.sap-ag.de
> > Feeding wchar_t strings with characters outside the range 0-255 into
> > the wcsxfrm() function when under any of the 8-bit locales (such as de_DE)
> > leads to a crash. This works on other operating systems such as HP/UX and
> > Digital Unix (and basically these characters are ignored).
>
> By 'ignored', you mean deleted from the string, or passed through unchanged?
It depends. HP/UX 10.20 simply refuses to map such strings and returns -1
(i.e. a mapping error). Digital Unix V4.0D implements something that
resembles my fix, i.e. it clears the upper bits of the values. In
retrospect, the HP/UX behaviour is more reasonable, and the Digital Unix
man page itself suggests that the system shouldn't behave the way it does.
Quoting the Digital Unix man page:
"On error, the wcsxfrm() function returns (size_t)-1 and sets errno to
indicate the error.
ERRORS
If any of the following conditions occur, the wcsxfrm() function sets errno
to the corresponding value:
[EINVAL] The ws2 parameter contains wide-character codes outside the
domain of the collating sequence defined by the current locale.
"
> > <fix to suppress upper bits deleted>
> Certainly this is wrong. Probably the right thing to do is pass the
> character through unchanged, because wcscoll should treat such
> characters like wcscmp does.
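Geoff's pass-through idea can be illustrated with a toy model. This is a sketch under loud assumptions, not glibc's get_weight: it assumes a hypothetical 256-entry weight table (toy_weight) and a hypothetical wcsxfrm()-like function (toy_xfrm). Looking up toy_weight[c] for c >= 256 would read past the table, which is the kind of out-of-bounds access that can crash; passing such characters through unchanged avoids the lookup. Because every table weight is at most 255, passed-through characters end up after all table characters and are ordered among themselves by code value, as wcscmp() would order them.

```c
#include <wchar.h>

/* Hypothetical 8-bit weight table: one weight per code 0..255.  A real
   locale would load its weights from LC_COLLATE data. */
#define TOY_TABLE_SIZE 256
static wchar_t toy_weight[TOY_TABLE_SIZE];
static int toy_ready = 0;

/* Stand-in locale data (assumes ASCII letter codes): identity weights,
   except that 'A'..'Z' share the weights of 'a'..'z'.  No table entry
   exceeds 255. */
static void toy_init(void)
{
    for (int c = 0; c < TOY_TABLE_SIZE; c++)
        toy_weight[c] = (wchar_t) c;
    for (int c = 'A'; c <= 'Z'; c++)
        toy_weight[c] = (wchar_t) (c - 'A' + 'a');
    toy_ready = 1;
}

/* wcsxfrm()-like transform: writes at most n wide characters (including
   the terminator) to dst and returns the full key length. */
size_t toy_xfrm(wchar_t *dst, const wchar_t *src, size_t n)
{
    size_t len;

    if (!toy_ready)
        toy_init();

    for (len = 0; src[len] != L'\0'; len++) {
        wchar_t c = src[len];
        /* toy_weight[c] for c >= 256 would index past the table.
           Out-of-range characters are passed through unchanged instead
           (the pass-through approach). */
        wchar_t w = ((unsigned long) c < TOY_TABLE_SIZE) ? toy_weight[c] : c;
        if (len < n)
            dst[len] = w;
    }
    if (len < n)
        dst[len] = L'\0';
    return len;
}
```

With this table, L"Ab" and L"ab" get identical keys, while L"\x100" keeps the key value 0x100 and sorts after both.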
Passing the character through unchanged leads to a crash in get_weight.
Both wcsxfrm() and wcscoll() should check for characters outside the
input range. On such input, wcsxfrm() should perform no translation and
return (size_t)-1, while wcscoll() may return whatever ordering it likes
but should set errno to EINVAL (since it has no return value reserved
for errors).
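The proposed checks can be sketched as wrappers around the standard functions. This is a minimal sketch, not glibc code: checked_wcsxfrm, checked_wcscoll, in_collate_domain and ASSUMED_MAX_COLLATE_CHAR are hypothetical names, and the 0xFF limit is an assumption standing in for the locale's real collating domain. Falling back to wcscmp() in checked_wcscoll is just one possible choice for "whatever it wants".

```c
#include <errno.h>
#include <wchar.h>

/* Hypothetical limit: assumes an 8-bit collating domain.  A real
   implementation would take the range from the current LC_COLLATE data. */
#define ASSUMED_MAX_COLLATE_CHAR 0xFF

/* Return nonzero if every character of s lies within the assumed domain.
   Negative wchar_t values convert to huge unsigned values and fail. */
static int in_collate_domain(const wchar_t *s)
{
    for (; *s != L'\0'; s++)
        if ((unsigned long) *s > ASSUMED_MAX_COLLATE_CHAR)
            return 0;
    return 1;
}

/* wcsxfrm() that refuses out-of-range input: no translation is done,
   errno is set to EINVAL and (size_t)-1 is returned. */
size_t checked_wcsxfrm(wchar_t *dst, const wchar_t *src, size_t n)
{
    if (!in_collate_domain(src)) {
        errno = EINVAL;
        return (size_t) -1;
    }
    return wcsxfrm(dst, src, n);
}

/* wcscoll() has no error return, so on out-of-range input it sets errno
   to EINVAL and falls back to plain code-value order via wcscmp(). */
int checked_wcscoll(const wchar_t *a, const wchar_t *b)
{
    if (!in_collate_domain(a) || !in_collate_domain(b)) {
        errno = EINVAL;
        return wcscmp(a, b);
    }
    return wcscoll(a, b);
}
```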
What I haven't been able to figure out yet is where to find the valid
range of characters for the current locale (it seems to be hard-coded to
8 bits).
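Whatever range the locale ends up exposing, a caller only benefits from a rejection if it checks for one. A sketch of a caller that handles both the (size_t)-1 convention from the Digital Unix man page and the POSIX convention, which reserves no return value for errors and has callers clear errno before the call and inspect it afterwards. xfrm_alloc is a hypothetical helper name; it relies on the standard rule that wcsxfrm() accepts a null destination when the size is 0, which lets it size the buffer before transforming.

```c
#include <errno.h>
#include <stdlib.h>
#include <wchar.h>

/* Return a malloc'd collation key for src, or NULL on failure (an
   out-of-range character reported by wcsxfrm(), or out of memory). */
wchar_t *xfrm_alloc(const wchar_t *src)
{
    size_t len;
    wchar_t *key;

    /* First pass: size 0 permits a NULL destination; the result is the
       key length without the terminating null wide character.  errno is
       cleared so that a POSIX-style EINVAL report can be detected. */
    errno = 0;
    len = wcsxfrm(NULL, src, 0);
    if (len == (size_t) -1 || errno != 0)
        return NULL;

    key = malloc((len + 1) * sizeof *key);
    if (key == NULL)
        return NULL;

    /* Second pass: the buffer holds len + 1 wide characters.  A result
       above len (including (size_t)-1) means the key did not fit. */
    errno = 0;
    if (wcsxfrm(key, src, len + 1) > len || errno != 0) {
        free(key);
        return NULL;
    }
    return key;
}
```

Keys produced this way compare with wcscmp() the same way the original strings compare with wcscoll(), which is why sorting code transforms each string once instead of calling wcscoll() repeatedly.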
I've attached a program to demonstrate the problem this time.
Burkhard Neidecker-Lutz
CEC Karlsruhe, SAP AG, neideck@qkal.sap-ag.de
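#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

#define BUF_SIZE 128

/* For every code 0..511, transform the one-character string with
   wcsxfrm() under the LC_COLLATE locale from the environment (e.g.
   LANG=de_DE) and print the resulting weights.  Under an 8-bit locale,
   codes above 255 crash glibc 2.x. */
int main(void)
{
    wchar_t s1[2];
    wchar_t s2[BUF_SIZE + 1];
    size_t i, n;
    int ch;

    setlocale(LC_COLLATE, "");

    for (ch = 0; ch < 2 * 256; ch++) {
        s1[0] = (wchar_t) ch;
        s1[1] = L'\0';

        n = wcsxfrm(s2, s1, BUF_SIZE);
        if (n == (size_t) -1) {
            /* HP/UX-style rejection of an out-of-range character. */
            printf("%03d error\n", ch);
        } else {
            printf("%03d %03zu ", ch, n);
            /* s2 is only valid if the key fit into BUF_SIZE characters. */
            if (n < BUF_SIZE)
                for (i = 0; i < n; i++)
                    printf("%ld ", (long) s2[i]);
            putchar('\n');
        }
        /* Flush each line so the output survives up to the crash. */
        fflush(stdout);
    }
    return EXIT_SUCCESS;
}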