This is the mail archive of the libc-alpha@cygnus.com mailing list for the glibc project.



Re: Crash (and crude workaround) for glibc2.x wcsxfrm()


> > From: neideck@qkal.sap-ag.de
> > Feeding wchar_t strings with characters outside the range 0-255 into
> > the wcsxfrm() function when under any of the 8-bit locales (such as de_DE)
> > leads to a crash. This works on other operating systems such as HP/UX and
> > Digital Unix (and basically these characters are ignored).
> 
> By 'ignored', you mean deleted from the string, or passed through unchanged?

It depends. HP/UX 10.20 just refuses to map the strings and returns -1 (i.e.
a mapping error). Digital Unix V4.0D implements something that resembles
my fix, i.e. it clears out the upper bits of the values. In retrospect,
the HP/UX behaviour is more reasonable, and the Digital Unix man page itself
suggests that it shouldn't do what it does either. Quoting the Digital Unix
man page:

"On error, the wcsxfrm() function returns (size_t)-1 and sets errno to
 indicate the error.

ERRORS

  If any of the following conditions occur, the wcsxfrm() function sets errno to
  the corresponding value:

  [EINVAL]  The ws2 parameter contains wide-character codes outside the
            domain of the collating sequence defined by the current locale.
"
> > <fix to suppress upper bits deleted>

> Certainly this is wrong.  Probably the right thing to do is pass the
> character through unchanged, because wcscoll should treat such
> characters like wcscmp does.

Passing the character through unchanged leads to a crash in get_weight. Both
wcsxfrm() and wcscoll() should check for characters outside the input
range. wcsxfrm() would then do no translation and return -1; wcscoll() would
do whatever it wants and set errno to EINVAL (since it cannot return
an error code).
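
To make the proposal concrete, here is a rough sketch of the checks as
wrappers around the library functions. This is not glibc code: the names
checked_wcsxfrm, checked_wcscoll and in_assumed_range are made up for
illustration, and the limit of 255 is only an assumption matching the
8-bit locales discussed here. Falling back to wcscmp() is just one
possible reading of "whatever it wants".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <wchar.h>

/* Assumption: the collation table of the current locale covers only the
   codes 0..255, as seems to be the case for 8-bit locales such as de_DE. */
#define HYPOTHETICAL_MAX_WC 255UL

/* Hypothetical helper: returns 1 if every character of s lies within the
   assumed range, 0 otherwise.  The cast to unsigned long also rejects
   negative values on platforms where wchar_t is signed. */
static int in_assumed_range(const wchar_t *s)
{
  assert(s != NULL);
  for (; *s != L'\0'; s++) {
    if ((unsigned long) *s > HYPOTHETICAL_MAX_WC)
      return 0;
  }
  return 1;
}

/* Hypothetical wrapper: does no translation and returns (size_t)-1 with
   errno set to EINVAL if src contains an out-of-range character. */
static size_t checked_wcsxfrm(wchar_t *dest, const wchar_t *src, size_t n)
{
  if (!in_assumed_range(src)) {
    errno = EINVAL;
    return (size_t) -1;
  }
  return wcsxfrm(dest, src, n);
}

/* Hypothetical wrapper: wcscoll() has no error return value, so signal
   the problem through errno and fall back to a plain wcscmp(). */
static int checked_wcscoll(const wchar_t *a, const wchar_t *b)
{
  if (!in_assumed_range(a) || !in_assumed_range(b)) {
    errno = EINVAL;
    return wcscmp(a, b);
  }
  return wcscoll(a, b);
}
```

A caller of checked_wcscoll() would have to set errno to 0 before the call
and inspect it afterwards, since the return value alone cannot indicate
an error.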

What I haven't been able to figure out so far is where to find the valid
range of characters for the current locale (it seems to be hardcoded at
8 bits).
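
Short of finding that range inside the library, one portable approximation
might be to ask the standard wctob() function whether a wide character has
a single-byte form in the current locale. This is only a sketch under
assumptions: wctob() follows LC_CTYPE rather than LC_COLLATE, so it says
nothing definite about the collation tables, and the helper names
has_single_byte_form and first_unrepresentable are invented here.

```c
#include <assert.h>
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* wctob, wint_t */

/* Hypothetical helper: returns 1 if wc maps to a single byte in the
   current LC_CTYPE locale, 0 otherwise. */
static int has_single_byte_form(wchar_t wc)
{
  return wctob((wint_t) wc) != EOF;
}

/* Hypothetical helper: returns a pointer to the first character of s
   that has no single-byte form, or NULL if all of them do. */
static const wchar_t *first_unrepresentable(const wchar_t *s)
{
  assert(s != NULL);
  for (; *s != L'\0'; s++) {
    if (!has_single_byte_form(*s))
      return s;
  }
  return NULL;
}
```

A caller could run first_unrepresentable() on a string before handing it
to wcsxfrm() and treat a non-NULL result as the EINVAL case.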

I've attached a program to demonstrate the problem this time.

               Burkhard Neidecker-Lutz

CEC Karlsruhe, SAP AG, neideck@qkal.sap-ag.de

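#include <locale.h>
#include <stdio.h>
#include <wchar.h>

#define BUF_SIZE 128

/* Feed every one-character string with a code from 0 to 511 to wcsxfrm()
   and print the transformed result.  Under an 8-bit locale such as de_DE
   this is expected to crash once ch goes beyond 255. */
int main(void)
{
  wchar_t s1[BUF_SIZE + 1];
  wchar_t s2[BUF_SIZE + 1];
  size_t i, n;
  int ch;

  setlocale(LC_COLLATE, "");
  for (ch = 0; ch < 2 * 256; ch++) {
    s1[0] = (wchar_t) ch;
    s1[1] = L'\0';
    n = wcsxfrm(s2, s1, BUF_SIZE);
    if (n == (size_t) -1) {
      /* The library reported a mapping error instead of crashing. */
      printf("%03d error\n", ch);
      continue;
    }
    printf("%03d %03lu ", ch, (unsigned long) n);
    /* Never read past the buffer, even if n reports a longer result. */
    for (i = 0; i < n && i < BUF_SIZE; i++) {
      printf("%ld ", (long) s2[i]);
    }
    putchar('\n');
    fflush(stdout);
  }
  return 0;
}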
