This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: regression caused by fix of bug #13691

From: Bruno Haible <bruno at clisp dot org>
To: Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
Cc: libc-alpha at sourceware dot org
Date: Mon, 14 May 2012 04:31:31 +0200
Subject: Re: regression caused by fix of bug #13691
Bcc: bruno at haible dot de
References: <1995140.sSugJaaxUI@linuix>

I wrote:
> the most important
> place to fix is the mbrtowc() behaviour. But this is also the most
> difficult one. I cannot see how to make the following requirements
> coexist:
> 
>   * mbrtowc(&wc, "A", 1, &ps) shall set wc = L'A'.
> 
>   * mbrtowc(&wc, "A\xb0", 2, &ps) shall set wc = 0x00C0
>     (LATIN CAPITAL LETTER A WITH GRAVE)
> 
>   * mbrtowc can be used to process a string byte for byte; it returns -2
>     when a byte sequence is incomplete. In particular this means that the
>     sequence of calls
>       mbrtowc(&wc, "A", 1, &ps) => -2
>       mbrtowc(&wc, "\xb0", 1, &ps) => 1, wc = 0x00C0
>     produces an intermediate -2 without setting wc and then sets wc in the
>     second call.
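
For reference, the byte-for-byte usage pattern in the third requirement can be sketched as follows. This is a minimal illustration using only the standard mbrtowc() interface; the helper name decode_bytewise and its buffer convention are made up for this example and are not part of glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: feed `len` bytes of `s` to mbrtowc() one byte at a
   time and store the decoded wide characters in `out` (capacity `cap`).
   Returns the number of wide characters stored, or (size_t)-1 on an
   invalid sequence or when `out` is too small.  */
size_t decode_bytewise(const char *s, size_t len, wchar_t *out, size_t cap)
{
    assert(s != NULL && out != NULL);

    mbstate_t ps;
    memset(&ps, 0, sizeof ps);      /* start in the initial conversion state */
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s + i, 1, &ps);

        if (r == (size_t)-2)
            continue;               /* incomplete: wc not set, bytes kept in ps */
        if (r == (size_t)-1)
            return (size_t)-1;      /* invalid sequence */
        if (n == cap)
            return (size_t)-1;      /* output buffer full */
        out[n++] = wc;              /* r is 0 (null character) or 1 here */
    }
    return n;
}
```

A caller written this way relies on the -2 return to know that wc has not been set yet, which is exactly what conflicts with the first requirement once "A" alone may be either a complete character or the start of a longer sequence.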

A possible approach would be to exploit the fact that the gconv converters
can be programmed to behave differently in the wcsmbs situation than in the
iconv() and stdio situations: in the wcsmbs situation, consume_incomplete
is 1, whereas in the other situations it is 0. This parameter could be passed
down to the loop function through EXTRA_LOOP_DECLS and EXTRA_LOOP_ARGS.

The idea would be to produce Unicode in NFD form (rather than the usual
NFC form) in the mb*towc* functions. That is, have
  mbrtowc(&wc, "A", 1, &ps) => 1, wc = L'A',
  mbrtowc(&wc, "\xb0", 1, &ps) => 1, wc = 0x0300,
and then hope that every caller can cope with decomposed Unicode character
sequences, from the 'wc' program to the display engine in X11.
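
To make the decomposed behaviour concrete, here is a toy sketch. It is not glibc code: the names toy_decode_byte and toy_decode_nfd are invented for this example, and the mapping table is an assumption that covers only the single combining byte used above (0xb0 => 0x0300), with every other byte mapped to itself.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Toy mapping (assumption for illustration only): 0xb0 is the combining
   grave accent, every other byte maps to the code point of the same value. */
wchar_t toy_decode_byte(unsigned char b)
{
    if (b == 0xb0)
        return (wchar_t)0x0300;     /* COMBINING GRAVE ACCENT */
    return (wchar_t)b;
}

/* Decode `len` bytes into decomposed wide characters.  Each input byte
   yields exactly one wide character, so no conversion state is needed and
   no call ever has to report "incomplete".  Returns the number of wide
   characters stored, or (size_t)-1 if `out` is too small.  */
size_t toy_decode_nfd(const char *s, size_t len, wchar_t *out, size_t cap)
{
    assert(s != NULL && out != NULL);

    if (len > cap)
        return (size_t)-1;
    for (size_t i = 0; i < len; i++)
        out[i] = toy_decode_byte((unsigned char)s[i]);
    return len;
}
```

With this scheme the input "A\xb0" comes out as the two wide characters L'A' and 0x0300 instead of the single precomposed 0x00C0, so the burden of handling the combining sequence moves to the caller.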

But in my opinion this is not worth the effort, because non-Unicode
Vietnamese locales don't have a real user base.

Bruno

