This is the mail archive of the cygwin mailing list for the Cygwin project.



Re: bug in mbrtowc?


On Jul 27 22:56, Andy Koppe wrote:
> I've encountered what looks like a bug in mbrtowc's handling of UTF-8.
> Here's an example:
> 
> #include <stdio.h>
> #include <locale.h>
> #include <stdlib.h>
> #include <wchar.h>
> 
> int main(void) {
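>   wchar_t wc = 0;
>   puts(setlocale(LC_CTYPE, "en_GB.UTF-8"));
>   /* A null state pointer makes mbrtowc use its own internal state.
>      The size_t result is cast so that (size_t)-1 and -2 print as -1, -2. */
>   printf("%i\n", (int)mbrtowc(&wc, "\xe2", 1, 0));
>   printf("%i\n", (int)mbrtowc(&wc, "\x94", 1, 0));
>   printf("%i\n", (int)mbrtowc(&wc, "\x84", 1, 0));
>   printf("%x\n", (unsigned)wc);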
>   return 0;
> }
> 
> The sequence E2 94 84 should translate to U+2504. Instead, the second
> and third calls to mbrtowc report encoding errors. It does work
> correctly if the three bytes are passed to mbrtowc() in one go:
> 
>   printf("%i\n", (int)mbrtowc(&wc, "\xe2\x94\x84", 3, 0));

That's a bug in the newlib function __utf8_mbtowc.  I'm really surprised
that it has never been reported before, since it has been in the code for
years, probably since it was introduced in 2002.
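
For illustration only, here is a minimal sketch of what a restartable
decoder has to do: keep the partially assembled character in its state
between calls, so that a lead byte and its continuation bytes can arrive
one at a time.  This is not the newlib code; the names utf8_state and
utf8_feed are hypothetical, and the sketch assumes well-formed input in
that it skips the overlong and surrogate checks a real implementation
would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical conversion state carried between calls
   (not newlib's actual mbstate_t layout). */
typedef struct {
    unsigned long partial;  /* code point bits collected so far */
    int remaining;          /* continuation bytes still expected */
} utf8_state;

/* Feed a single byte to the decoder.
   Returns  1 when *out holds a complete code point,
            0 when the sequence is still incomplete,
           -1 on an invalid byte (the state is reset). */
static int utf8_feed(utf8_state *st, unsigned char byte, unsigned long *out)
{
    assert(st != NULL && out != NULL);

    if (st->remaining == 0) {
        /* Start of a new character: classify the lead byte. */
        if (byte < 0x80) {
            *out = byte;
            return 1;
        }
        if ((byte & 0xE0) == 0xC0) {
            st->partial = byte & 0x1F;
            st->remaining = 1;
        } else if ((byte & 0xF0) == 0xE0) {
            st->partial = byte & 0x0F;
            st->remaining = 2;
        } else if ((byte & 0xF8) == 0xF0) {
            st->partial = byte & 0x07;
            st->remaining = 3;
        } else {
            return -1;  /* stray continuation byte or invalid lead byte */
        }
        return 0;
    }

    /* In the middle of a character: only 10xxxxxx bytes are valid here. */
    if ((byte & 0xC0) != 0x80) {
        st->remaining = 0;
        return -1;
    }
    st->partial = (st->partial << 6) | (byte & 0x3F);
    if (--st->remaining > 0)
        return 0;

    *out = st->partial;
    return 1;
}
```

Feeding 0xE2, 0x94 and 0x84 one byte at a time with the same utf8_state
returns 0, 0 and 1, and the last call stores 0x2504 in *out.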

I'll follow up on the newlib list.


Thanks for the report and especially thanks for the testcase,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat


