This is the mail archive of the cygwin-developers mailing list for the Cygwin project.


Re: Lone surrogates in UTF-8? (was: Re: Console codepage setting via chcp?)

From: Corinna Vinschen <corinna-cygwin at cygwin dot com>
To: cygwin-developers at cygwin dot com
Date: Sun, 27 Sep 2009 10:59:53 +0200
Subject: Re: Lone surrogates in UTF-8? (was: Re: Console codepage setting via chcp?)
References: <416096c60909262332j37d13eb4k400a7ca6c488872e@mail.gmail.com> <416096c60909270133g5bf09343ge8e5e8863f54571d@mail.gmail.com> <20090927085501.GY30851@calimero.vinschen.de>
Reply-to: cygwin-developers at cygwin dot com

On Sep 27 10:55, Corinna Vinschen wrote:
> On Sep 27 09:33, Andy Koppe wrote:
> > >> The __utf8_wctomb function could just create the corresponding
> > >> UCS-2 values if no first half has been encountered before.  The
> > >> __utf8_mbtowc function could simply allow these UCS-2 values again.
> > >>
> > >> That works (I just tested it) and is a small change, but is it really
> > >> desirable to allow UCS-2 values in UTF-8 strings?
> > >
> > > I don't know.
> > 
> > Improved answer: Debian allows them!
> 
> Sure, just as almost any C library allows the invalid 5- and 6-byte UTF-8
> sequences to be converted to and from wchar_t (if sizeof(wchar_t) is 4).

Oh, and btw, given that sizeof(wchar_t) is 4 there, glibc of course has
no reason to do any surrogate pair handling at all: a 32-bit wchar_t can
hold any code point directly.
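
For illustration only, here is a minimal sketch in C of the two
conversions under discussion.  It is neither newlib's __utf8_wctomb nor
glibc code; the sketch_* function names are hypothetical.  The first
function encodes a code point as UTF-8 and, as an assumption made purely
for demonstration, does not reject surrogate values, so a lone surrogate
becomes a 3-byte sequence.  The other two show the surrogate pair
handling that is only needed when wchar_t is 16 bits wide.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
   Surrogate values (0xD800..0xDFFF) are deliberately NOT rejected here,
   so a lone surrogate is emitted as a 3-byte sequence.
   Returns the number of bytes written, or 0 if cp > 0x10FFFF. */
size_t sketch_utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: split a code point into UTF-16 units, as is
   required when wchar_t is 16 bits wide.  With a 32-bit wchar_t this
   step is unnecessary because the code point fits in one unit.
   Returns the number of units written (1 or 2), or 0 if out of range. */
size_t sketch_utf16_split(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* first (high) half */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* second (low) half */
    return 2;
}

/* Hypothetical helper: recombine a high/low surrogate pair. */
uint32_t sketch_utf16_join(uint16_t hi, uint16_t lo)
{
    return 0x10000 + (((uint32_t)(hi & 0x3FF)) << 10) + (uint32_t)(lo & 0x3FF);
}
```

Called with the lone first half 0xD800, sketch_utf8_encode yields the
bytes ED A0 80, which is the kind of sequence the question in the
subject line is about.  A code point such as U+1F600 needs the pair
D83D DE00 with a 16-bit wchar_t, but is a single value with a 32-bit one.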


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

