This is the mail archive of the cygwin-developers mailing list for the Cygwin project.

Re: "C" UTF-8 trouble

From: Corinna Vinschen <corinna-cygwin at cygwin dot com>
To: cygwin-developers at cygwin dot com
Date: Thu, 8 Oct 2009 11:07:05 +0200
Subject: Re: "C" UTF-8 trouble
References: <20091006153724.GQ12789@calimero.vinschen.de> <20091006181649.GC18135@ednor.casa.cgf.cx> <416096c60910061146m1c4c9aa5ic6b1c55d50233fb5@mail.gmail.com> <4ACBEB43.3080508@byu.net> <416096c60910062307u6c81c82eh790542b72875d7dd@mail.gmail.com> <20091007090317.GV12789@calimero.vinschen.de> <416096c60910070308x387db45au9438462aced8d859@mail.gmail.com> <20091007125427.GW12789@calimero.vinschen.de> <416096c60910070650u11822ff9ld534f80ed123823b@mail.gmail.com> <416096c60910072212n2c8e79dam8b8f380880a1e17e@mail.gmail.com>
Reply-to: cygwin-developers at cygwin dot com

On Oct  8 06:12, Andy Koppe wrote:
> 2009/10/7 Andy Koppe:
> > 2009/10/7 Corinna Vinschen:
> >> At least, from the above it looks like all uppercase.  The KOI8s would
> >> be covered by a translation table.
> >>
> >> The problem is, we *must* draw a line somewhere.
> >
> > I agree, better to just stick with __locale_charset(), unless problems
> > do arise. FWIW, vim works fine with *.KOI8 locales.
> 
> Actually it's not quite right: on seeing "CP20866", vim falls back to
> iso-8859-1. While this works on the surface, as it's just another
> 8-bit charset, things like case conversion or detecting word
> boundaries might be incorrect.
> 
> Anyway, here's a fix that doesn't involve a translation table:
> 
> * libc/locale/nl_langinfo.c (nl_langinfo): Fall back to
> __locale_charset only if the current locale does not specify a
> charset.
> 
> --- newlib/libc/locale/nl_langinfo.c    7 Oct 2009 16:45:23 -0000       1.3
> +++ newlib/libc/locale/nl_langinfo.c    8 Oct 2009 05:00:23 -0000
> @@ -59,7 +59,11 @@ _DEFUN(nl_langinfo, (item),
>     switch (item) {
>         case CODESET:
>  #ifdef __CYGWIN__
> -               ret = __locale_charset ();
> +               s = setlocale(LC_CTYPE, NULL);
> +               if (s != NULL && (cs = strchr(s, '.')) != NULL)
> +                       ret = cs + 1;
> +               else
> +                       ret = __locale_charset();
>  #else
>                 ret = "";
>                 if ((s = setlocale(LC_CTYPE, NULL)) != NULL) {

Thanks for the patch.  However, the value returned by setlocale can
carry a trailing modifier, as in LANG="ja_JP.UTF-8@cjknarrow".
If we just return the string after the dot, we would return
"UTF-8@cjknarrow" in that case, which is not a valid codeset.  Either
we *do* need the translation table, or we have to copy the value into
a static buffer and strip the modifier.  That in turn requires
implementing _nl_langinfo_r.
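
As a rough illustration of the second option (copy the value and strip
the modifier), here is a minimal sketch.  The helper name
codeset_from_locale is hypothetical and not part of newlib, and the
caller-supplied buffer is an assumption standing in for whatever
storage a reentrant variant would use:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of newlib.  Copies the codeset part of
   a locale string such as "ja_JP.UTF-8@cjknarrow" into buf, dropping
   any trailing "@modifier".  Returns buf on success, or NULL if the
   locale names no codeset or the codeset does not fit into buf.  */
static const char *
codeset_from_locale (const char *locale, char *buf, size_t buflen)
{
  const char *dot, *at;
  size_t len;

  if (locale == NULL)
    return NULL;
  dot = strchr (locale, '.');
  at = strchr (locale, '@');
  /* No dot at all, or the dot only occurs inside the modifier.  */
  if (dot == NULL || (at != NULL && at < dot))
    return NULL;
  ++dot;                        /* skip the '.' itself */
  len = strcspn (dot, "@");     /* codeset ends at '@' or end of string */
  if (len == 0 || len >= buflen)
    return NULL;
  memcpy (buf, dot, len);
  buf[len] = '\0';
  return buf;
}
```

With this sketch, "ja_JP.UTF-8@cjknarrow" yields "UTF-8", and a NULL
return would be the point at which to fall back to
__locale_charset ().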


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat
