This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
Re: "C" UTF-8 trouble
On Oct 8 06:12, Andy Koppe wrote:
> 2009/10/7 Andy Koppe:
> > 2009/10/7 Corinna Vinschen:
> >> At least, from the above it looks like all uppercase.  The KOI8s would
> >> be covered by a translation table.
> >>
> >> The problem is, we *must* draw a line somewhere.
> >
> > I agree, better to just stick with __locale_charset(), unless problems
> > do arise. FWIW, vim works fine with *.KOI8 locales.
>
> Actually it's not quite right: on seeing "CP20866", vim falls back to
> iso-8859-1. While this works on the surface, as it's just another
> 8-bit charset, things like case conversion or detecting word
> boundaries might be incorrect.
>
> Anyway, here's a fix that doesn't involve a translation table:
>
> * libc/locale/nl_langinfo.c (nl_langinfo): Fall back to
> __locale_charset only if the current locale does not specify a
> charset.
>
> --- newlib/libc/locale/nl_langinfo.c 7 Oct 2009 16:45:23 -0000 1.3
> +++ newlib/libc/locale/nl_langinfo.c 8 Oct 2009 05:00:23 -0000
> @@ -59,7 +59,11 @@ _DEFUN(nl_langinfo, (item),
> switch (item) {
> case CODESET:
> #ifdef __CYGWIN__
> - ret = __locale_charset ();
> + s = setlocale(LC_CTYPE, NULL);
> + if (s != NULL && (cs = strchr(s, '.')) != NULL)
> + ret = cs + 1;
> + else
> + ret = __locale_charset();
> #else
> ret = "";
> if ((s = setlocale(LC_CTYPE, NULL)) != NULL) {
Thanks for the patch. However, the value returned by setlocale may
carry a trailing modifier, as in LANG="ja_JP.UTF-8@cjknarrow".
If we just return the string after the dot, the codeset is potentially
wrong. Either we *do* need the translation table, or we have to
copy the value into a static buffer and strip the modifier. This in
turn requires implementing _nl_langinfo_r.
Corinna
--
Corinna Vinschen Please, send mails regarding Cygwin to
Cygwin Project Co-Leader cygwin AT cygwin DOT com
Red Hat
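
The second option mentioned in the reply (copy the value into a static
buffer and strip the modifier) could look roughly like the sketch below.
It is only an illustration, not the change that went into newlib:
codeset_from_locale and the 32-byte buffer size are hypothetical, and the
locale name is assumed to have the usual
language[_territory][.codeset][@modifier] form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of newlib: return the codeset part of a
   locale name such as "ja_JP.UTF-8@cjknarrow", without the modifier.
   Returns NULL if the name carries no codeset, so the caller can fall
   back to something like __locale_charset(). */
static const char *
codeset_from_locale (const char *locale)
{
  static char buf[32];  /* assumed size; static so the result outlives the call */
  const char *cs, *mod;
  size_t len;

  if (locale == NULL || (cs = strchr (locale, '.')) == NULL)
    return NULL;                    /* no ".codeset" in the locale name */
  cs++;                             /* skip the dot */
  mod = strchr (cs, '@');           /* start of the modifier, if any */
  len = mod ? (size_t) (mod - cs) : strlen (cs);
  if (len == 0 || len >= sizeof buf)
    return NULL;                    /* empty or too long for the buffer */
  memcpy (buf, cs, len);
  buf[len] = '\0';
  return buf;
}
```

Called with "ja_JP.UTF-8@cjknarrow" this yields "UTF-8", and with "C" it
yields NULL. The static buffer makes the function non-reentrant, which is
presumably why a reentrant _nl_langinfo_r variant is mentioned above.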