This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: "C" UTF-8 trouble


On Oct  7 11:08, Andy Koppe wrote:
> 2009/10/7 Corinna Vinschen:
> > Urgh.  So we have to change nl_langinfo in newlib as well.  Do we have
> > to return "US-ASCII" if charset is "ASCII", or is it sufficient to
> > return __locale_charset() as you did, thus returning "ASCII" for "ASCII"?
> 
> I'd assume so, but WWLD?

===
#include <stdio.h>
#include <locale.h>
#include <langinfo.h>

int main ()
{
  char *l;

  setlocale (LC_ALL, "");
  l = nl_langinfo (CODESET);
  if (l)
    printf ("%s\n", l);
  return 0;
}
===

$ ./nll
ANSI_X3.4-1968

$ LANG=C.UTF-8 ./nll
ANSI_X3.4-1968

$ LANG=ja_JP ./nll
EUC-JP

$ LANG=ru_RU ./nll
ISO-8859-5

$ LANG=ru_UA ./nll
KOI8-U

$ LANG=zh_CN ./nll
GB2312

$ LANG=zh_TW ./nll
BIG5

Sigh.  Do we really need a translation table?

> > And what about stuff like "eucJP" vs. "EUCJP"?  The charset in newlib
> > is always uppercase right now.
> 
> Hmm. There's also the KOI8s, which turn into CP2[01]866.

At least, from the above it looks like all uppercase.  The KOI8s would
be covered by a translation table.
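
For illustration only, such a table could be as small as the sketch
below.  The struct, the function name and the internal spellings on
the left-hand side are assumptions made up for this example, not what
newlib actually stores; only the right-hand names come from the Linux
output above and from the KOI8/CP2[01]866 remark.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical alias table: internal charset name on the left, the
   name nl_langinfo (CODESET) would report on the right.  The internal
   spellings are assumptions for this sketch. */
struct codeset_alias
{
  const char *internal;
  const char *reported;
};

static const struct codeset_alias codeset_aliases[] =
{
  { "ASCII",   "ANSI_X3.4-1968" },
  { "EUCJP",   "EUC-JP" },
  { "CP20866", "KOI8-R" },
  { "CP21866", "KOI8-U" },
};

/* Return the reported name for INTERNAL.  Names without a table entry
   are passed through unchanged, so the table only has to list the
   charsets whose names actually differ. */
const char *
codeset_name (const char *internal)
{
  size_t i;

  if (!internal)
    return NULL;
  for (i = 0; i < sizeof codeset_aliases / sizeof codeset_aliases[0]; i++)
    if (strcmp (internal, codeset_aliases[i].internal) == 0)
      return codeset_aliases[i].reported;
  return internal;
}
```

With a pass-through default like this, "BIG5" or "GB2312" need no
entry at all, which keeps the table short.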

The problem is, we *must* draw a line somewhere.  Otherwise it will turn
out that we're not finished with this stuff until we have everything
implemented exactly as on Linux.  That puts the 1.7.1 release off to
about 2012.

> > As for Emacs, I'm wondering if it shouldn't be changed to set its locale
> > according to setlocale(LC_CTYPE,NULL) instead, given what POSIX says.
> 
> Well, yes, but good luck with that. When Ken Brown raised the ^? vs ^H
> issue, they told him that sending ^H for backspace should be
> considered a bug.

That's a SEP, IMHO.

> > I, too, think this is a good idea.  __get_locale_env() should be changed
> > to return "C.UTF-8".
> >
> > It would be nice to check /etc/defaults/locale in __get_locale_env() as
> > well, but I'm a bit reluctant to do that.  It means, every invocation of
> > a Cygwin process has to open that file if the environment isn't set.
> > Talking about performance...
> >
> > Alternatively, only the first invocation of Cygwin in a process tree
> > could try to read this file.
> 
> Agreed with the last point, but I think setenv("LANG",...) at the
> first invocation of Cygwin is a better and simpler solution than
> changing __get_locale_env(), because:

Not exactly simpler.  At the places where the first invocation of a
Cygwin process tree is handled, there's no such thing as a POSIX
environment yet.

> - it solves the emacs issue
> - applications will get the same result from setlocale(,"") and
> reading the environment variables themselves, so apps that do the
> latter don't have to be changed
> - it's more like Linux
> - it doesn't require a newlib change

But it requires a Cygwin change, so the difference is not that big.
And the question of where to get the default locale from is still
open.
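
Just to make the setenv idea concrete, here is a rough sketch in plain
POSIX C.  The function name is made up, the default is passed in as a
parameter precisely because its source is undecided, and this ignores
the problem that no POSIX environment exists yet at the point where
Cygwin would have to do it.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Hypothetical helper: give LANG a default value if it is unset or
   empty.  LC_ALL and the LC_xxx variables are left alone, since they
   take precedence over LANG in setlocale (..., "") anyway.
   Returns 1 if LANG was set here, 0 if it already had a value,
   -1 if setenv failed. */
int
set_default_lang (const char *default_locale)
{
  const char *lang = getenv ("LANG");

  if (lang && *lang)
    return 0;
  return setenv ("LANG", default_locale, 1) == 0 ? 1 : -1;
}
```

Called once at the root of a process tree, child processes would simply
inherit LANG, so both setlocale (LC_ALL, "") and applications reading
the environment themselves would see the same value.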


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat

