This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: nl_langinfo(CODESET)
- To: libc-alpha at sources dot redhat dot com
- Subject: Re: nl_langinfo(CODESET)
- From: Markus Kuhn <Markus dot Kuhn at cl dot cam dot ac dot uk>
- Date: Fri, 03 Nov 2000 09:44:04 +0000
- Cc: "Martin v. Loewis" <martin at loewis dot home dot cs dot tu-berlin dot de>
"Martin v. Loewis" wrote on 2000-11-03 09:18 UTC:
> I believe the call nl_langinfo(CODESET) works incorrectly in glibc
> (all releases). When LANG is set to de_DE.ISO-8859-1, I'd expect the
> program
>
> #include <langinfo.h>
> #include <locale.h>
> #include <stdio.h>
>
> int main(void)
> {
>     setlocale(LC_ALL, "");
>     printf("%s\n", nl_langinfo(CODESET));
>     return 0;
> }
>
> to print ISO-8859-1. Instead, it prints ANSI_X3.4-1968. If that is
> indeed the correct result, please let me know how to programmatically
> find out the execution character set.
I think what happens is the following: You do not have a locale called
"de_DE.ISO-8859-1" installed in /usr/share/locale/, therefore glibc
can't find it and silently falls back to the "C" locale. The "C" locale
uses the "ANSI_X3.4-1968" repertoire (for no good reason, I think; the
standard would equally well allow, say, ISO 8859-1), and
nl_langinfo(CODESET) reports that correctly.
Possible fixes:
a) You probably have a locale "de_DE" installed, which uses ISO8859-1. So
try "LANG=de_DE" instead.
b) You can generate your own properly named private locale with
localedef -v -c -i de_DE -f ISO8859-1 $HOME/local/locale/de_DE.ISO8859-1
and use it with:
LOCPATH=$HOME/local/locale LANG=de_DE.ISO8859-1 ./test
c) It is indeed confusing that the locales generated by localedata/
SUPPORTED and localedata/Makefile install-locales do not always contain
the encoding as part of the name. Maybe this should be fixed, either in
the SUPPORTED file or, better, in the Makefile (which should
automatically insert the encoding before the @ modifier or at the end of
the locale name).
On a similar note: localedata/SUPPORTED currently contains four
locales
fa_IR UTF-8
hi_IN UTF-8
mr_IN UTF-8
vi_VN UTF-8
that do not contain the encoding in their name. Since there are a couple
of programs out there (for example less and xterm!) which parse "LANG",
"LC_CTYPE", and "LC_ALL" to find out whether UTF-8 is used (instead of
using nl_langinfo(CODESET)), it would be good to attach ".UTF-8" to
these four locale names in particular.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>