This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
perror, gettext, question marks
- To: libc-alpha at sources dot redhat dot com
- Subject: perror, gettext, question marks
- From: Bruno Haible <haible at ilog dot fr>
- Date: Mon, 4 Dec 2000 14:34:17 +0100 (CET)
Hi Ulrich,
I'm getting error messages like the following from bzip2, ar, emacs etc.:
bunzip2: Compressed file ends unexpectedly;
perhaps it is corrupted? *Possible* reason follows.
bunzip2: Das Argument ist ung?ltig
Input file = (stdin), output file = (stdout)
ar: libc_pic.a: Auf dem Ger?t ist kein Speicherplatz mehr verf?gbar
bzip2 uses the perror() function and does not use setlocale(). My environment
variables are consistent:
LANGUAGE=de:fr:en
LANG=de_DE.UTF-8
One could argue that all programs should call setlocale(LC_ALL, ""), but this
does not work in practice because:
* Programs traditionally know about LC_MESSAGES, not LC_CTYPE. See, for
example, the SUSV2 description of 'strerror':
"The contents of the error message strings returned by strerror()
should be determined by the setting of the LC_MESSAGES category in
the current locale."
* There are many programs like this that depend on the <ctype.h>
functions behaving as in the C locale. In some cases, changing these
programs is very difficult. For GNU binutils, I sent a 250 KB patch, which
was not even remotely considered for inclusion.
* The appearance of the German message, due to the setting of LANGUAGE,
is a GNU extension. gettext is also a GNU extension, and no standard
says that gettext should behave like this.
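To make the first point concrete, here is a minimal sketch of the "traditional" setup: a program that honours the user's LC_MESSAGES but leaves LC_CTYPE alone so that <ctype.h> keeps its C-locale behaviour. The helper name setup_messages_only() is hypothetical. After the call, LC_CTYPE is still "C", so translated messages have to be converted to the C locale's charset (ASCII), which is the situation that produces the question marks above.

```c
#define _POSIX_C_SOURCE 200809L /* for LC_MESSAGES and setenv() */
#include <locale.h>
#include <string.h>

/* Hypothetical helper: set only LC_MESSAGES from the environment
   (LC_ALL, LC_MESSAGES, LANG) and return the name of the LC_CTYPE
   locale that is in effect afterwards.  */
static const char *
setup_messages_only (void)
{
  /* Translations follow the user's message language...  */
  setlocale (LC_MESSAGES, "");

  /* ...but LC_CTYPE was never touched, so it is still the "C" locale
     selected at program startup.  */
  return setlocale (LC_CTYPE, NULL);
}
```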
Therefore I propose to change gettext: when the current LC_CTYPE locale
is "C", but the first non-empty value among the environment variables
LC_ALL, LC_CTYPE and LANG is not "C", and OUTPUT_CHARSET is not set, then
gettext() shall use the encoding of the locale named by these environment
variables.
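The environment check in this proposal (the first non-empty value among LC_ALL, LC_CTYPE and LANG, which must not be "C") can be sketched as two small helpers. The function names are hypothetical and not part of glibc; like the patch below, the sketch also treats "POSIX" as equivalent to "C".

```c
#define _POSIX_C_SOURCE 200809L /* for setenv()/unsetenv() in callers */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return the first non-empty value among LC_ALL,
   LC_CTYPE and LANG, in that order of precedence, or NULL if none is
   set.  */
static const char *
env_ctype_locale_name (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_CTYPE", "LANG" };

  for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *val = getenv (vars[i]);
      if (val != NULL && val[0] != '\0')
        return val;
    }
  return NULL;
}

/* Hypothetical helper: nonzero if the environment names an LC_CTYPE
   locale other than "C"/"POSIX", i.e. the case in which the proposed
   gettext() would take its output charset from the environment.  */
static int
env_requests_non_c_ctype (void)
{
  const char *name = env_ctype_locale_name ();

  return name != NULL
         && strcmp (name, "C") != 0
         && strcmp (name, "POSIX") != 0;
}
```

With the environment shown earlier (LC_ALL and LC_CTYPE unset, LANG=de_DE.UTF-8), env_ctype_locale_name() returns "de_DE.UTF-8" and env_requests_non_c_ctype() is true. An empty LC_ALL is skipped rather than treated as "C", matching the "first non-empty value" rule.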
This would not violate any standard, because in a standard-compliant situation
there is no LANGUAGE variable, so either
- LC_MESSAGES is "C", the message is in English, and the target charset
doesn't matter, or
- LC_MESSAGES is set to a value other than "C" through setlocale. Then the
LC_CTYPE environment variable must also have been set to a corresponding
value by the user and taken into account by the program (see SUSV2: "If
different character sets are used by the locale categories, the
results achieved by an application utilising these categories are
undefined."), so the behaviour of gettext/strerror will not change.
Here is a patch.
2000-12-02 Bruno Haible <haible@clisp.cons.org>
* intl/loadmsgcat.c (_nl_load_domain): When OUTPUT_CHARSET is not
set and the current locale is "C", look at the environment variables.
*** glibc-2.2/intl/loadmsgcat.c.bak Thu Sep 7 20:56:28 2000
--- glibc-2.2/intl/loadmsgcat.c Sun Dec 3 01:01:55 2000
***************
*** 65,70 ****
--- 65,71 ----
#ifdef _LIBC
# include "../locale/localeinfo.h"
+ extern struct locale_data *const _nl_C[];
#endif
/* @@ end of prolog @@ */
***************
*** 301,314 ****
outcharset = getenv ("OUTPUT_CHARSET");
if (outcharset == NULL || outcharset[0] == '\0')
{
# ifdef _LIBC
! outcharset = (*_nl_current[LC_CTYPE])->values[_NL_ITEM_INDEX (CODESET)].string;
# else
# if HAVE_ICONV
extern const char *locale_charset (void);
! outcharset = locale_charset ();
if (outcharset == NULL)
! outcharset = "";
# endif
# endif
}
--- 302,387 ----
outcharset = getenv ("OUTPUT_CHARSET");
if (outcharset == NULL || outcharset[0] == '\0')
{
+ /* Use the value from the LC_CTYPE locale. Except that
+ if the LC_CTYPE locale is set to "C" but the user's
+ environment differs, we use the latter. The purpose
+ is to be forgiving towards programs which don't call
+ setlocale (LC_CTYPE, ""). */
# ifdef _LIBC
! struct locale_data *locale;
!
! outcharset = NULL;
! locale = *_nl_current[LC_CTYPE];
! if (locale == _nl_C[LC_CTYPE])
! {
! const char *envval;
!
! envval = getenv ("LC_ALL");
! if (envval == NULL || envval[0] == '\0')
! {
! envval = getenv ("LC_CTYPE");
! if (envval == NULL || envval[0] == '\0')
! envval = getenv ("LANG");
! }
! if (envval != NULL && envval[0] != '\0'
! && strcmp (envval, "C") != 0
! && strcmp (envval, "POSIX") != 0)
! {
! __locale_t envlocale;
!
! envlocale = __newlocale (1 << LC_CTYPE, envval, NULL);
! if (envlocale != NULL)
! {
! outcharset = envlocale->__locales[LC_CTYPE]->values[_NL_ITEM_INDEX (CODESET)].string;
! outcharset = strdupa (outcharset);
! __freelocale (envlocale);
! }
! }
! }
! if (outcharset == NULL)
! outcharset = locale->values[_NL_ITEM_INDEX (CODESET)].string;
# else
# if HAVE_ICONV
extern const char *locale_charset (void);
! # if HAVE_SETLOCALE
! const char *locale;
!
! outcharset = NULL;
! locale = setlocale (LC_CTYPE, NULL);
! if (locale == NULL
! || strcmp (locale, "C") == 0
! || strcmp (locale, "POSIX") == 0)
! {
! const char *envval;
!
! envval = getenv ("LC_ALL");
! if (envval == NULL || envval[0] == '\0')
! {
! envval = getenv ("LC_CTYPE");
! if (envval == NULL || envval[0] == '\0')
! envval = getenv ("LANG");
! }
! if (envval != NULL && envval[0] != '\0'
! && strcmp (envval, "C") != 0
! && strcmp (envval, "POSIX") != 0)
! {
! setlocale (LC_CTYPE, "");
! outcharset = locale_charset ();
! if (outcharset != NULL)
! outcharset = strdup (outcharset);
! if (outcharset == NULL)
! outcharset = "";
! setlocale (LC_CTYPE, "C");
! }
! }
!
if (outcharset == NULL)
! # endif
! {
! outcharset = locale_charset ();
! if (outcharset == NULL)
! outcharset = "";
! }
# endif
# endif
}
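For comparison, roughly the same logic can be written outside glibc against public POSIX.1-2008 interfaces. This is a sketch under assumptions, not part of the patch: it replaces the internal __newlocale/_nl_current access with newlocale(), nl_langinfo_l() and freelocale(), and uses nl_langinfo(CODESET) where the non-_LIBC branch calls libcharset's locale_charset(). The name pick_output_charset() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* newlocale, nl_langinfo_l, strdup */
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'ed copy of the charset that
   message catalogs should be converted to (NULL only if out of
   memory).  The caller must free the result.  */
static char *
pick_output_charset (void)
{
  /* An explicit OUTPUT_CHARSET always wins, as in the patch.  */
  const char *outcharset = getenv ("OUTPUT_CHARSET");
  if (outcharset != NULL && outcharset[0] != '\0')
    return strdup (outcharset);

  /* Only second-guess LC_CTYPE if the program left it at "C".  */
  const char *cur = setlocale (LC_CTYPE, NULL);
  if (cur == NULL || strcmp (cur, "C") == 0 || strcmp (cur, "POSIX") == 0)
    {
      /* First non-empty value among LC_ALL, LC_CTYPE, LANG.  */
      const char *envval = getenv ("LC_ALL");
      if (envval == NULL || envval[0] == '\0')
        {
          envval = getenv ("LC_CTYPE");
          if (envval == NULL || envval[0] == '\0')
            envval = getenv ("LANG");
        }

      if (envval != NULL && envval[0] != '\0'
          && strcmp (envval, "C") != 0 && strcmp (envval, "POSIX") != 0)
        {
          /* Load the user's LC_CTYPE into a private locale object; the
             process-wide locale is never modified.  */
          locale_t envlocale = newlocale (LC_CTYPE_MASK, envval, (locale_t) 0);
          if (envlocale != (locale_t) 0)
            {
              /* Copy the codeset before the locale data is freed.  */
              char *result = strdup (nl_langinfo_l (CODESET, envlocale));
              freelocale (envlocale);
              return result;
            }
          /* Unknown locale name: fall through to the current locale.  */
        }
    }

  /* Default: the codeset of the current LC_CTYPE locale.  */
  return strdup (nl_langinfo (CODESET));
}
```

Unlike the non-_LIBC branch of the patch, which temporarily calls setlocale(LC_CTYPE, "") and then switches back to "C", a private locale object leaves the process-wide locale alone, which matters in multithreaded programs. The trade-off is portability: newlocale() and nl_langinfo_l() were only standardized in POSIX.1-2008.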