This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.

perror, gettext, question marks

To: libc-alpha at sources dot redhat dot com
Subject: perror, gettext, question marks
From: Bruno Haible <haible at ilog dot fr>
Date: Mon, 4 Dec 2000 14:34:17 +0100 (CET)


Hi Ulrich,

I'm getting error messages like the following from bzip2, ar, emacs etc.:

    bunzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
    bunzip2: Das Argument ist ung?ltig
        Input file = (stdin), output file = (stdout)

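    ar: libc_pic.a: Auf dem Ger?t ist kein Speicherplatz mehr verf?gbar

(These are the German translations of "Invalid argument" and "No space
left on device"; each "?" stands for an umlaut, "ü" or "ä", that could not
be represented in the C locale's ASCII character set.)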

bzip2 uses the perror() function and never calls setlocale(). My environment
variables are consistent:
    LANGUAGE=de:fr:en
    LANG=de_DE.UTF-8
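To make the symptom concrete, here is a minimal C sketch, assuming a POSIX system with <langinfo.h>; the helper names current_codeset and codeset_after_setlocale are hypothetical. Environment variables such as LANG are only consulted when a program calls setlocale (category, ""), so a program like bzip2 that never does keeps reporting the "C" locale's codeset (on glibc, "ANSI_X3.4-1968", i.e. ASCII), whatever LANG says.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the codeset of the
   current LC_CTYPE locale (NULL if out of memory).  The copy matters
   because nl_langinfo may return storage that a later call overwrites.  */
char *
current_codeset (void)
{
  return strdup (nl_langinfo (CODESET));
}

/* Hypothetical helper: switch LC_CTYPE to NAME ("" means "take it from
   LC_ALL / LC_CTYPE / LANG") and return the resulting codeset, or NULL
   if that locale is not available.  This is the call bzip2 never makes.  */
char *
codeset_after_setlocale (const char *name)
{
  if (setlocale (LC_CTYPE, name) == NULL)
    return NULL;
  return current_codeset ();
}
```

Setting LANG=de_DE.UTF-8 before calling current_codeset () changes nothing; only codeset_after_setlocale ("") would pick up "UTF-8", and only if that locale is installed.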

One could argue that all programs should call setlocale(LC_ALL, ""), but in
practice this doesn't work because:

   * Programs traditionally associate messages with LC_MESSAGES, not
     LC_CTYPE. See, for example, the SUSV2 description of 'strerror':

     "The contents of the error message strings returned by strerror()
      should be determined by the setting of the LC_MESSAGES category in
      the current locale."

   * Many programs depend on the <ctype.h> functions behaving as in the
     C locale, so they cannot simply call setlocale(LC_ALL, ""). Changing
     them can be very hard: for GNU binutils, I sent a 250 KB patch, which
     was not even remotely considered for inclusion.

   * The appearance of the German message, due to the setting of LANGUAGE,
     is a GNU extension. Also, gettext is a GNU extension, and no standard
     says that gettext should behave like this.

Therefore I propose changing gettext as follows: when OUTPUT_CHARSET is not
set and the current LC_CTYPE locale is "C", but the first non-empty value
among the environment variables LC_ALL, LC_CTYPE, LANG is neither "C" nor
"POSIX", gettext() shall use the encoding of the locale named by that
variable.
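The decision rule can be written as a small pure function. This is a sketch that mirrors the conditions in the patch; charset_override_locale and is_unset are hypothetical names, and the caller is assumed to pass in the current LC_CTYPE locale name and the raw environment values rather than having the function read them itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if VALUE is NULL or empty, i.e. "not set" for this rule.  */
static int
is_unset (const char *value)
{
  return value == NULL || value[0] == '\0';
}

/* Hypothetical helper.  CTYPE_LOCALE is the current LC_CTYPE locale name
   (as setlocale (LC_CTYPE, NULL) reports it, possibly NULL); the other
   arguments are environment values or NULL.  Returns the locale name
   whose codeset gettext should use instead, or NULL if the ordinary
   LC_CTYPE codeset applies.  */
const char *
charset_override_locale (const char *ctype_locale, const char *lc_all,
                         const char *lc_ctype, const char *lang,
                         const char *output_charset)
{
  const char *envval;

  /* An explicit OUTPUT_CHARSET always takes precedence.  */
  if (!is_unset (output_charset))
    return NULL;
  /* Only programs still running in the "C" locale are affected.  */
  if (ctype_locale != NULL
      && strcmp (ctype_locale, "C") != 0
      && strcmp (ctype_locale, "POSIX") != 0)
    return NULL;
  /* First non-empty value among LC_ALL, LC_CTYPE, LANG.  */
  envval = lc_all;
  if (is_unset (envval))
    envval = lc_ctype;
  if (is_unset (envval))
    envval = lang;
  if (is_unset (envval)
      || strcmp (envval, "C") == 0 || strcmp (envval, "POSIX") == 0)
    return NULL;
  return envval;
}
```

A program that has called setlocale (LC_ALL, "") no longer has "C" as its LC_CTYPE locale, so the function returns NULL for it; that is the standards-compliance argument made in the next paragraph.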

This would not violate standards, because in a standards-compliant situation
there is no LANGUAGE variable; therefore either
  - LC_MESSAGES is "C", the message is in English, and the target charset
    doesn't matter, or
  - LC_MESSAGES has been set to something other than "C" through setlocale.
    Then the LC_CTYPE environment variable must also have been set to a
    corresponding value by the user and taken into account by the program
    (see SUSV2: "If different character sets are used by the locale
    categories, the results achieved by an application utilising these
    categories are undefined."), so the current LC_CTYPE locale is not "C"
    and the behaviour of gettext/strerror does not change.

Here is a patch.

2000-12-02  Bruno Haible  <haible@clisp.cons.org>

	* intl/loadmsgcat.c (_nl_load_domain): When OUTPUT_CHARSET is not
	set and the current locale is "C", look at the environment variables.

*** glibc-2.2/intl/loadmsgcat.c.bak	Thu Sep  7 20:56:28 2000
--- glibc-2.2/intl/loadmsgcat.c	Sun Dec  3 01:01:55 2000
***************
*** 65,70 ****
--- 65,71 ----
  
  #ifdef _LIBC
  # include "../locale/localeinfo.h"
+ extern struct locale_data *const _nl_C[];
  #endif
  
  /* @@ end of prolog @@ */
***************
*** 301,314 ****
  	      outcharset = getenv ("OUTPUT_CHARSET");
  	      if (outcharset == NULL || outcharset[0] == '\0')
  		{
  # ifdef _LIBC
! 		  outcharset = (*_nl_current[LC_CTYPE])->values[_NL_ITEM_INDEX (CODESET)].string;
  # else
  #  if HAVE_ICONV
  		  extern const char *locale_charset (void);
! 		  outcharset = locale_charset ();
  		  if (outcharset == NULL)
! 		    outcharset = "";
  #  endif
  # endif
  		}
--- 302,387 ----
  	      outcharset = getenv ("OUTPUT_CHARSET");
  	      if (outcharset == NULL || outcharset[0] == '\0')
  		{
+ 		  /* Use the value from the LC_CTYPE locale.  Except that
+ 		     if the LC_CTYPE locale is set to "C" but the user's
+ 		     environment differs, we use the latter.  The purpose
+ 		     is to be forgiving towards programs which don't call
+ 		     setlocale (LC_CTYPE, "").  */
  # ifdef _LIBC
! 		  struct locale_data *locale;
! 
! 		  outcharset = NULL;
! 		  locale = *_nl_current[LC_CTYPE];
! 		  if (locale == _nl_C[LC_CTYPE])
! 		    {
! 		      const char *envval;
! 
! 		      envval = getenv ("LC_ALL");
! 		      if (envval == NULL || envval[0] == '\0')
! 			{
! 			  envval = getenv ("LC_CTYPE");
! 			  if (envval == NULL || envval[0] == '\0')
! 			    envval = getenv ("LANG");
! 			}
! 		      if (envval != NULL && envval[0] != '\0'
! 			  && strcmp (envval, "C") != 0
! 			  && strcmp (envval, "POSIX") != 0)
! 			{
! 			  __locale_t envlocale;
! 
! 			  envlocale = __newlocale (1 << LC_CTYPE, envval, NULL);
! 			  if (envlocale != NULL)
! 			    {
! 			      outcharset = envlocale->__locales[LC_CTYPE]->values[_NL_ITEM_INDEX (CODESET)].string;
! 			      outcharset = strdupa (outcharset);
! 			      __freelocale (envlocale);
! 			    }
! 			}
! 		    }
! 		  if (outcharset == NULL)
! 		    outcharset = locale->values[_NL_ITEM_INDEX (CODESET)].string;
  # else
  #  if HAVE_ICONV
  		  extern const char *locale_charset (void);
! #   if HAVE_SETLOCALE
! 		  const char *locale;
! 
! 		  outcharset = NULL;
! 		  locale = setlocale (LC_CTYPE, NULL);
! 		  if (locale == NULL
! 		      || strcmp (locale, "C") == 0
! 		      || strcmp (locale, "POSIX") == 0)
! 		    {
! 		      const char *envval;
! 
! 		      envval = getenv ("LC_ALL");
! 		      if (envval == NULL || envval[0] == '\0')
! 			{
! 			  envval = getenv ("LC_CTYPE");
! 			  if (envval == NULL || envval[0] == '\0')
! 			    envval = getenv ("LANG");
! 			}
! 		      if (envval != NULL && envval[0] != '\0'
! 			  && strcmp (envval, "C") != 0
! 			  && strcmp (envval, "POSIX") != 0)
! 			{
! 			  setlocale (LC_CTYPE, "");
! 			  outcharset = locale_charset ();
! 			  if (outcharset != NULL)
! 			    outcharset = strdup (outcharset);
! 			  if (outcharset == NULL)
! 			    outcharset = "";
! 			  setlocale (LC_CTYPE, "C");
! 			}
! 		    }
! 
  		  if (outcharset == NULL)
! #   endif
! 		    {
! 		      outcharset = locale_charset ();
! 		      if (outcharset == NULL)
! 			outcharset = "";
! 		    }
  #  endif
  # endif
  		}
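For comparison, the _LIBC branch above relies on the glibc-internal __newlocale and reads envlocale->__locales directly, while the non-_LIBC branch toggles the process-wide locale with setlocale (LC_CTYPE, "") and back, which briefly affects every thread. Below is a sketch of the same lookup using the public interfaces later standardized in POSIX.1-2008 (newlocale, nl_langinfo_l, freelocale), assuming they are available; codeset_of_locale_name is a hypothetical name, not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the codeset of the
   locale named by ENVVAL, or NULL if ENVVAL is unset, "C"/"POSIX", or
   names a locale that is not installed.  It works on a separate locale
   object, so the global locale is never modified.  */
char *
codeset_of_locale_name (const char *envval)
{
  locale_t loc;
  char *result;

  if (envval == NULL || envval[0] == '\0'
      || strcmp (envval, "C") == 0 || strcmp (envval, "POSIX") == 0)
    return NULL;

  loc = newlocale (LC_CTYPE_MASK, envval, (locale_t) 0);
  if (loc == (locale_t) 0)
    return NULL;                /* e.g. the locale is not installed */

  /* Copy before freeing: the string is owned by LOC.  */
  result = strdup (nl_langinfo_l (CODESET, loc));
  freelocale (loc);
  return result;
}
```

Because the temporary locale object is private to the caller, this avoids the window in which other threads would observe LC_CTYPE switched away from "C".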

Follow-Ups:
- Re: perror, gettext, question marks
  - From: Ulrich Drepper
