This is the mail archive of the newlib@sourceware.org mailing list for the newlib project.



Re: [PATCH] Read locale settings from environment


Corinna Vinschen wrote:
On Feb 20 13:09, Corinna Vinschen wrote:
Ok, here's my new setlocale implementation.  It fixes the following
problems:
[...]
- Per POSIX, allow the required "POSIX" locale and map it to the "C"
  locale, as on Linux.

- If the locale argument is "", honor the environment in the order
  required by POSIX for all supported categories.
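A minimal sketch of that lookup order for setlocale(category, ""), assuming a hypothetical helper resolve_locale_name() (not newlib's actual code) that is passed the name of the category's environment variable.  POSIX gives LC_ALL precedence, then the category's own variable, then LANG; unset or empty variables are skipped.  Falling back to "C" when nothing is set is an assumption here, since POSIX leaves that default implementation-defined.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pick the locale name for one category
   (e.g. category_var = "LC_CTYPE") when setlocale() gets "". */
static const char *
resolve_locale_name (const char *category_var)
{
  /* POSIX precedence: LC_ALL, then LC_<category>, then LANG. */
  const char *vars[3] = { "LC_ALL", category_var, "LANG" };

  for (int i = 0; i < 3; i++)
    {
      const char *val = getenv (vars[i]);
      if (val != NULL && *val != '\0')      /* skip unset or empty */
        {
          /* "POSIX" is required by POSIX; treat it as "C", as on Linux. */
          return strcmp (val, "POSIX") == 0 ? "C" : val;
        }
    }
  return "C";   /* assumed default when nothing is set */
}
```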

Apart from that, would it be OK to change setlocale() and the functions
that use __lc_ctype (e.g. mbtowc_r, wctomb_r, iswXXX) so that all
POSIX-compliant LC_XXX environment variable settings are accepted?  The
currently accepted locale names

C[-codeset]

are non-POSIX. The POSIX variant is

[language[_territory][.codeset][@modifier]]

Of course we should keep recognizing the C[-codeset] names for backward
compatibility, but I think we should not be limited to them.
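To make the POSIX form concrete, here is a hypothetical split_locale_name() that breaks a name into its four parts.  It is only an illustrative sketch: the struct, the fixed 32-byte fields and the truncation of over-long parts are assumptions, and the legacy C[-codeset] names are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for language[_territory][.codeset][@modifier].
   Missing parts are stored as empty strings. */
struct locale_parts
{
  char language[32];
  char territory[32];
  char codeset[32];
  char modifier[32];
};

/* Copy LEN chars of SRC into DST (of SIZE bytes), truncating if needed. */
static void
copy_part (char *dst, size_t size, const char *src, size_t len)
{
  if (len >= size)
    len = size - 1;
  memcpy (dst, src, len);
  dst[len] = '\0';
}

static void
split_locale_name (const char *name, struct locale_parts *p)
{
  size_t n = strcspn (name, "_.@");          /* language ends at a separator */
  copy_part (p->language, sizeof p->language, name, n);
  name += n;
  p->territory[0] = p->codeset[0] = p->modifier[0] = '\0';

  if (*name == '_')                          /* optional _territory */
    {
      n = strcspn (++name, ".@");
      copy_part (p->territory, sizeof p->territory, name, n);
      name += n;
    }
  if (*name == '.')                          /* optional .codeset */
    {
      n = strcspn (++name, "@");
      copy_part (p->codeset, sizeof p->codeset, name, n);
      name += n;
    }
  if (*name == '@')                          /* optional @modifier */
    copy_part (p->modifier, sizeof p->modifier, name + 1, strlen (name + 1));
}
```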

Actually, all the related functions rely only on the charset part of the
setting, not on the actual language.  So what we could do is split off
the charset part along the lines of what is already done in the
LC_MESSAGES part of the code, and check only for that in the subsequent
functions.  Instead of checking against __lc_ctype, these functions could
check, say, __lc_charset.  The LC_CTYPE setting could then reflect the
real setting of the environment.  For instance:

LC_ALL=POSIX

  ==>  __lc_ctype == C
       __lc_charset = ISO-8859  (!)

LC_ALL=en_US.UTF-8

  ==>  __lc_ctype == en_US.UTF-8
       __lc_charset = UTF-8

LC_ALL=ja_JP.EUCJP

  ==>  __lc_ctype == ja_JP.EUCJP
       __lc_charset = EUCJP

LC_ALL=de

  ==>  __lc_ctype == de
       __lc_charset = ISO-8859  (!)

LC_ALL=fr_FR.ISO-8859-15

  ==>  __lc_ctype == fr_FR.ISO-8859-15
       __lc_charset = ISO-8859  (!)

Actually, __lc_charset could be a single character like I for ISO,
U for UTF, E for EUCJP, etc., to simplify the checks in mbtowc_r and the
others.
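A sketch of how the proposed __lc_charset value could be derived from the locale name, reproducing the examples above.  charset_from_locale() is a hypothetical name, not newlib API; collapsing every ISO-8859-x codeset to "ISO-8859" simply follows those examples, and the result is kept as a full string rather than a single character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: derive the charset string from a locale name and
   store it in BUF (SIZE must be > 0).  Returns BUF. */
static const char *
charset_from_locale (const char *name, char *buf, size_t size)
{
  const char *codeset = NULL;
  size_t len = 0;

  if (strncmp (name, "C-", 2) == 0)
    codeset = name + 2;                    /* legacy C[-codeset] form */
  else if ((codeset = strchr (name, '.')) != NULL)
    codeset++;                             /* POSIX ".codeset" part */

  if (codeset != NULL)
    len = strcspn (codeset, "@");          /* drop any "@modifier" */

  /* No codeset ("C", "POSIX", "de") or any ISO-8859-x codeset. */
  if (len == 0 || strncmp (codeset, "ISO-8859", 8) == 0)
    {
      codeset = "ISO-8859";
      len = 8;
    }

  if (len >= size)
    len = size - 1;                        /* truncate to fit BUF */
  memcpy (buf, codeset, len);
  buf[len] = '\0';
  return buf;
}
```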



What do you say?

Corinna


OK.  I think the charset should be kept as the full name rather than as
a single character.

-- Jeff J.


