On Feb 20 13:09, Corinna Vinschen wrote:
Ok, here's my new setlocale implementation. It fixes the following
problems:
[...]
- Per POSIX allow the required "POSIX" locale. Map it to the "C" locale
as on Linux.
- If locale is "", honor the environment in the order required by POSIX
for all supported categories.
Apart from that, would it be ok to change setlocale() and the functions
that depend on __lc_ctype (e.g. mbtowc_r, wctomb_r, iswXXX) so that all
POSIX compliant LC_XXX environment variable settings are accepted? The
currently accepted locales

  C[-codeset]

are non-POSIX. The POSIX variant is

  [language[_territory][.codeset][@modifier]]

Of course we should keep recognizing C[-codeset] for backward
compatibility, but I think we should not restrict ourselves to it.
Actually all the related functions only rely on the charset part of the
setting, not the actual language. So what we could do is split off the
charset part, along the lines of what is already done in the
LC_MESSAGES part of the code, and only check that part in the subsequent
functions. Instead of checking against __lc_ctype these functions could
check, say, __lc_charset. The LC_CTYPE setting could then reflect
the real setting of the environment. For instance:

  LC_ALL=POSIX
    ==> __lc_ctype   == C
        __lc_charset == ISO-8859 (!)

  LC_ALL=en_US.UTF-8
    ==> __lc_ctype   == en_US.UTF-8
        __lc_charset == UTF-8

  LC_ALL=ja_JP.EUCJP
    ==> __lc_ctype   == ja_JP.EUCJP
        __lc_charset == EUCJP

  LC_ALL=de
    ==> __lc_ctype   == de
        __lc_charset == ISO-8859 (!)

  LC_ALL=fr_FR.ISO-8859-15
    ==> __lc_ctype   == fr_FR.ISO-8859-15
        __lc_charset == ISO-8859 (!)

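
The mapping in the examples above could be sketched roughly like this.
It is only an illustration, not existing code: locale_to_charset() is a
hypothetical helper name, and both the fallback to ISO-8859 and the
collapsing of ISO-8859-x to the family name are assumptions taken from
the examples in this mail.

```c
#include <string.h>

/* Hypothetical helper (not existing library code): derive the charset
   name from a locale name.  Handles "C", "POSIX", the legacy
   C-codeset form and language[_territory][.codeset][@modifier].  */
static void
locale_to_charset (const char *locale, char *charset, size_t size)
{
  const char *cs = NULL;
  size_t len;

  if (size == 0)
    return;
  if (locale[0] == 'C' && locale[1] == '-')
    cs = locale + 2;                    /* legacy C-codeset form */
  else if ((cs = strchr (locale, '.')) != NULL)
    ++cs;                               /* POSIX form: skip the '.' */
  if (cs == NULL || *cs == '\0' || *cs == '@')
    cs = "ISO-8859";                    /* assumed default charset */
  len = strcspn (cs, "@");              /* drop an optional @modifier */
  if (strncmp (cs, "ISO-8859", 8) == 0)
    len = 8;                            /* ISO-8859-x -> ISO-8859 */
  if (len >= size)
    len = size - 1;                     /* truncate to fit the buffer */
  memcpy (charset, cs, len);
  charset[len] = '\0';
}
```

With this, setlocale() would store the full name in __lc_ctype and the
result of the helper in __lc_charset, so the conversion functions never
have to parse the language part themselves.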
Actually the __lc_charset could be a single character like I for ISO,
U for UTF, E for EUCJP, etc, to simplify the checks in mbtowc_r and the
others.
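
To illustrate the single-character idea, here is a minimal sketch. The
function names charset_code() and max_bytes_per_char() are made up for
this example, the charset names are matched exactly as spelled in the
examples above, and treating every unknown charset as single-byte ISO is
an assumption.

```c
#include <string.h>

/* Hypothetical one-character charset codes as suggested above:
   'I' for ISO, 'U' for UTF, 'E' for EUCJP.  */
static char
charset_code (const char *charset)
{
  if (strcmp (charset, "UTF-8") == 0)
    return 'U';
  if (strcmp (charset, "EUCJP") == 0)
    return 'E';
  return 'I';           /* assumption: anything else is single-byte ISO */
}

/* Example of the kind of check that gets cheaper: a conversion
   function can dispatch on one character instead of comparing
   strings.  Returns the maximum number of bytes per character.  */
static int
max_bytes_per_char (char code)
{
  switch (code)
    {
    case 'U':
      return 4;         /* UTF-8 as restricted by RFC 3629 */
    case 'E':
      return 3;         /* EUC-JP uses 1 to 3 bytes per character */
    default:
      return 1;         /* ISO-8859 family is single-byte */
    }
}
```

The string comparison then happens once, in setlocale(), and mbtowc_r
and friends only need a switch on the stored character.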
What do you say?