This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Should glibc provide a builtin C.UTF-8 locale?
- From: Rich Felker <dalias at libc dot org>
- To: Mike FABIAN <mfabian at redhat dot com>
- Cc: keld at keldix dot com, Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Pravin Satpute <psatpute at redhat dot com>, Jens Petersen <petersen at redhat dot com>
- Date: Thu, 29 Oct 2015 14:35:44 -0400
- Subject: Re: Should glibc provide a builtin C.UTF-8 locale?
On Thu, Oct 29, 2015 at 07:20:48PM +0100, Mike FABIAN wrote:
> Rich Felker <dalias@libc.org> wrote:
>
> >> LC_CTYPE
> >> almost the same
> >> - C.UTF-8 just copies the LC_CTYPE from "i18n" (which is kept
> >> in sync with the latest Unicode release using some scripts) and
> >> adds "translit_combining".
> >
> > So C.UTF-8 will have the full character-class data? I'm in favor of
> > that but just want to clarify, since omitting it would also be
> > possible.
>
> Yes, with the patch I made, it has the full character-class data,
> i.e. exactly the same as in the i18n file.
Sounds good.
> >> LC_MONETARY
> >> - C.UTF-8 tries to agree with C/POSIX as much as possible
> >> and thus uses "USD" for int_curr_symbol, "$" for currency_symbol,
> >> and "." for mon_decimal_point.
> >
> > This is incorrect, at least based on the spec. C requires the values
> > for int_curr_symbol and currency_symbol to be "" in the C locale (7.11
> > Localization <locale.h>, paragraph 2). I think the values you cited
> > are from en_US.
>
> I wanted to fill in something for int_curr_symbol and currency_symbol
> mainly because "localedef" complains when these fields are empty
> and refuses to generate the binary locale unless one uses the force
> option:
>
>     -c, --force
>            Write the output files even if warnings were generated
>            about the input file.
>
> and this might make one miss real errors.
>
> Maybe "localedef" should be adapted to allow empty values
> for these two fields if the locale to be generated is C.UTF-8?
Yes, I think so. Putting en_US values in there is inappropriate and
makes this locale not much of a C.UTF-8 locale but just a
slightly-different variant of en_US.
> >> LC_MESSAGES
> >> - C.UTF-8 uses the same as C/POSIX
> >> (for example yesexpr "^[yY]" and noexpr "^[nN]")
> >> - i18n.UTF-8 apparently tries to avoid English
> >> (for example yesexpr "^[+1]" and noexpr "^[-0]")
> >
> > What about error messages? This is probably off-topic, but it might be
> > nice if i18n used the actual errno macro names as strings ("ENOENT",
> > etc.) if it doesn't already.
>
> There was nothing for error messages in the i18n file, nor
> in C/POSIX.
OK. The reason I raise this is that I actually got several user
requests for musl to use the raw E* macro names rather than
descriptive English strings in the C locale. I don't think glibc would
want to make such a change in the C locale (and we probably wouldn't
in musl either), but the i18n locale might be a nice place to
experiment with it.
Rich