This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
- From: Petr Baudis <pasky at ucw dot cz>
- To: Keld Simonsen <keld at keldix dot com>
- Cc: Carlos O'Donell <carlos at redhat dot com>, libc-locales at sourceware dot org, GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 8 Apr 2013 01:23:59 +0200
- Subject: Re: [PATCH] en_CA, es_AR, es_ES: Define yesstr and nostr.
- References: <51619965 dot 9030600 at redhat dot com> <20130407210205 dot GX6137 at machine dot or dot cz> <20130407231451 dot GA14004 at rap dot rap dot dk>
Hi!
On Mon, Apr 08, 2013 at 01:14:51AM +0200, Keld Simonsen wrote:
> On Sun, Apr 07, 2013 at 11:02:06PM +0200, Petr Baudis wrote:
> > (Though I'm not particularly fond of having the ASCII contents of the
> > codepoint sequence repeated in the comment, as all data duplication adds
> > a potential for inconsistencies. Ideally, we would just write the actual
> > characters in the values instead of the codepoints; I didn't find any
> > technical reason to insist on the <U...> syntax for all characters.
> > But then again, I'm personally unlikely to gather the momentum for such
> > a change, mainly because it would require verifying that it really is
> > 100% safe.)
>
> The locales are character-set independent, so they work with UTF-8, ISO-8859-1, ISO-8859-15
> and even EBCDIC. They are written in ASCII only, to improve portability between systems with
> different character sets.
But it's 2013. I claim that portability of locale source files to
EBCDIC is totally irrelevant for glibc, and whoever cares should bear the
burden of writing the conversion tools.
I don't think it would be a big fuss if we just UTF-8-encoded the locale
files, but even if we only embraced ASCII (!) and replaced the 7-bit
codepoint markup with the actual ASCII characters, that would already be
a huge practical step forward.
The only caveat is that I'm not 100% sure whether other tools read the
locale source files and would break if we did this, and, if there are
any, whether breaking them would be a big deal.
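A minimal sketch of the ASCII-only substitution discussed above, written in Python for illustration. It is not an existing glibc tool: the helper name `inline_ascii_codepoints` and the SAFE character set are assumptions. It only rewrites <Uxxxx> tokens that appear inside double-quoted strings and decode to ASCII letters or digits. Punctuation such as `<`, `%` and `/` stays encoded because its meaning in a locale source file depends on the `comment_char` and `escape_char` settings, which is exactly the kind of case that would need checking before calling the change "100% safe".

```python
import re
import string

# Assumed-safe characters: ASCII letters and digits only. Punctuation such as
# '<', '>', '"', '%', '/', '\\' and ';' is left encoded, since it may act as a
# symbol, string, comment or escape delimiter in a locale source file.
SAFE = set(string.ascii_letters + string.digits)

# Only 4-digit <Uxxxx> tokens are considered; 8-digit forms are never ASCII.
UCS_TOKEN = re.compile(r"<U([0-9A-Fa-f]{4})>")
# Assumes quoted strings contain no escaped double quotes.
QUOTED = re.compile(r'"[^"]*"')


def _inline_token(match: re.Match) -> str:
    """Replace one <Uxxxx> token with its character if it is in SAFE."""
    ch = chr(int(match.group(1), 16))
    return ch if ch in SAFE else match.group(0)


def inline_ascii_codepoints(line: str, comment_char: str = "%") -> str:
    """Hypothetical helper: rewrite safe <Uxxxx> tokens inside "..." strings.

    Lines starting with comment_char are returned unchanged, and tokens
    outside double-quoted strings (e.g. in LC_CTYPE class lists) are not touched.
    """
    if line.lstrip().startswith(comment_char):
        return line
    return QUOTED.sub(lambda m: UCS_TOKEN.sub(_inline_token, m.group(0)), line)


if __name__ == "__main__":
    print(inline_ascii_codepoints('yesstr "<U0079><U0065><U0073>"'))  # prints: yesstr "yes"
```

Restricting the substitution to quoted strings and to letters and digits keeps the sketch conservative; widening SAFE would require checking each file's `comment_char` and `escape_char` declarations first.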
> Originally I wrote many locales using a mnemonic scheme that
> made them easier to read, such as <A> for <U0041>, <B> for <U0042>, <b> for <U0062>, etc.,
> but Ulrich Drepper did not like that and recoded all the locales to use the <Uxxxx> notation.
> Some of the mnemonics were a bit complex, but IMHO they were far easier to
> proofread than the <Uxxxx> notation, and some came directly from the POSIX standard.
> They were documented in the POSIX.2 standard from 1992, and also in TR 14652.
Indeed, I think I have seen some of these locale files. But if you
mean <U0041>, why write even <A> when you can simply write A?
--
Petr "Pasky" Baudis
For every complex problem there is an answer that is clear,
simple, and wrong. -- H. L. Mencken