This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Improved check-localedef script


4.08.2017 11:14 Mike FABIAN <mfabian@redhat.com> wrote:
> [...]
> I am not sure what to do about this one:
>
> ca_ES:87: string not representable in iso8859-1:
> 20AC

I've just written another email about it. :-)

> This is the euro symbol, the line from the source file is:
>
> currency_symbol "<U20AC>"
>
> SUPPORTED contains:
>
> ca_ES.UTF-8/UTF-8 \
> ca_ES/ISO-8859-1 \
> ca_ES@euro/ISO-8859-15 \
>
> But even though U+20AC cannot be converted to ISO-8859-1, the
> ca_ES.ISO-8859-1 locale still works because it is transliterated:
>
> $ LC_ALL=ca_ES locale -k currency_symbol charmap
> currency_symbol="EUR"
> charmap="ISO-8859-1"
>
> So this does not cause an actual problem.

So the "€" character is effectively representable in ISO-8859-1,
because it is transliterated to "EUR".  This looks like a false
positive, then.
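
To illustrate the difference between "has a code point in the charset"
and "can still be written via transliteration", here is a minimal Python
sketch.  It is not how localedef or check-localedef work internally; the
helper names and the one-entry TRANSLIT table are hypothetical and only
mimic the "EUR" replacement shown in the locale output above.

```python
EURO = "\u20ac"  # the character behind currency_symbol "<U20AC>"

# Hypothetical one-entry table standing in for real transliteration rules.
TRANSLIT = {EURO: "EUR"}


def representable(text: str, charset: str) -> bool:
    """Return True if every character of text encodes directly in charset."""
    try:
        text.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


def encode_with_fallback(text: str, charset: str, table: dict) -> bytes:
    """Encode text, replacing characters that do not encode directly
    with their entry in table (raises KeyError if there is none)."""
    parts = [ch if representable(ch, charset) else table[ch] for ch in text]
    return "".join(parts).encode(charset)


print(representable(EURO, "iso-8859-1"))    # False: no euro sign in Latin-1
print(representable(EURO, "iso-8859-15"))   # True: Latin-9 added it
print(encode_with_fallback(EURO, "iso-8859-1", TRANSLIT))  # b'EUR'
```

A strict check like representable() reports the euro sign as a problem
for ISO-8859-1, while the fallback path still produces usable output,
which matches the behaviour quoted above.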

> The ca_ES source file is not ASCII, it has
>
> % català
> lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"
>
> So maybe I could just convert the file to UTF-8
> and change “% Charset: ISO-8859-1” into “% Charset: UTF-8”
> to get rid of the check-localedef warning.
>
> Would that be OK?

No, I don't think that's OK.  If I understand correctly, the
"source file is ASCII" statement means that the individual characters
'<', '2', '0', 'A', 'C', '>' are ASCII.  Together they may describe
something outside ASCII, such as <U00E0>.  But even that is not UTF-8,
because in UTF-8 it would be the bytes <C3> <A0> (UTF-8 is 8-bit).
The closest charset would be UCS-2, or simply generic Unicode.

Caution: we are mixing metalevels here: the characters we describe
vs. the characters we use to describe them. :-)
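
The two levels can be made concrete with a short Python sketch.  It
assumes the <UXXXX> notation carries a hexadecimal code point, and
decode_symbolic() is a hypothetical helper written for this example,
not part of glibc or its scripts.

```python
import re

# Matches symbolic names such as <U00E0>; 4 to 8 hex digits is an
# assumption made for this sketch.
UCS_NAME = re.compile(r"<U([0-9A-Fa-f]{4,8})>")


def decode_symbolic(value: str) -> str:
    """Replace each <UXXXX> name with the character it describes."""
    return UCS_NAME.sub(lambda m: chr(int(m.group(1), 16)), value)


line = 'lang_name "<U0063><U0061><U0074><U0061><U006C><U00E0>"'

# Level 1: the characters used to describe.  The line itself is ASCII.
print(line.isascii())                 # True

# Level 2: the characters being described.
print(decode_symbolic(line))          # lang_name "català"

# The UTF-8 form of U+00E0 is the two bytes C3 A0, which is not what
# the source line contains.
print(decode_symbolic("<U00E0>").encode("utf-8").hex())  # c3a0
```

The line written with <UXXXX> names stays ASCII no matter which
characters it describes; only the decoded result needs a larger charset.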

Regards,

Rafal

