This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.
Re: EUC-KR and the Won sign
From: Bruno Haible <haible@ilog.fr>
Date: Sat, 28 Oct 2000 01:17:15 +0200 (CEST)
Ulrich Drepper writes:
> I rather would like to see the two locale types, with and without
> ASCII compatibility, being available. The people who actually use
> those locales know about the problem and can choose appropriately.
I agree. This would mean two different charsets, though.
Yes, and I don't see how you can avoid this if you want to support
both the ASCII and the KS-Roman variants of EUC-KR.
But instead of inventing a new EUC-KR variant (EUC-KR-ASCII?
EUC-KR-US?) it's better to point them to an existing one: CP949
(upward compatible with EUC-KR except for the backslash) or UTF-8.
This doesn't sound right to me.
First, EUC-KR allows either ASCII or KS-Roman as code set 0, so if you
have two variants they should both be first-class. If you deprecate
either variant, you'll offend the people who prefer the other one.
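
To make the difference concrete, here is a minimal Python sketch of the
two readings of code set 0. It assumes Python's built-in "euc_kr" codec,
which treats bytes below 0x80 as ASCII. The KS-Roman decoder is a
hypothetical helper written for illustration only; it is not part of
glibc or of Python, and it handles only byte 0x5C, the position under
discussion in this thread.

```python
# Sketch only: two readings of code set 0 in EUC-KR, differing at byte 0x5C.
# decode_euc_kr_ks_roman is a hypothetical helper, not a glibc or Python API.

WON_SIGN = "\u20a9"  # U+20A9 WON SIGN


def decode_euc_kr_ascii(data: bytes) -> str:
    """ASCII variant: byte 0x5C is a backslash (Python's euc_kr behaves this way)."""
    return data.decode("euc_kr")


def decode_euc_kr_ks_roman(data: bytes) -> str:
    """KS-Roman variant (assumed mapping): byte 0x5C is the Won sign.

    In EUC-KR every byte of a two-byte character has the high bit set,
    so a decoded backslash can only have come from a lone 0x5C byte.
    """
    return data.decode("euc_kr").replace("\\", WON_SIGN)


# The same bytes, read under each variant.
sample = b"5000\x5c " + "원".encode("euc_kr")
as_ascii = decode_euc_kr_ascii(sample)
as_ks_roman = decode_euc_kr_ks_roman(sample)
print(repr(as_ascii))
print(repr(as_ks_roman))
```

The Hangul bytes decode identically either way; only the single-byte
position changes meaning, which is why a path separator in one variant
shows up as a currency sign in the other.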
Second, in the GNU/Linux world, the ASCII variant predominates; it's
what the Korean GNU translation project uses, and it's what GNU Emacs
uses for the euc-kr coding-system. If anything should be a
second-class citizen, it should be the KS-Roman variant.
Third, UTF-8 is not at all a reasonable substitute for the ASCII
variant of EUC-KR; they're completely different, as you know. Also,
as a minor technical point, CP949 disagrees with the KS-Roman variant
of EUC-KR in places other than the backslash.
I realize that no matter what you choose, you'll get controversy. But
I don't understand why one would want to deprecate the ASCII variants
of the EUC encodings. If anything, they should be favored over the
Roman versions, since they are somewhat more likely to match the
behavior users expect in practice.