This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Re: EUC-KR and the Won sign

To: haible at ilog dot fr
Subject: Re: EUC-KR and the Won sign
From: Paul Eggert <eggert at twinsun dot com>
Date: Fri, 27 Oct 2000 20:18:00 -0700 (PDT)
CC: drepper at cygnus dot com, libc-alpha at sources dot redhat dot com
References: <14825.39921.94934.184323@honolulu.ilog.fr><m3n1g6f29o.fsf@otr.mynet.cygnus.com><14841.30961.956380.371402@honolulu.ilog.fr><m3em12m5ec.fsf@otr.mynet.cygnus.com> <14842.3323.968019.108940@honolulu.ilog.fr>

   From: Bruno Haible <haible@ilog.fr>
   Date: Sat, 28 Oct 2000 01:17:15 +0200 (CEST)

   Ulrich Drepper writes:

   > I rather would like to see the two locale types, with and without
   > ASCII compatibility, being available.  The people who actually use
   > those locales know about the problem and can choose appropriately.

   I agree. This would mean two different charsets, though.

Yes, and I don't see how you can avoid this if you want to support
both the ASCII and the KS-Roman variants of EUC-KR.

   But instead of inventing a new EUC-KR variant (EUC-KR-ASCII?
   EUC-KR-US?) it's better to point them to an existing one: CP949
   (upward compatible with EUC-KR except for the backslash) or UTF-8.

This doesn't sound right to me.

First, EUC-KR allows either ASCII or KS-Roman as code set 0, so if you
have two variants they should both be first-class.  If you deprecate
either variant, you'll offend the people who prefer the other one.
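To make the code set 0 difference concrete, here is a minimal Python
sketch, not glibc code.  It assumes that KS-Roman (KS X 1003) differs
from ASCII only at byte 0x5C, where it has the Won sign (U+20A9)
instead of the backslash, and that Python's "euc_kr" codec treats code
set 0 as ASCII.  The function name and the variant labels are
hypothetical.

```python
WON_SIGN = "\u20a9"  # U+20A9 WON SIGN


def decode_euc_kr(data: bytes, code_set_0: str = "ascii") -> str:
    """Decode EUC-KR bytes, reading code set 0 as ASCII or as KS-Roman.

    Assumption: the two variants differ only at byte 0x5C.
    """
    # Python's codec maps every byte below 0x80 to the same ASCII character.
    text = data.decode("euc_kr")
    if code_set_0 == "ascii":
        return text
    if code_set_0 == "ks-roman":
        # Two-byte EUC-KR characters use only bytes 0xA1..0xFE, so every
        # U+005C in the decoded text came from a single 0x5C byte.
        return text.replace("\\", WON_SIGN)
    raise ValueError(f"unknown code set 0 variant: {code_set_0!r}")
```

The same bytes thus yield two different strings: decode_euc_kr(b"\\100")
keeps the backslash, while decode_euc_kr(b"\\100", "ks-roman") yields a
Won sign followed by "100".  That is why the two variants end up as
two different charsets.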

Second, in the GNU/Linux world, the ASCII variant predominates; it's
what the Korean GNU translation project uses, and it's what GNU Emacs
uses for the euc-kr coding-system.  If anything should be a
second-class citizen, it should be the KS-Roman variant.

Third, UTF-8 is not at all a reasonable substitute for the ASCII
variant of EUC-KR; they're completely different, as you know.  Also,
as a minor technical point, CP949 disagrees with the KS-Roman variant
of EUC-KR in places other than the backslash.
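The message does not list the positions it has in mind.  One related
fact that can be checked is that CP949 extends EUC-KR with two-byte
sequences whose bytes fall outside the 0xA1..0xFE range that EUC-KR
uses, so the two are not interchangeable as byte streams.  The sketch
below uses Python's "cp949" and "euc_kr" codecs as stand-ins for the
converters under discussion, which is an assumption, and its helper
names are hypothetical.

```python
def fits_euc_kr_byte_range(raw: bytes) -> bool:
    """True if every non-ASCII byte lies in 0xA1..0xFE, the range that
    EUC-KR uses for both bytes of a two-byte character."""
    return all(b < 0x80 or 0xA1 <= b <= 0xFE for b in raw)


def same_in_both(ch: str) -> bool:
    """True if Python's cp949 and euc_kr codecs encode ch identically."""
    return ch.encode("cp949") == ch.encode("euc_kr")


# U+D55C is one of the 2350 Hangul syllables in KS X 1001, so both
# codecs agree on it; U+B620 is a syllable that CP949 adds.
common = "\ud55c"
extension = "\ub620"

print(same_in_both(common))
print(fits_euc_kr_byte_range(extension.encode("cp949")))
```

For the extension syllable, the CP949 bytes leave the EUC-KR range, so
software that assumes pure EUC-KR cannot treat CP949 input as the same
encoding.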

I realize that no matter what you choose, you'll get controversy.  But
I don't understand why one would want to deprecate the ASCII variants
of the EUC encodings.  If anything, they should be preferred to the
Roman variants, since they are somewhat more likely to match what
users expect in practice.

