This is the mail archive of the cygwin-developers mailing list for the Cygwin project.
More about charsets
- From: Corinna Vinschen <corinna-cygwin at cygwin dot com>
- To: cygwin-developers at cygwin dot com
- Date: Sat, 27 Mar 2010 15:54:26 +0100
- Subject: More about charsets
- Reply-to: cygwin-developers at cygwin dot com
Hi guys,
While looking into the GB18030 issue once again, I found that we may
still have two holes which might be important to fill.
- GB2312 aka EUC-CN
We already support GBK, codepage 936. GB2312/EUC-CN is a subset
of GBK, and apparently GBK-encoded text is often still labeled as
GB2312. See the discussion here:
http://www.mail-archive.com/unicode@unicode.org/msg03516.html
So the question is, should we just allow GB2312 and EUC-CN as
codeset names, but use the GBK conversion functions for them?
Otherwise, there's also a codepage 51936, which is called EUC-CN
in the list at
http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
I haven't tested it, but it appears to be the real GB2312. I don't
know if it really makes sense to make the distinction, though.
- EUC-TW
There's a codepage 51950 which appears to be something like EUC-TW.
I just found this, though:
http://code.google.com/p/mintty/source/detail?r=738
Andy, is that a general rule? Or did you test on XP and the codepage
was just not installed, by any chance?
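
To make the first option concrete, here is a minimal sketch of accepting
GB2312 and EUC-CN as codeset names while reusing the GBK conversion
(codepage 936). The table, the struct and the function name
codeset_to_codepage are hypothetical and only illustrate the idea; this
is not the actual Cygwin code. EUC-TW is left out on purpose, since
whether codepage 51950 is usable for it is still an open question.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical alias table for illustration only. */
struct codeset_alias
{
  const char *name;        /* codeset name as given in the locale */
  unsigned int codepage;   /* Windows codepage used for conversion */
};

static const struct codeset_alias aliases[] =
{
  { "GBK",    936 },
  { "GB2312", 936 },   /* option 1: treat as GBK, its superset */
  { "EUC-CN", 936 },   /* option 1: same conversion as GB2312 */
  /* Option 2 would map GB2312/EUC-CN to 51936 instead (untested). */
};

/* Return the codepage for a codeset name, or 0 if the name is unknown.
   Names are compared case-insensitively. */
static unsigned int
codeset_to_codepage (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
    if (strcasecmp (name, aliases[i].name) == 0)
      return aliases[i].codepage;
  return 0;
}
```

With such a table, switching between the two options is a matter of
changing the codepage number in two entries, so the decision does not
have to be final.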
We certainly have other holes as well, but for OS usage I don't see
any other codeset which would be that important.
Anything I'm missing?
Corinna
--
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Project Co-Leader          cygwin AT cygwin DOT com
Red Hat