This is the mail archive of the
cygwin-developers
mailing list for the Cygwin project.
Re: More about charsets
- From: Andy Koppe <andy dot koppe at gmail dot com>
- To: cygwin-developers at cygwin dot com
- Date: Sat, 27 Mar 2010 16:11:11 +0000
- Subject: Re: More about charsets
- References: <20100327145426.GB15896@calimero.vinschen.de>
Corinna Vinschen:
> while looking into the GB18030 issue once again, I found that we still
> may have two holes which might be important to support.
>
> - GB2312 aka EUC-CN
>
>  We already support GBK, codepage 936.  GB2312/EUC-CN is a subset
>  of GBK and apparently GBK is often used while still labeled as
>  GB2312.  See the discussion here:
>  http://www.mail-archive.com/unicode@unicode.org/msg03516.html
>
>  So the question is, should we just allow GB2312 and EUC-CN as
>  codeset names, but use the GBK conversion functions for them?
Might as well. As you saw, mintty already does that. Thomas Wolff's
mined goes even further and handles both GB2312 and GBK with its
GB18030 codec, because GBK is a subset of GB18030.
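To illustrate the aliasing idea, here is a minimal Python sketch. It is not Cygwin's or mintty's actual code: the CODESET_ALIASES table and the decode_with_alias helper are hypothetical names made up for this example, and Python's built-in "gbk" codec stands in for the GBK conversion functions. It relies on GB2312 being a subset of GBK, so data labeled GB2312 or EUC-CN decodes correctly with the GBK converter.

```python
# Hypothetical alias table: every name on the left is served by the
# same GBK converter (Python's built-in "gbk" codec stands in for it).
CODESET_ALIASES = {
    "GBK": "gbk",
    "GB2312": "gbk",
    "EUC-CN": "gbk",
}


def decode_with_alias(data: bytes, codeset: str) -> str:
    """Decode data labeled with any aliased codeset name via GBK."""
    codec = CODESET_ALIASES[codeset.upper()]
    return data.decode(codec)


# Characters that exist in GB2312 get the same bytes in GBK and GB18030,
# so real GB2312 data is unaffected by the aliasing.
simplified = "中文"
raw = simplified.encode("gb2312")
assert raw == simplified.encode("gbk") == simplified.encode("gb18030")
assert decode_with_alias(raw, "EUC-CN") == simplified

# A GBK-only character (traditional form, absent from GB2312) that was
# mislabeled as GB2312 still decodes, which is the practical benefit.
mislabeled = "國".encode("gbk")
assert decode_with_alias(mislabeled, "GB2312") == "國"
```

The trade-off is that input outside real GB2312 is accepted rather than rejected, which matches how such data is commonly labeled in practice.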
>  Otherwise, there's also a codepage 51936, which is called EUC-CN
>  in the list at
>  http://msdn.microsoft.com/en-us/library/dd317756%28VS.85%29.aspx
>  I didn't test it, but it appears to be the real GB2312.  I don't
>  know if it really makes sense to make the difference, though.
Also, it isn't available on any Windows I've tried.
> - EUC-TW
>
>  There's a codepage 51950 which appears to be something like EUC-TW.
>  I just found this, though:
>  http://code.google.com/p/mintty/source/detail?r=738
>
>  Andy, is that a general rule?  Or did you test on XP and the codepage
>  was just not installed, by any chance?
It doesn't show up as an option on XP, and I've just tried it again on
Windows 7, where codepages are no longer optional. Doesn't work. I
think I'd read somewhere that 51950 is only available for .NET
programs, but unfortunately I can't find that again. I guess it's
possible that Chinese Windows versions do support it anyway, although
Wikipedia describes EUC-TW as "rarely used".
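One way to probe whether a given Windows installation accepts a codepage is the Win32 IsValidCodePage function. The sketch below calls it through Python's ctypes; it is only an illustration, not necessarily how the tests described above were done. The codepage_available helper is a hypothetical name, and it returns None on anything other than native Windows Python, where the API cannot be reached this way.

```python
import sys


def codepage_available(codepage: int):
    """Return True/False from IsValidCodePage, or None off Windows."""
    if sys.platform != "win32":
        # ctypes.windll only exists in native Windows Python.
        return None
    import ctypes
    # IsValidCodePage returns nonzero if the codepage is valid on this
    # system, and zero otherwise.
    return bool(ctypes.windll.kernel32.IsValidCodePage(codepage))


if __name__ == "__main__":
    # 936 = GBK, 51936 = EUC-CN, 51950 = the EUC-TW candidate
    for cp in (936, 51936, 51950):
        print(cp, codepage_available(cp))
```

A positive answer from this check does not by itself guarantee that the conversion functions handle the codepage, so it is best treated as a first filter only.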
> We certainly have other holes as well, but for OS usage I don't see
> any other codeset which would be that important.
I agree.
Andy