This is the mail archive of the cygwin-developers mailing list for the Cygwin project.



Re: charset changes


Corinna Vinschen wrote:
...
I just read the GB18030 entry in the German Wikipedia again and, boy,
I dislike that codeset immediately every time. 2-byte sequences have
a trailing byte in the range 0x40-0xfe, 3-byte sequences don't exist,
4-byte sequences have a second and fourth byte in the range 0x30-0x39.
Why, oh why, do codeset implementors have to overload the ASCII range
without need.
While unwieldy to handle, this has a historical explanation. It follows directly from the design requirement that GB18030 be upward compatible with GBK, whose 2-byte sequences already used trailing bytes from 0x40 upward. To distinguish the new, longer sequences from the GBK 2-byte sequences, there was no choice but to use an even lower range (0x30-0x39) for their second and fourth bytes.
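
To illustrate how the second byte alone separates the GBK-style 2-byte sequences from the 4-byte ones, here is a rough sketch in C. The function name gb18030_seq_len is made up for this example and is not taken from Cygwin or newlib. It only classifies sequences by the byte ranges mentioned above and does not check whether a sequence maps to an assigned character.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
   Returns the length in bytes (1, 2 or 4) of the GB18030 sequence that
   starts at s, or 0 if the bytes do not fit the GB18030 byte ranges or
   the buffer (n bytes) is too short. */
size_t
gb18030_seq_len (const unsigned char *s, size_t n)
{
  if (n == 0)
    return 0;
  if (s[0] <= 0x7f)
    return 1;                   /* plain ASCII */
  if (s[0] < 0x81 || s[0] > 0xfe)
    return 0;                   /* 0x80 and 0xff are not lead bytes */
  if (n < 2)
    return 0;
  if (s[1] >= 0x30 && s[1] <= 0x39)
    {
      /* An ASCII digit as second byte can only start a 4-byte sequence,
         because GBK trailing bytes begin at 0x40. */
      if (n < 4)
        return 0;
      if (s[2] < 0x81 || s[2] > 0xfe || s[3] < 0x30 || s[3] > 0x39)
        return 0;
      return 4;
    }
  if (s[1] >= 0x40 && s[1] <= 0xfe && s[1] != 0x7f)
    return 2;                   /* GBK-compatible 2-byte sequence */
  return 0;
}
```

Note that the 2-byte trailing range includes 0x5c (the backslash), which is why code that scans such strings byte by byte for ASCII characters has to track sequence boundaries first.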

