This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.



Re: BIG5 charmap problems


Dear Bruno Haible:

I have just provided the new BIG5 charmap for glibc-2.2. Here are my
personal opinions on your questions :-)

:   - CP950
:      an extension of Big5, adds F9D6..F9FE
:
:   - Big5 ETen
:      an extension of CP950, adds C6A1..C7FE
:
:   - Big5 HKSCS
:      an extension of CP950, adds 8840..A0FE, C6A1..C8FE, FA40..FEFE
:
:   - Big5+
:      an extension of CP950, adds XX{80..A0}, 8140..A0FE, A3E1..A3FE,
:      C680..C8FE, FA40..FEA0,
:      not supported by any OS
:
:   - the Arphic font repertoire
:      according to the comment in localedata/charmaps/BIG5,
:      an extension of CP950, adds C6A1..C7FE
:
: Problem 1: Big5 does not contain F9D6..F9FE but all other variants are 
: extensions of CP950.
: 
: What can we do?
:   (A) Implement BIG5 strictly as in the 1984 standard, and CP950 as a
:       different encoding. Provide an additional locale zh_TW.CP950.  
:   (B) Implement CP950 under the name "BIG5".
:   (C) Ask Simon Lin to reissue a new version of BIG5, including F9D6..F9FE.
: 
: I'm in favour of (A)+(C). What do you think?

My main concern is to provide the most complete Big5 character set
possible, one that meets the needs of people in Taiwan. From this point
of view, the original BIG5 (1984) is definitely not enough, because some
of the characters in the CP950 extension are used very frequently.
Therefore, CP950 is really needed in Taiwan.

But what about the Big5 ETen extension? Well, this extension contains
many "special" symbols which are in fact rarely used. However, some
traditional Big5 software might still use them, so I included them as well.
This is the reason for the comment you read in localedata/charmaps/BIG5.
With them, the charmap should contain all the characters people will use in
Taiwan. Therefore, I personally suggest that the term "Big5" should mean
the Big5 ETen extension. We really don't need other incomplete variants
like Big5 (1984).

As for Big5 HKSCS, I think only people in Hong Kong will use it, so it
should be separated from Big5. In fact, programmers in Hong Kong have
already provided their Big5HKSCS charmap for glibc-2.2.

As for Big5+, as you said, it is not supported by any OS, and furthermore,
as far as I know, nobody uses it. So we don't have to consider it.
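
To make the ranges quoted above concrete, here is a minimal sketch that
reports which variants add a given two-byte code. It uses only the ranges
listed in the quoted message. The names ADDED_RANGES and variants_adding
are hypothetical, Big5+ is left out because its XX{80..A0} trail-byte range
does not fit a simple numeric interval, and the check compares the 16-bit
value only, without validating the trail byte.

```python
# Extension areas as listed in the quoted message (inclusive ranges).
# Big5 ETen and Big5 HKSCS are described there as extensions of CP950,
# so they inherit the CP950 area F9D6..F9FE.
ADDED_RANGES = {
    "CP950": [(0xF9D6, 0xF9FE)],
    "Big5 ETen": [(0xF9D6, 0xF9FE), (0xC6A1, 0xC7FE)],
    "Big5 HKSCS": [
        (0xF9D6, 0xF9FE),
        (0x8840, 0xA0FE),
        (0xC6A1, 0xC8FE),
        (0xFA40, 0xFEFE),
    ],
}


def variants_adding(code):
    """Return the variants whose extension areas contain the 16-bit code."""
    return [
        name
        for name, ranges in ADDED_RANGES.items()
        if any(low <= code <= high for low, high in ranges)
    ]


if __name__ == "__main__":
    for code in (0xF9D6, 0xC6A1, 0xC8FE, 0xA440):
        print("%04X" % code, variants_adding(code))
```

A code such as A440, which lies in the original 1984 area, is reported as
belonging to none of the extension areas.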


: Mapping them like unicode.org's BIG5.TXT table seems wrong to me, because
: the way they map it is different from what all other tables do, and they
: say "there is some uncertainty about the mappings in the range C6A1 - C8FE
: ... The correct mappings these ranges need to be determined."
:       
: Mapping them like Arphic's font displays it seems wrong to me, because
: other people use different fonts, and therefore using their mapping (moreover
: to private area codes!) will cause interoperability problems. The right
: place for a font specific mapping table is XFree86's xc/extras/X-TrueType
: directory, not the glibc charmap.
: 
: Similarly for Big5 ETen because this is not an RFC backed exchange format.
: 
: Therefore I propose to remove the C6A1..C7FE area from the charmap and
: converter.

First, I think BIG5.TXT should be considered obsolete, so I don't want to
take it as a reference in this discussion.

The C6A1..C7FE characters are provided here only for backward
compatibility. In fact they are very rarely used. According to the
explanation from Arphic Inc., they have mapped these characters to the
"user private area" of Unicode, where the presentation might differ
between countries. But I am not familiar with Unicode, so I don't know
whether this is suitable or not. If this mapping really causes problems,
I agree that we might just remove them from the charmap and converter.
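
One way to see how many entries are affected would be to flag every
mapping whose Unicode side falls in a private use area. The following is
only a sketch: the function names are hypothetical, and the sample values
are made up for illustration and are not Arphic's real mapping. It relies
on the Unicode general category "Co", which covers U+E000..U+F8FF as well
as the private use planes 15 and 16.

```python
import unicodedata


def is_private_use(code_point):
    """True if the code point has Unicode general category Co (private use)."""
    return unicodedata.category(chr(code_point)) == "Co"


def private_use_entries(mapping):
    """Return the Big5 -> Unicode entries that land in a private use area."""
    return {big5: ucs for big5, ucs in mapping.items() if is_private_use(ucs)}


if __name__ == "__main__":
    # Made-up sample values for illustration only, NOT a real mapping table.
    sample = {0xC6A1: 0xE000, 0xA440: 0x4E00}
    for big5, ucs in private_use_entries(sample).items():
        print("%04X -> U+%04X is in a private use area" % (big5, ucs))
```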


: Problem 3: The BIG5.TXT and CP950.TXT conversion tables don't cover the
: range A3C0..A3E0.
: 
: I propose to map them to U+2400..U+241F,U+2421. This is also what Big5+ does.

Since no one in Taiwan uses Big5+, if you want to add this, I propose
adding a new "BIG5+" charmap and converter, to distinguish it from
"Big5".
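
For reference, the mapping proposed in the quoted text (A3C0..A3E0 to
U+2400..U+241F, U+2421) can be written out as a small table. This is only
a sketch of that proposal, not the content of any existing charmap, and
the function name is hypothetical.

```python
def control_picture_table():
    """Build the proposed A3C0..A3E0 -> U+2400..U+241F, U+2421 table."""
    table = {}
    # A3C0..A3DF (32 codes) -> U+2400..U+241F
    for offset in range(0x20):
        table[0xA3C0 + offset] = 0x2400 + offset
    # A3E0 -> U+2421 (U+2420 is skipped in the proposal)
    table[0xA3E0] = 0x2421
    return table


if __name__ == "__main__":
    for big5, ucs in sorted(control_picture_table().items()):
        print("%04X -> U+%04X" % (big5, ucs))
```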


: Problem 4: Mapping of A3E1 to the Euro sign.
: 
: Which standard has this? Which font has this? What do other BIG5
: implementations do with the Euro sign U+20AC?
: 
: I propose to remove it.

This is how CP950 maps it; I just adopted it. I have no further opinion
on this, so you can decide whether to remove it or not.


: Problem 5: Why are F9E9..F9EB, F9F9..F9FD commented out from the charmap?
: 
: I propose to add them back (to the CP950 charmap). If they don't convert
: unambiguously, use the %IRREVERSIBLE% notation as in the EUC-JP charmap.
: If they don't display well in Arphic, then please fix the Arphic fonts.
:
: Problem 6: Mapping of A2CC, A2CE etc.
: 
: I propose to add them back, using the %IRREVERSIBLE% notation.

That is because I assumed that in a charmap, the mapping between Big5 and
Unicode should be strictly one-to-one. For the characters you listed
above, two Big5 codes map to one Unicode character, which might not be
allowed. So I commented them out to be safe. If you think adding the
%IRREVERSIBLE% notation is acceptable, I would be glad to put them back.
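
As a rough sketch of what putting them back could look like, the snippet
below emits charmap-style lines and prefixes every second or later Big5
code for the same Unicode character with %IRREVERSIBLE%. The function name
and the "first listed code wins" policy are assumptions made for this
example, and the sample pairs are illustrative and should be checked
against the real tables.

```python
def charmap_lines(pairs):
    """Format (big5_code, ucs) pairs as charmap-style lines.

    The first Big5 code listed for a Unicode character stays reversible;
    any later code for the same character gets the %IRREVERSIBLE% prefix.
    """
    seen = set()
    lines = []
    for big5, ucs in pairs:
        lead, trail = big5 >> 8, big5 & 0xFF
        entry = "<U%04X>     /x%02x/x%02x" % (ucs, lead, trail)
        if ucs in seen:
            entry = "%IRREVERSIBLE%" + entry
        seen.add(ucs)
        lines.append(entry)
    return lines


if __name__ == "__main__":
    # Illustrative sample: two pairs of Big5 codes sharing one character.
    sample = [
        (0xA451, 0x5341),
        (0xA2CC, 0x5341),
        (0xA4CA, 0x5345),
        (0xA2CE, 0x5345),
    ]
    print("\n".join(charmap_lines(sample)))
```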


Thank you very much for your opinions :-))


T.H.Hsieh
