This is the mail archive of the guile@cygnus.com mailing list for the guile project.
>The answer can only be UCS4. It's no surprise that all reasonable
>i18n developers (this excludes those at IBM) use a 32bit type for
>wchar_t.

*ugh*

Can you be more specific about why 32 bits are needed? Which character sets does Unicode not accommodate? Or is that the wrong question for me to ask?

>This may sound like a big waste of space but if used correctly it
>isn't. Normally string are not meant to contain whole text books but
>instead are rather short. This means there is not that much
>redundancy. If you need to store large texts you can still fall back
>on a multibyte encoding, perhaps offer several of them so that the
>most effective can be chosen.

This argument is not entirely reassuring to me. If one thinks mostly about processing text streams, sure, this is fine. However, I am more interested in interactive applications like Emacs, and related things with wider audiences. In such applications there are no clear boundaries at which it is convenient to convert between a dense form, like UTF-8, and a sparse but consistent form, like UCS2.

An Emacs buffer must hold large amounts of text, and must also serve as the operand to editing and searching commands. It is terribly clumsy to use a variable-length encoding in buffers.

Since the buffer representation must be the foundation of all other i18n support, it's important to get it right. Doubling the text storage required isn't so unreasonable; quadrupling it is.
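To put rough numbers on the storage argument, here is a minimal sketch in Python. The function names `storage_cost` and `utf8_char_offset` are made up for illustration, and Python's `utf-16-le` and `utf-32-le` codecs stand in for UCS2 and UCS4 (the UTF-16 codec matches UCS2 only for text within the Basic Multilingual Plane). It is not meant to describe how Emacs or Guile actually store text.

```python
def storage_cost(text):
    """Return the number of bytes `text` occupies under three encodings."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # Assumption: utf-16-le stands in for UCS2 (identical for BMP text).
        "ucs2": len(text.encode("utf-16-le")),
        # Assumption: utf-32-le stands in for UCS4.
        "ucs4": len(text.encode("utf-32-le")),
    }


def utf8_char_offset(data, index):
    """Find the byte offset of character number `index` in UTF-8 bytes.

    A variable-length encoding has no fixed character width, so this
    needs a linear scan; with UCS2 or UCS4 the offset would simply be
    index * 2 or index * 4.
    """
    count = 0
    for offset, byte in enumerate(data):
        # Bytes of the form 10xxxxxx continue a character; any other
        # byte starts a new one.
        if byte & 0xC0 != 0x80:
            if count == index:
                return offset
            count += 1
    raise IndexError(index)


if __name__ == "__main__":
    print(storage_cost("hello"))
    print(utf8_char_offset("héllo".encode("utf-8"), 2))
```

For plain ASCII text the three sizes come out in the ratio 1:2:4, which is the doubling versus quadrupling in question. The second function shows the kind of scan that makes a variable-length encoding clumsy as a buffer representation.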