This is the mail archive of the guile@cygnus.com mailing list for the guile project.
> From: Jim Blandy <jimb@red-bean.com>
> Date: Tue, 14 Oct 1997 16:37:18 -0400

> MULE, if I recall correctly,
> allows you to switch between several 16-bit encodings.

Mule uses a variable-length encoding internally, and can convert it
to and from various encodings when doing I/O. Most character sets are
distinguished by a special byte called the 'leading character'.
Here's an excerpt from the Mule info documentation (the original is
in Japanese):

  Type 1-1: ASCII character set
      Character codes from 0x00 to 0x7f. Saved literally.
  Type 1-2: One-byte character sets other than ASCII
      Saved with leading character 'LC1'.
      (i.e. it takes 2 bytes per character)
  Type 1-3: Private one-byte character set
      Saved with two leading characters, 'LCPRV1' 'LC12'.
  Type 2-3: Two-byte character set
      Saved with leading character 'LC2'.
      (i.e. it takes 3 bytes per character)
  Type 2-4: Private two-byte character set
      Saved with two leading characters, 'LCPRV2' 'LC22'.
  Type 3-4: Three-byte character set
      Saved with leading character 'LC3'.
      (i.e. it takes 4 bytes per character)
  Type N: Arbitrary-length character set (composite character set)
      Starts with 'LCCMP', and each byte is saved with leading
      character 'LCNn'.

I've heard that one drawback of Unicode is that it isn't organized
well for converting to and from existing character sets. (Unicode
depends on character shape, but the existing Chinese and Japanese
character encodings are completely different even though they share
many characters of the same shape, which means you need a big lookup
table for conversion.) I'm not an expert in this field, though.
Please correct me if this is wrong.

> I need to consult one more guru, but I'm leaning towards using 16-bit
> characters everywhere internally, and providing convenient conversion
> functions.

Most applications use only one encoding of their own, so a 16-bit
representation is enough.
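As a rough illustration of the leading-character scheme in the excerpt above, here is a minimal Python sketch that splits a byte string into characters by looking at each leading byte. The numeric values given to LC1, LCPRV1, LC2, LCPRV2 and LC3 are hypothetical placeholders (the excerpt names the markers but not their values), split_chars is an invented helper rather than anything in Mule, and the composite Type N is left out.

```python
# Hypothetical leading-byte values, chosen only for illustration.
# The excerpt names these markers but does not give their values.
LC1 = 0xE1     # Type 1-2: one-byte set other than ASCII
LCPRV1 = 0xE2  # Type 1-3: private one-byte set (followed by LC12)
LC2 = 0xE3     # Type 2-3: two-byte set
LCPRV2 = 0xE4  # Type 2-4: private two-byte set (followed by LC22)
LC3 = 0xE5     # Type 3-4: three-byte set

# Total stored bytes per character, leading characters included.
TOTAL_LENGTH = {
    LC1: 2,
    LCPRV1: 3,
    LC2: 3,
    LCPRV2: 4,
    LC3: 4,
}


def split_chars(data):
    """Split a byte string into per-character byte strings."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead <= 0x7F:
            length = 1  # Type 1-1: ASCII, saved literally
        elif lead in TOTAL_LENGTH:
            length = TOTAL_LENGTH[lead]
        else:
            raise ValueError("unknown leading byte 0x%02x" % lead)
        if i + length > len(data):
            raise ValueError("truncated character at offset %d" % i)
        chars.append(data[i:i + length])
        i += length
    return chars


if __name__ == "__main__":
    # 'A', one two-byte-set character, then 'B'.
    sample = b"A" + bytes([LC2, 0xA4, 0xA2]) + b"B"
    print(len(sample), "bytes,", len(split_chars(sample)), "characters")
```

Note that finding the n-th character means scanning from the start of the string, since the stored width varies from one character to the next. That is the cost a fixed-width representation avoids.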
If you want to deal with multiple languages the way Mule does,
however, and want fixed-length characters, you may need either a
32-bit representation or Unicode with big conversion tables.

--
Shiro KAWAI
Square USA Inc. Honolulu Studio, R&D division
#"The most important things are the hardest things to say" --- Stephen King