This is the mail archive of the guile@cygnus.com mailing list for the guile project.



Re: mbstrings



Background for latecomers:

   Guile would like to support character sets containing more than 256
   characters, like Unicode, or the JIS sets.

   To this end, Guile has "multi-byte" string and symbol types, which are
   just like their ordinary counterparts, but use a special encoding for
   representing wide characters.  In this encoding, a single character
   may be represented by more than one byte in the string.  By default,
   Guile multi-byte strings use an encoding found nowhere else ---
   neither Unicode UTF-8 nor MULE.  Yay.

   I/O ports can be either "regular", meaning they operate on byte
   streams, "multi-byte", meaning they use the encoding mentioned above,
   or "wide char", meaning they're actually streams of 16-bit characters.
   Guile does all nine possible conversions (three kinds of ports
   vs. three kinds of strings) appropriately on output, and creates the
   three kinds of strings on input.  Or claims to, anyway.
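
To make the three representations concrete, here is a rough sketch in Python. It is not Guile code, and the encodings are stand-ins I picked for illustration: Latin-1 for the "regular" byte stream, UTF-8 for a variable-width multi-byte encoding (Guile's own encoding is neither UTF-8 nor MULE, as noted above), and UTF-16 for the 16-bit "wide char" form.

```python
# Stand-in encodings only; Guile's actual multi-byte encoding differs.
text = "na\u00efve"  # five characters, one of them outside ASCII

regular = text.encode("latin-1")     # "regular": one byte per character
multi_byte = text.encode("utf-8")    # "multi-byte": U+00EF takes two bytes
wide = text.encode("utf-16-le")      # "wide char": one 16-bit unit per character here

# Same five characters, three different byte lengths.
print(len(text), len(regular), len(multi_byte), len(wide))  # 5 5 6 10
```

The point is only that character count and byte count agree in the first representation and disagree in the other two, which is why every string/port pairing needs its own conversion.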


As far as I'm concerned, all the multi-representation stuff can go.

The multi-rep stuff is difficult to use.  Programmers tend not to put
much effort into cases that aren't useful to them, so I think the
majority of contributed Guile code will not actually handle anything
but ordinary strings --- after all, Latin-1 is enough for a lot of
people.  So over time, the body of available Guile code will come to
contain a lot of stuff that isn't multi-rep friendly.  Thus, when some
poor end user wants to actually use her native character set, she'll
find it doesn't work.  Even if the Guile core is pure and clean,
she'll probably be relying on some module or other, maybe without even
knowing it.  And that module might not be multi-rep clean.  So the
overall effect is that alternative reps won't work for her.
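
A hypothetical example of the kind of module I mean, again sketched in Python rather than Guile, with UTF-8 standing in for the multi-byte encoding. The `truncate` helper is made up for illustration; its author only ever tested it on one-byte-per-character strings.

```python
def truncate(data: bytes, limit: int) -> bytes:
    """Hypothetical helper: cut a string to `limit` bytes, assuming 1 byte == 1 char."""
    return data[:limit]

word = "caf\u00e9"  # four characters, the last one outside ASCII

# Works for the author, who only uses Latin-1.
latin1_result = truncate(word.encode("latin-1"), 4).decode("latin-1")

# Breaks for the user with a multi-byte string: the cut lands in the
# middle of the two-byte sequence for U+00E9.
try:
    truncate(word.encode("utf-8"), 4).decode("utf-8")
    multi_byte_ok = True
except UnicodeDecodeError:
    multi_byte_ok = False

print(latin1_result == word, multi_byte_ok)  # True False
```

Nothing in the helper looks wrong to someone who never leaves Latin-1, which is exactly why this sort of breakage goes unnoticed.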

Said another way, the multi-rep code is unusually susceptible to
bit-rot.

Corollary: Guile's multi-byte/multi-representation stuff is
hard-to-use internationalization support, so it effectively
doesn't exist.  We might as well remove it.


I would very much like Guile to support a wide character set.  But I
think the multi-rep stuff should go.