This is the mail archive of the guile@cygnus.com mailing list for the guile project.


Re: mbstrings


Jim> Stallman wants Guile to use MULE's encoding, which I think is
Jim> horrible.

Gross.

Does he have some real objection to Unicode?  Or is this just an
arbitrary and quixotic decision?

Jim> There's no nice way to deal with variable-length characters.
Jim> string-length and string-ref are just hopeless, if you want to
Jim> preserve the properties promised by R4RS.

I only have R3.99RS here, so I don't know what properties are
guaranteed.  But many string functions are losers with a multibyte
encoding.  For instance, string-set! may have to resize the string's
buffer when the new character's encoding is longer than the old one's.
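
To make the resizing problem concrete, here is a minimal sketch.  It
assumes a UTF-8-style multibyte encoding and well-formed input; MULE's
encoding differs in detail but has the same property.  The names
mb_string_set and utf8_seq_len are hypothetical, not Guile functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Byte length of the UTF-8 sequence whose lead byte is b.
   Assumes b really is a lead byte of well-formed UTF-8. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Hypothetical string-set! on a multibyte buffer: replace character
   number idx in s with the single encoded character repl.
   Returns a newly allocated string, or NULL if idx is out of range
   or allocation fails.  The caller frees the result. */
static char *mb_string_set(const char *s, size_t idx, const char *repl)
{
    assert(s != NULL && repl != NULL);

    size_t len = strlen(s);
    size_t off = 0;

    /* Locating the character is O(n): every earlier character
       has to be skipped one at a time. */
    while (idx > 0 && off < len) {
        off += utf8_seq_len((unsigned char)s[off]);
        idx--;
    }
    if (off >= len) return NULL;

    size_t old_w = utf8_seq_len((unsigned char)s[off]);
    size_t new_w = strlen(repl);

    /* The result can be longer or shorter than the input, so a
       fixed-size buffer cannot be updated in place in general. */
    char *out = malloc(len - old_w + new_w + 1);
    if (out == NULL) return NULL;

    memcpy(out, s, off);
    memcpy(out + off, repl, new_w);
    strcpy(out + off + new_w, s + off + old_w);
    return out;
}
```

Replacing the `b' in "abc" with a two-byte character turns a 3-byte
string into a 4-byte one, which is why an in-place string-set! is not
possible in general with such an encoding.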

Jim> Wide characters are better, but Stallman is opposed to them.  He
Jim> feels they will waste memory (not a good argument --- sometimes
Jim> memory is worth wasting) and will make interoperation with
Jim> ordinary C code harder (this is a good point).

C interoperability is indeed a problem.  For instance many functions
in posix.c would need to do a wide->multibyte conversion on their
arguments.  There are lots of examples of this, even mildly surprising
ones.  For instance the gettext library currently assumes C strings;
either a conversion would be necessary here (doesn't that seem
losing?), or gettext would need some hacking.  One could safely say
that this will be the case with basically every library you'd want to
interface with Guile.  So the choice of wide-vs-multibyte would depend
in part on how often you think string operations happen in comparison
to interactions with other libraries.
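
As a rough illustration of the conversion cost, here is a sketch of
what a wrapper around a char*-based C function might look like if
Guile stored strings as wchar_t arrays.  It uses the standard
wcstombs and converts to the current locale's encoding; wide_to_mb
and wide_getenv are hypothetical names, and this is not how posix.c
is actually written.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide string to a newly allocated multibyte string in the
   current locale's encoding.  Returns NULL if a character cannot be
   converted or allocation fails.  The caller frees the result. */
static char *wide_to_mb(const wchar_t *ws)
{
    assert(ws != NULL);

    /* Each wide character needs at most MB_CUR_MAX bytes
       (assuming no extra shift sequences). */
    size_t cap = wcslen(ws) * MB_CUR_MAX + 1;
    char *buf = malloc(cap);
    if (buf == NULL) return NULL;

    size_t n = wcstombs(buf, ws, cap);
    if (n == (size_t)-1 || n >= cap) {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    return buf;
}

/* Hypothetical wrapper: every call pays for an allocation and a
   full copy of the argument before the real function runs. */
static const char *wide_getenv(const wchar_t *name)
{
    char *mb = wide_to_mb(name);
    if (mb == NULL) return NULL;
    const char *val = getenv(mb);
    free(mb);
    return val;
}
```

The return value here is still a multibyte string, so a real wrapper
would also need the reverse conversion (mbstowcs) on the way back.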

I get the feeling that encoding choice has many dimensions of
complexity.  I also get the feeling that I don't know what all of them
are.


What's the word on excising mbstrings.[ch]?

Tom
-- 
tromey@cygnus.com                 Member, League for Programming Freedom