This is the mail archive of the guile@cygnus.com mailing list for the guile project.
Jim> Stallman wants Guile to use MULE's encoding, which I think is
Jim> horrible.

Gross. Does he have some real objection to Unicode? Or is this just
an arbitrary and quixotic decision?

Jim> There's no nice way to deal with variable-length characters.
Jim> string-length and string-ref are just hopeless, if you want to
Jim> preserve the properties promised by R4RS.

I only have R3.99RS here, so I don't know what properties are
guaranteed. But many string functions are losers with a multibyte
encoding. For instance string-set! can resize the string's buffer,
since the new character may not occupy the same number of bytes as
the one it replaces.

Jim> Wide characters are better, but Stallman is opposed to them. He
Jim> feels they will waste memory (not a good argument --- sometimes
Jim> memory is worth wasting) and will make interoperation with
Jim> ordinary C code harder (this is a good point).

C interoperability is indeed a problem. For instance many functions
in posix.c would need to do a wide->multibyte conversion on their
arguments.

There are lots of examples of this, even mildly surprising ones. For
instance the gettext library currently assumes C strings; either a
conversion would be necessary here (doesn't that seem losing?), or
gettext would need some hacking.

One could safely say that this will be the case with basically every
library you'd want to interface with Guile. So the choice of
wide-vs-multibyte would depend in part on how often you think string
operations happen in comparison to interactions with other libraries.

I get the feeling that encoding choice has many dimensions of
complexity. I also get the feeling that I don't know what all of them
are.

What's the word on excising mbstrings.[ch]?

Tom
-- 
tromey@cygnus.com
Member, League for Programming Freedom
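
To make the string-set! point above concrete, here is a minimal C sketch of what such an operation might look like on a multibyte buffer. It is not Guile code and does not use MULE's encoding: it assumes UTF-8 purely for illustration, and the names utf8_seq_len and mb_string_set are hypothetical. It shows the two costs mentioned in the message: locating the character is a linear scan, and the buffer may have to grow.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Byte length of the UTF-8 sequence that starts with lead byte c.
   Assumes well-formed input; falls back to 1 for anything unexpected. */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Hypothetical analogue of string-set! on a multibyte buffer.
   Replaces character number `index` (0-based) of the malloc'd,
   NUL-terminated string *buf with the single encoded character `ch`.
   Returns 0 on success, -1 if index is out of range or realloc fails. */
static int mb_string_set(char **buf, size_t index, const char *ch)
{
    assert(buf != NULL && *buf != NULL && ch != NULL);

    char *s = *buf;
    size_t total = strlen(s);
    size_t off = 0;

    /* Finding character `index` means walking the bytes: O(n), not O(1). */
    while (index > 0 && off < total) {
        off += utf8_seq_len((unsigned char)s[off]);
        index--;
    }
    if (off >= total)
        return -1;

    size_t old_len = utf8_seq_len((unsigned char)s[off]);
    if (old_len > total - off)          /* truncated sequence at the end */
        old_len = total - off;
    size_t new_len = strlen(ch);
    size_t tail = total - off - old_len; /* bytes after the old character */

    if (new_len > old_len) {
        /* The replacement is wider, so the buffer itself must grow. */
        char *grown = realloc(s, total - old_len + new_len + 1);
        if (grown == NULL)
            return -1;
        s = grown;
        *buf = s;
    }

    /* Shift the tail (including the terminating NUL), then drop in ch. */
    memmove(s + off + new_len, s + off + old_len, tail + 1);
    memcpy(s + off, ch, new_len);
    return 0;
}
```

Replacing the "a" in "cat" with a two-byte character through this function turns a three-byte string into a four-byte one, which is why the caller has to pass the buffer by address. With fixed-width wide characters the same operation would be a single array store, at the cost of the conversions at the C library boundary discussed above.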