Two comments:
There's a lot of passing integers around to refer to a character.
That doesn't make a lot of sense to me; we should either be passing
char *, so that we can decode multibyte sequences, or using wchar_t
explicitly and autoconfing for it.
I see hardcoded support for a couple of simplistic charsets; would it
be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
available? Gcj is natively UTF-8, and I have some open Debian bug
reports about this.
Absolutely --- as I say in the comments to charset.c:
At the moment, GDB only supports single-byte, stateless character
sets. This includes the ISO-8859 family (ASCII extended with
accented characters, and (I think) Cyrillic, for European
languages), and the EBCDIC family (used on IBM's mainframes).
Unfortunately, it excludes many Asian scripts, the fixed- and
variable-width Unicode encodings, and other desirable things.
Patches are welcome! (For example, it would be nice if the Java
string support could simply get absorbed into some more general
multi-byte encoding support.)
But it seemed to me that supporting stateless variable-width encodings
was going to be a *lot* of work. Specifically, how the printing code
should change was a bit beyond me.
Regarding `int' vs. `wchar_t': the wchar_t we could detect with
autoconf is a host type. It has no necessary relationship to the
`wchar_t' on the target. LONGEST might be a better choice than `int',
but `wchar_t' is worse.