Two comments:
There's a lot of passing integers around to refer to a character.
That doesn't make a lot of sense to me; we should either be passing
char *, so that we can decode multibyte sequences, or using wchar_t
explicitly and autoconfing for it.
I see hardcoded support for a couple of simplistic charsets; would it
be worthwhile to add (minimal!) support for UTF-8 in case iconv is not
available? Gcj is natively UTF-8, and I have some open Debian bug
reports about this.
Absolutely --- as I say in the comments to charset.c:
At the moment, GDB only supports single-byte, stateless character
sets. This includes the ISO-8859 family (ASCII extended with
accented characters, and (I think) Cyrillic, for European
languages), and the EBCDIC family (used on IBM's mainframes).
Unfortunately, it excludes many Asian scripts, the fixed- and
variable-width Unicode encodings, and other desirable things.
Patches are welcome! (For example, it would be nice if the Java
string support could simply get absorbed into some more general
multi-byte encoding support.)
But it seemed to me that supporting stateless variable-width encodings
was going to be a *lot* of work. Specifically, how the printing code
should change was a bit beyond me.
Regarding `int' vs. `wchar_t': the wchar_t we could detect with
autoconf is a host type. It has no necessary relationship to the
`wchar_t' on the target. LONGEST might be a better choice than `int',
but `wchar_t' is worse.