This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


Re: printing wchar_t*

From: "Jim Blandy" <jimb at red-bean dot com>
To: "Eli Zaretskii" <eliz at gnu dot org>
Cc: "Vladimir Prus" <ghost at cs dot msu dot su>, gdb at sources dot redhat dot com
Date: Fri, 14 Apr 2006 10:53:44 -0700
Subject: Re: printing wchar_t*
References: <e1lsqg$aml$1@sea.gmane.org> <200604141257.41690.ghost@cs.msu.su> <uu08w1cnf.fsf@gnu.org> <200604141837.26618.ghost@cs.msu.su> <uirpc19u8.fsf@gnu.org>

I think folks are seeing difficult problems where there aren't any. 
Even if the host character set (that is, the character set GDB is
using to communicate with its user, or in its MI communications) is
plain, old ASCII, GDB can, without any loss of information, convey the
contents of a wide string using an arbitrary target character set via
MI to a GUI, using code the GUI must already have.

Suppose we have a wide string where wchar_t values are Unicode code
points.  Suppose our host character set is plain ASCII.  Suppose the
user's program has a string containing the digits '123', followed by
some funky Tibetan characters U+0F04 U+0FCC, followed by the letters
'xyz'.  When asked to print that string, GDB should print the
following twenty-one ASCII characters:

L"123\x0f04\x0fccxyz"

Since this is a valid way to write that string in a source program, a
user at the GDB command line should understand it.  Since consumers of
MI information must contain parsers for C values already, they can
reliably find the contents of the string.
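
As a rough illustration of the formatting step described above, here is a
minimal sketch in C.  The name format_wide_literal is hypothetical (it is
not an existing GDB function), and the sketch assumes the wide characters
have already been read from the target into an array of 32-bit values and
that the host character set is ASCII.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Append formatted text to out, failing the whole call on overflow. */
#define EMIT(...) do { \
        int w = snprintf(out + pos, cap - pos, __VA_ARGS__); \
        if (w < 0 || (size_t)w >= cap - pos) return -1; \
        pos += (size_t)w; \
    } while (0)

/* Hypothetical helper, not part of GDB: write the n wide character
   values in src as an ASCII-only C wide string literal.  Returns the
   number of characters written (excluding the terminating NUL), or -1
   if out is too small.  */
int format_wide_literal(const uint32_t *src, size_t n, char *out, size_t cap)
{
    size_t pos = 0;
    int prev_hex = 0;   /* was the previous character written as \x...? */

    assert(src != NULL && out != NULL && cap > 0);

    EMIT("L\"");
    for (size_t i = 0; i < n; i++) {
        uint32_t c = src[i];
        int printable = c >= 0x20 && c < 0x7f;

        if (c == '"' || c == '\\') {
            EMIT("\\%c", (int)c);
            prev_hex = 0;
        } else if (printable && !(prev_hex && isxdigit((int)c))) {
            EMIT("%c", (int)c);
            prev_hex = 0;
        } else {
            /* Non-ASCII, control, or a hex digit right after a hex
               escape (which a C parser would otherwise absorb into
               that escape).  */
            EMIT("\\x%04lx", (unsigned long)c);
            prev_hex = 1;
        }
    }
    EMIT("\"");
    return (int)pos;
}

#undef EMIT
```

Given the eight values '1', '2', '3', 0x0f04, 0x0fcc, 'x', 'y', 'z', this
sketch produces the literal shown above.  One design point: C hex escapes
have no fixed length, so a hex-digit character that directly follows an
escape is written as an escape too; closing and reopening the literal
would be another way to handle that.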

Note that this gets a GUI the contents of the string in the *target*
character set.  The GUI itself should be responsible for converting
target characters to whatever character set it wants to use to present
data to its user.  Here, GDB's 'host' character set is just the
character set used to carry information from GDB to the GUI; it should
probably be set to ASCII, just to avoid needless variation.  But
either way, it's just acting as a medium for values in C source code
syntax, and has no bearing on either the character set the target
program is using, or the character set the GUI will use to present
data to its user.
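
To make the GUI's side concrete, here is a hedged sketch of what a
front end might do with such a value: parse the literal back into target
character values, then convert them for display.  Both function names
are hypothetical.  The parser handles only \x, \\ and \" escapes, and
the conversion assumes the target's wide characters are Unicode code
points and that the GUI wants UTF-8; a real front end would pick both
according to its own needs.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical GUI-side helper: decode an ASCII literal of the form
   L"..." into wide character values.  Returns the number of values
   stored in out, or -1 on malformed input or if out is too small.  */
int parse_wide_literal(const char *lit, uint32_t *out, size_t cap)
{
    size_t n = 0;

    assert(lit != NULL && out != NULL);
    if (strncmp(lit, "L\"", 2) != 0)
        return -1;

    const char *p = lit + 2;
    while (*p != '"') {
        uint32_t c;

        if (*p == '\0')
            return -1;                      /* unterminated literal */
        if (*p != '\\') {
            c = (unsigned char)*p++;        /* ordinary ASCII character */
        } else if (p[1] == 'x') {
            p += 2;
            if (!isxdigit((unsigned char)*p))
                return -1;
            c = 0;
            /* Hex escapes run for as long as hex digits follow.  */
            while (isxdigit((unsigned char)*p)) {
                int ch = tolower((unsigned char)*p);
                int d = isdigit(ch) ? ch - '0' : ch - 'a' + 10;
                c = c * 16 + (uint32_t)d;
                p++;
            }
        } else if (p[1] == '\\' || p[1] == '"') {
            c = (unsigned char)p[1];
            p += 2;
        } else {
            return -1;                      /* escape not handled here */
        }
        if (n >= cap)
            return -1;
        out[n++] = c;
    }
    return (int)n;
}

/* Encode one Unicode code point as UTF-8 into buf (at least 4 bytes).
   Returns the number of bytes written, or 0 for an invalid code point.  */
size_t encode_utf8(uint32_t c, unsigned char *buf)
{
    if (c < 0x80) {
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        if (c >= 0xD800 && c <= 0xDFFF)
            return 0;                       /* surrogates are not characters */
        buf[0] = (unsigned char)(0xE0 | (c >> 12));
        buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    if (c <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (c >> 18));
        buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (c & 0x3F));
        return 4;
    }
    return 0;
}
```

Note that GDB's host character set never enters into the second step:
the values recovered by the parser are target values, and only the GUI
decides how to present them.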

Unicode technical report #17 lays out the terminology the Unicode
folks use for all this stuff, with good explanations:
http://www.unicode.org/reports/tr17/

According to the ISO C standard, each member of the basic character
set must have the same code value as a wchar_t that it has as a char,
so the coded character set used by wchar_t is a superset of the one
used by char for those characters.  See ISO/IEC 9899:1999 (E) section
7.17, paragraph 2.  So I think it's sufficient for the user to specify
the coded character set (CCS, in TR17's terms) used by wide
characters; that fixes the CCS used for char values.
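
A small sanity check of that property, as a sketch only: the function
name is made up for illustration, and it samples just a few basic
characters rather than the whole basic character set.  As far as I know,
later revisions of C99 let an implementation opt out of this guarantee
by defining __STDC_MB_MIGHT_NEQ_WC__, so treat the result as a property
of the compiler at hand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if a sample of basic characters has
   the same code value as a char and as a wchar_t, 0 otherwise.  The
   terminating null character is compared as well.  */
int basic_chars_agree(void)
{
    static const char narrow[] = "AZaz09 !\"%";
    static const wchar_t wide[] = L"AZaz09 !\"%";

    for (size_t i = 0; i < sizeof narrow; i++) {
        if ((wchar_t)narrow[i] != wide[i])
            return 0;
    }
    return 1;
}
```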

