This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.



Re: printing wchar_t*


On Friday 14 April 2006 12:43, Eli Zaretskii wrote:
> > From:  Vladimir Prus <ghost@cs.msu.su>
> > Date:  Fri, 14 Apr 2006 10:10:19 +0400
> >
> > The problem is that I don't see any way how gdb can print wchar_t in a
> > way that does not require post-processing. It can print it as UTF8, but
> > then for printing char* gdb should use local 8 bit encoding, which is
> > likely to be *not* UTF8.
>
> You are talking about a GUI front-end, aren't you?  In that case, you
> will need to code a routine that accepts a wchar_t string, and then
> _displays_ it using the appropriate font.  It is wrong to talk about
> ``printing'' it and about ``local 8-bit encoding'', because you don't
> want to encode it, you want to display it using the appropriate font.
>
> In particular, if the original wchar_t uses Unicode codepoints, then
> presumably there should be some GUI API call, specific to your
> windowing system, that would accept such a wchar_t string and display
> it using a Unicode font.

Sure, I know how to display a Unicode string. The question is how to pass 
raw Unicode data from gdb to the frontend in a form that is suitable for me and 
reasonable for other users of gdb. As I said, I already have a user-defined 
command to do this, but it won't benefit other users of gdb.

> So if you are going to do this in the front-end, I think all you need
> is ask GDB to supply the wchar_t string using the array notation; the
> rest will have to be done inside the front-end.  Am I missing
> something?

Yes, I'll need to know the length of the string. I can get it either with a 
user-defined gdb command (which again will solve *my* problem, but only as a 
local solution), or by looking at each character until I see zero, in which 
case I'd need to issue a command for each character.
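
The second option, walking the elements until a zero appears, might look roughly like this from the frontend's side. This is only a sketch: `read_element` is a hypothetical callback standing in for whatever per-element query the frontend sends to gdb, and the `max_len` cap is an assumption added to guard against unterminated data.

```python
from typing import Callable, List


def read_wide_string(read_element: Callable[[int], int], max_len: int = 4096) -> List[int]:
    """Collect wchar_t values one at a time until a zero terminator.

    read_element(i) is a hypothetical stand-in for one round trip to gdb
    that returns the integer value of element i of the string.
    """
    values: List[int] = []
    for i in range(max_len):
        value = read_element(i)  # one command per character
        if value == 0:
            break
        values.append(value)
    return values


# Simulated debuggee memory, used here instead of a real gdb session.
fake_memory = [0x56, 0x1456, 0]
units = read_wide_string(lambda i: fake_memory[i])
```

The cost is one round trip per character, which is why a length-aware facility in gdb itself would be preferable to a loop like this.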

>
> > Gdb can probably use some extra markers for values: like:
> >
> >    "foo"  for string in local 8-bit encoding
> >    L"foo" for string in UTF8 encoding.
> >
> > It's also possible to use "\u" escapes.
>
> Why do you need any of these?  16-bit Unicode characters are just
> integers, so ask GDB to send them as integers.  That should be all you
> need, since displaying them is something your FE will need to do
> itself, no?

In my original post, I asked whether gdb can print wchar_t just as a raw 
sequence of values, like this:

    0x56, 0x1456

"foo" and L"foo" are other alternatives which might be more handy for general 
users of gdb.
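
For illustration, a frontend could turn such a raw sequence into a displayable string roughly as follows. The output format is the one proposed above, not something gdb is known to emit, and the helper names `parse_raw_values` and `to_string` are hypothetical. The sketch also assumes every value is a whole code point.

```python
def parse_raw_values(text: str) -> list:
    """Parse a comma-separated list of integers such as '0x56, 0x1456'."""
    # int(..., 0) accepts prefixed hex (0x..) as well as plain decimal literals.
    return [int(item.strip(), 0) for item in text.split(",") if item.strip()]


def to_string(values: list) -> str:
    """Map each value to the character with that code point."""
    return "".join(chr(v) for v in values)


values = parse_raw_values("0x56, 0x1456")
text = to_string(values)
```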

> > But then there's a problem:
> >
> >    - Do we assume that wchar_t is always UTF-16 or UTF-32?
>
> You don't need to assume, you can ask the application.  Wouldn't
> "sizeof(wchar_t)" do the trick?

Deciding whether it's UTF-16 or UTF-32 is not the problem. In fact, exactly the 
same code will handle both encodings just fine. The question is whether we allow 
encodings other than UTF-16 and UTF-32. I don't know of any such 
encodings, but I'm not an i18n expert.
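
To illustrate why one routine can serve both cases: if surrogate pairs are combined whenever they appear, and every other value is taken as a code point, the same loop decodes UTF-16 units and UTF-32 values. A minimal sketch, assuming the input is a list of integers already read from the debuggee. `decode_units` is a hypothetical name, and replacing invalid values with U+FFFD is a choice made here, not a requirement.

```python
def decode_units(units):
    """Decode wchar_t values that are either UTF-16 units or UTF-32 code points."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= u <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: combine into one code point.
            low = units[i + 1]
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)))
            i += 2
        elif u < 0 or 0xD800 <= u <= 0xDFFF or u > 0x10FFFF:
            # Unpaired surrogate or out-of-range value: substitute U+FFFD.
            out.append("\ufffd")
            i += 1
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)


utf16_input = [0x56, 0xD83D, 0xDE00]  # 'V' followed by U+1F600 as a surrogate pair
utf32_input = [0x56, 0x1F600]         # the same text as UTF-32 values
same = decode_units(utf16_input) == decode_units(utf32_input)
```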

> >      - how user-specified encoding will be handled
>
> wchar_t is not an encoding, it's the characters' codes themselves.

I don't understand what you are saying here, sorry. Do you mean that each wchar_t 
is, in general, a code point rather than a complete abstract character? Yes, true, 
but so what? If wchar_t* literals can use an encoding other than UTF-16 and 
UTF-32, you need code to handle that encoding, and the question arises where 
you'll get that code: will it be iconv or something else?
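
As a rough picture of what "code to handle that encoding" means, the sketch below converts raw bytes under a caller-supplied encoding name, much as iconv would in C. It uses Python's built-in codecs as a stand-in for iconv. The function name `decode_wide_bytes` and the sample encodings are assumptions for illustration only.

```python
def decode_wide_bytes(raw: bytes, encoding: str) -> str:
    """Decode the raw bytes of a wide string using a user-specified encoding."""
    # errors="replace" keeps undecodable input visible as U+FFFD instead of raising.
    return raw.decode(encoding, errors="replace")


sample = "foo".encode("utf-32-le")  # bytes as they might sit in target memory
decoded = decode_wide_bytes(sample, "utf-32-le")
```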

> Encoded characters are (in general) multibyte character strings, not
> wchar_t.  See, for example, the description of library functions
> mbsinit, mbrlen, mbrtowc, etc., for more about this distinction.

I know about this distinction.

- Volodya



