This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



Default target wide character set



Hello, all.


I'm Alexey Feldgendler, a developer in a software company that uses gdb to debug on *nix systems. I got assigned part-time to contribute to gdb, mostly by fixing bugs that affect us, but also to implement new features. Because I'm new to gdb internals, I'm trying to be very careful about making anything but trivial fixes like <> for now, so I'd like to discuss with you something I'm trying to achieve. Thank you in advance for your time.

Currently, UCS-4 is the default target wide character set. However, what this setting is really used for is handling of wide character strings, i.e. sequences of wide characters, which in C[++] are represented by the type wchar_t. By default, gcc indeed considers wchar_t 4 bytes wide, but it has an option to make wchar_t 2 bytes wide (-fshort-wchar). When this option is used, the default setting for the target wide character set becomes wrong.
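To illustrate the mismatch, here is a minimal sketch (not gdb code; both helper names are hypothetical). The same source reports a different wchar_t width depending on whether it is built with -fshort-wchar. On a typical GNU/Linux target I would expect 4 bytes without the option and 2 bytes with it.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only: the width in bytes of
   wchar_t as chosen by the compiler and its flags.  */
static size_t
wchar_width_bytes (void)
{
  return sizeof (wchar_t);
}

/* Number of bytes occupied by the characters of a wide string,
   excluding the terminating null character.  The length is counted
   by hand instead of with wcslen(), because the C library may have
   been built with a different wchar_t width than this code.  */
static size_t
wide_string_bytes (const wchar_t *s)
{
  size_t n = 0;

  while (s[n] != L'\0')
    n++;
  return n * sizeof (wchar_t);
}
```

For example, wide_string_bytes (L"abc") is three times whatever wchar_width_bytes () returns, so a debugger that assumes 4-byte characters would misread the same memory when the program was built with 2-byte wchar_t.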

Looking at charset_for_string_type(), it seems to handle C_STRING_16 and C_STRING_32 sort of correctly based on the character width. However, for C_WIDE_STRING it simply uses target_wide_charset(), no matter whether it's reasonable or not.

I have two alternative ideas for how to tackle this problem.

A. Have the default target wide character set depend on the size of the type named wchar_t. If I understand correctly, the default would then need to be updated whenever the symbol table gets loaded. Of course, any user-specified value should override the computed default. There should also be some way to reset the option to its dynamic default.
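Option A could be sketched roughly as follows. This is only an illustration under assumptions, not existing gdb code: the struct, the function names, and the choice of UCS-2 as the default for a 2-byte wchar_t are all hypothetical. Only UCS-4 as the current default comes from the description above.

```c
#include <stddef.h>

/* Hypothetical setting: a NULL user_value means "no user override,
   use the computed default".  */
struct wide_charset_setting
{
  const char *user_value;
};

/* Assumed defaults, for illustration: UCS-4 for a 4-byte wchar_t
   (the current default), UCS-2 for a 2-byte wchar_t.  Any other
   size falls back to the current default.  */
static const char *
default_wide_charset (unsigned wchar_size)
{
  switch (wchar_size)
    {
    case 2:
      return "UCS-2";
    case 4:
    default:
      return "UCS-4";
    }
}

/* A user-specified value always wins over the computed default.  */
static const char *
effective_wide_charset (const struct wide_charset_setting *s,
                        unsigned wchar_size)
{
  if (s->user_value != NULL)
    return s->user_value;
  return default_wide_charset (wchar_size);
}

/* Reset the option to its dynamic default.  */
static void
reset_wide_charset (struct wide_charset_setting *s)
{
  s->user_value = NULL;
}
```

In this sketch nothing is cached: the default is recomputed from sizeof(wchar_t) on every lookup, which would sidestep the question of when to update it after a symbol table load, at the cost of looking up the type each time.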

Side question: how does gdb figure out sizeof(wchar_t)? Does it come from the symbol table or from elsewhere?

B. Have charset_for_string_type() check after calling target_wide_charset() whether the width of the returned character set matches the width of the actual string type, and use fallback similar to what's done for C_STRING_16 and C_STRING_32 if it doesn't. By width of the character set I mean the smallest possible width of a character in it, that would be e.g. 1 for UTF-8 and 2 for UCS-2. In this case, what "show charset" shows sometimes won't match what's actually used to print a wchar_t[] string.
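A rough sketch of the check in option B, again under assumptions: charset_min_width() and wide_charset_for_width() are hypothetical names, the width table is a small hand-written stand-in, and the endian-specific UTF-16/UTF-32 fallback names are my assumption about what a fallback similar to the C_STRING_16 and C_STRING_32 cases would pick.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: the smallest possible width, in bytes, of a
   character in the named character set.  Returns 0 if unknown.  */
static unsigned
charset_min_width (const char *name)
{
  static const struct
  {
    const char *name;
    unsigned width;
  } table[] = {
    { "UTF-8", 1 },
    { "UCS-2", 2 },
    { "UTF-16", 2 },
    { "UCS-4", 4 },
    { "UTF-32", 4 },
  };
  size_t i;

  for (i = 0; i < sizeof (table) / sizeof (table[0]); i++)
    if (strcmp (table[i].name, name) == 0)
      return table[i].width;
  return 0;
}

/* Keep the configured wide charset when its width matches the width
   of the string's character type, or when its width is unknown.
   Otherwise fall back to a charset chosen from the character width
   and byte order (assumed names).  */
static const char *
wide_charset_for_width (const char *configured, unsigned char_width,
                        int big_endian)
{
  unsigned width = charset_min_width (configured);

  if (width == 0 || width == char_width)
    return configured;

  switch (char_width)
    {
    case 2:
      return big_endian ? "UTF-16BE" : "UTF-16LE";
    case 4:
      return big_endian ? "UTF-32BE" : "UTF-32LE";
    default:
      return configured;
    }
}
```

With the configured value UCS-4 and a 2-byte wchar_t on a little-endian target, this returns "UTF-16LE", which is exactly the case where the displayed setting and the charset actually used would differ.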

What do you think of options A and B? Or is there maybe another possibility that I'm overlooking?


-- 
Alexey Feldgendler <alexeyf@opera.com>
Software Developer, Desktop Platform/Delivery Team, Opera Software ASA
[ICQ: 115226275] http://my.opera.com/feldgendler/

