This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
[RFC] PR 15873 UTF-8 incomplete/invalid chars go unnoticed
- From: "Pierre Muller" <pierre dot muller at ics-cnrs dot unistra dot fr>
- To: <gdb-patches at sourceware dot org>
- Date: Wed, 21 Aug 2013 16:55:05 +0200
- Subject: [RFC] PR 15873 UTF-8 incomplete/invalid chars go unnoticed
Not all byte values from 0 to 255
are valid UTF-8 chars on their own:
http://fr.wikipedia.org/wiki/UTF-8
seems to imply that no single byte with a value between 128 and 255
can be a complete, valid UTF-8 character.
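
To make the claim above concrete, here is a minimal sketch of how a lone
byte could be classified under the UTF-8 encoding rules. The enum and the
function name are hypothetical and are not part of GDB; how wchar_iterate
actually reports each byte depends on the underlying conversion and is not
shown here.

```c
#include <assert.h>

/* Hypothetical result codes for this illustration only; they are not
   GDB's enum wchar_iterate_result.  */
enum byte_class
{
  BYTE_OK,          /* 0x00-0x7F: a complete one-byte character.  */
  BYTE_INCOMPLETE,  /* 0xC2-0xF4: a lead byte that needs more bytes.  */
  BYTE_INVALID      /* 0x80-0xC1, 0xF5-0xFF: cannot start a character.  */
};

/* Classify a byte that is presented alone, with nothing following it.  */
enum byte_class
classify_single_byte (unsigned char c)
{
  if (c < 0x80)
    return BYTE_OK;
  /* Lead bytes of 2-, 3- and 4-byte sequences.  */
  if (c >= 0xC2 && c <= 0xF4)
    return BYTE_INCOMPLETE;
  /* Continuation bytes, overlong lead bytes and out-of-range bytes.  */
  return BYTE_INVALID;
}
```

In this sketch no value in the 128-255 range ever yields BYTE_OK, which is
the point made above.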
Nonetheless,
testsuite/gdb.base/printcmds.exp
seems to rely on the fact that those values are just displayed as octals
in the print ctable1[XX] tests.
This test was failing a lot for mingw-built GDB,
and while trying to understand why this was not the case on Linux,
I noticed that the test was only completing successfully because
UTF-8 is the default target-charset on the Linux system I tested.
The test itself seems to rely on the
set sevenbit-strings
command to ensure that all chars in the 128-255 interval
are displayed as octals...
But the variable sevenbit_strings
is not handled at all in generic_emit_char,
which is called by the 'print ctable1[XX]' commands above.
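
As an illustration of the behaviour the test seems to expect, the sketch
below formats a byte as a three-digit octal escape when a seven-bit flag is
set. The helper format_byte and its parameters are hypothetical; this is
not how generic_emit_char is written, and escaping of control characters
is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not a GDB function.  Write C into OUT, using an
   octal escape such as \310 when SEVENBIT is nonzero and C is outside
   the ASCII range.  Returns the value returned by snprintf.  */
int
format_byte (unsigned char c, int sevenbit, char *out, size_t outlen)
{
  assert (out != NULL && outlen > 0);

  if (sevenbit && c >= 0x80)
    return snprintf (out, outlen, "\\%03o", (unsigned int) c);

  /* Everything else is copied through unchanged in this sketch.  */
  return snprintf (out, outlen, "%c", c);
}
```

With the flag set, the byte value 200 is written as a backslash followed by
310, whatever the target charset is.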
The patch below adds an <invalid> or <incomplete> marker
to 1-byte chars that are not valid in UTF-8.
This means that this patch will create regressions
in testsuite runs, but I think that it's the
test that is wrong, not my patch.
Comments are most welcome,
Pierre Muller
GDB pascal language maintainer
2013-08-21 Pierre Muller <muller@sourceware.org>
* valprint.c (generic_emit_char): Handle RESULT value
and display information if a problem occurred inside the
wchar_iterate call.
Index: src/gdb/valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/valprint.c,v
retrieving revision 1.138
diff -u -p -r1.138 valprint.c
--- src/gdb/valprint.c 17 Jul 2013 20:35:11 -0000 1.138
+++ src/gdb/valprint.c 21 Aug 2013 14:38:49 -0000
@@ -2012,6 +2012,7 @@ generic_emit_char (int c, struct type *t
struct cleanup *cleanups;
gdb_byte *buf;
struct wchar_iterator *iter;
+ char *info = NULL;
int need_escape = 0;
buf = alloca (TYPE_LENGTH (type));
@@ -2035,6 +2036,23 @@ generic_emit_char (int c, struct type *t
enum wchar_iterate_result result;
num_chars = wchar_iterate (iter, &result, &chars, &buf, &buflen);
+ switch (result)
+ {
+ case wchar_iterate_ok:
+ /* Do not change it if it has been set before. */
+ break;
+ case wchar_iterate_invalid:
+ info = "<invalid>";
+ break;
+ case wchar_iterate_incomplete:
+ info = "<incomplete>";
+ break;
+ case wchar_iterate_eof:
+ /* info = "<eof>"; This is expected as last call. */
+ break;
+ default:
+ info = "<inconsistent>";
+ }
if (num_chars < 0)
break;
if (num_chars > 0)
@@ -2081,6 +2099,9 @@ generic_emit_char (int c, struct type *t
fputs_filtered (obstack_base (&output), stream);
+ if (info)
+ fputs_filtered (info, stream);
+
do_cleanups (cleanups);
}