This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.



[RFC] PR 15873 UTF-8 incomplete/invalid chars go unnoticed


  Not all byte values from 0 to 255
are valid UTF-8 chars:

http://fr.wikipedia.org/wiki/UTF-8

This seems to imply that no single byte between 128 and 255
can be a valid UTF-8 character on its own....
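
As a rough illustration of that claim, here is a minimal sketch that
classifies a lone byte according to the UTF-8 lead-byte ranges of RFC 3629.
The names utf8_byte_class, classify_lone_utf8_byte and lone_byte_marker are
hypothetical and are not part of GDB; how iconv or wchar_iterate actually
reports each case on a given host is an assumption that may not match
this table.

```c
/* Hypothetical helpers, not GDB code: classify a single byte that is
   presented on its own as a UTF-8 "character".  */

enum utf8_byte_class
{
  UTF8_BYTE_OK,		/* 0x00-0x7f: complete one-byte character.  */
  UTF8_BYTE_INCOMPLETE,	/* 0xc2-0xf4: lead byte, continuation missing.  */
  UTF8_BYTE_INVALID	/* Anything else: can never start a sequence.  */
};

enum utf8_byte_class
classify_lone_utf8_byte (unsigned char c)
{
  if (c <= 0x7f)
    return UTF8_BYTE_OK;
  if (c >= 0xc2 && c <= 0xf4)
    return UTF8_BYTE_INCOMPLETE;
  /* 0x80-0xbf are continuation bytes, 0xc0-0xc1 would only encode
     overlong forms, and 0xf5-0xff are outside the Unicode range.  */
  return UTF8_BYTE_INVALID;
}

/* Return the marker that would follow the printed value, or an empty
   string when the byte is a complete character.  */
const char *
lone_byte_marker (unsigned char c)
{
  switch (classify_lone_utf8_byte (c))
    {
    case UTF8_BYTE_INCOMPLETE:
      return "<incomplete>";
    case UTF8_BYTE_INVALID:
      return "<invalid>";
    default:
      return "";
    }
}
```

Under this table no value in 128-255 is classified as a complete
character, which matches the reading of the article above.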

  Nonetheless, 
testsuite/gdb.base/printcmds.exp 
seems to rely on those values simply being displayed as octal escapes
in the print ctable1[XX] tests.

  This test was failing a lot for mingw-built GDB,
and while trying to understand why this was not the case on Linux,
I noticed that the test was only completing successfully because
UTF-8 is the default target-charset on the Linux system I tested.

  The test itself seems to rely on the
set sevenbit-strings
command to ensure that all chars in the 128-255 interval
are displayed as octal escapes...
  But the variable sevenbit_strings
is not handled at all in generic_emit_char,
which is called by the 'print ctable1[XX]' commands above.
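
To make the expectation concrete, here is a minimal sketch of the
behaviour the test appears to assume. It is not GDB's code:
format_char_sevenbit is a hypothetical helper, and it ignores everything
else generic_emit_char does (quoting, wide characters, non-printable
ASCII).

```c
#include <stdio.h>

/* Hypothetical helper, not GDB code.  Write C into BUF as it would be
   shown when SEVENBIT is nonzero: bytes >= 0x80 become a three-digit
   octal escape, everything else is copied as-is.  Returns the value
   of snprintf.  */

int
format_char_sevenbit (unsigned char c, int sevenbit,
		      char *buf, size_t size)
{
  if (sevenbit && c >= 0x80)
    return snprintf (buf, size, "\\%03o", (unsigned int) c);
  return snprintf (buf, size, "%c", c);
}
```

With sevenbit set, the byte 200 comes out as \310, which is the form
the ctable1 tests look for.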

  The patch below adds an <invalid> or <incomplete> marker
to 1-byte chars that are not valid in UTF-8.
  
  This means that this patch will create regressions
in testsuite runs, but I think that it's the
test that is wrong, not my patch.

  Comments most welcome,

Pierre Muller
GDB pascal language maintainer


2013-08-21  Pierre Muller  <muller@sourceware.org>

	* valprint.c (generic_emit_char): Handle RESULT value
	and display information if a problem occurred inside the
	wchar_iterate call.

Index: src/gdb/valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/valprint.c,v
retrieving revision 1.138
diff -u -p -r1.138 valprint.c
--- src/gdb/valprint.c	17 Jul 2013 20:35:11 -0000	1.138
+++ src/gdb/valprint.c	21 Aug 2013 14:38:49 -0000
@@ -2012,6 +2012,7 @@ generic_emit_char (int c, struct type *t
   struct cleanup *cleanups;
   gdb_byte *buf;
   struct wchar_iterator *iter;
+  char *info = NULL;
   int need_escape = 0;
 
   buf = alloca (TYPE_LENGTH (type));
@@ -2035,6 +2036,23 @@ generic_emit_char (int c, struct type *t
       enum wchar_iterate_result result;
 
       num_chars = wchar_iterate (iter, &result, &chars, &buf, &buflen);
+      switch (result)
+	{
+	  case wchar_iterate_ok:
+	    /* Do not change it if it has been set before.  */
+	    break;
+	  case wchar_iterate_invalid:
+	    info = "<invalid>";
+	    break;
+	  case wchar_iterate_incomplete:
+	    info = "<incomplete>";
+	    break;
+	  case wchar_iterate_eof:
+	    /* info = "<eof>";  This is expected as last call.  */
+	    break;
+	  default:
+	    info = "<inconsistent>";
+	}
       if (num_chars < 0)
 	break;
       if (num_chars > 0)
@@ -2081,6 +2099,9 @@ generic_emit_char (int c, struct type *t
 
   fputs_filtered (obstack_base (&output), stream);
 
+  if (info)
+    fputs_filtered (info, stream);
+
   do_cleanups (cleanups);
 }


