This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC] Signed/unsigned character arrays are not strings


Daniel Jacobowitz <drow@false.org> writes:
> On Tue, Feb 27, 2007 at 11:11:32PM +0100, Mark Kettenis wrote:
>> Anyway, I'm in favour of restore the traditional gdb behaviour of
>> printing all three as strings.
>
> While I haven't seen responses to some of my arguments in favor of the
> new behavior, it's obvious that I'm in the minority here.  So, the
> behavior ought to change.
>
> I was concerned about Jim's patch, but thinking about it some more, I
> would be happy with it (probably with Paul's suggestion).  That's
> because "typedef char *name" is reasonably common, but "typedef char
> name_unit;" is much less so.  What do those who prefer the old
> behavior think of that?

For whatever it's worth, here's a revised patch that implements Paul's
suggestions and has better comments (more accurately apologetic):

gdb/ChangeLog:
2007-02-27  Jim Blandy  <jimb@codesourcery.com>

	* c-valprint.c (textual_element_type): New function.
	(c_val_print): Use textual_element_type to decide whether to print
	arrays, pointer types, and integer types as strings and characters.
	(c_value_print): Doc fix.

Index: gdb/c-valprint.c
===================================================================
RCS file: /cvs/src/src/gdb/c-valprint.c,v
retrieving revision 1.42
diff -u -r1.42 c-valprint.c
--- gdb/c-valprint.c	26 Jan 2007 20:54:16 -0000	1.42
+++ gdb/c-valprint.c	28 Feb 2007 23:09:29 -0000
@@ -56,6 +56,66 @@
 }
 
 
+/* Apply a heuristic to decide whether an array of TYPE or a pointer
+   to TYPE should be printed as a textual string.  Return non-zero if
+   it should, or zero if it should be treated as an array of /pointer
+   to integers.
+
+   It's a shame that we need to use a heuristic here, but C hasn't
+   historically provided distinct types for bytes and characters; you
+   get 'char', which is guaranteed to occupy a byte.  You can put
+   qualifiers on it, but people use the qualified char types for text
+   too, so the qualifier's presence doesn't really tell you anything.
+
+   What's especially a shame here is that the heuristic is fixed.  The
+   user knows which of their types are textual and which are numeric,
+   but we don't give them a way to tell us, beyond using '/s' every time
+   they print something.  */
+static int
+textual_element_type (struct type *type)
+{
+  /* GDB doesn't use TYPE_CODE_CHAR for the C 'char' types; instead,
+     it uses one-byte TYPE_CODE_INT types, with TYPE_NAMEs like
+     "char", "unsigned char", etc. and appropriate flags.  For various
+     reasons, this works out well in some places.  */
+
+
+  struct type *true_type = check_typedef (type);
+
+  /* TYPE_CODE_CHAR is always textual.  But I don't think it ever
+     occurs in C code.  */
+  if (TYPE_CODE (true_type) == TYPE_CODE_CHAR)
+    return 1;
+
+  /* If a one-byte TYPE_CODE_INT has a TYPE_NAME of "char" or
+     something ending with " char", then we treat it as text;
+     otherwise, we assume it's being used as data.  This makes all our
+     SIMD types like builtin_type_v8_int8 and the <stdint.h> types
+     like uint8_t print numerically, but all 'char' types print
+     textually.  Code which says what it means does well.
+     We also recognize types ending in 'char_t'.  */
+  if (TYPE_CODE (true_type) == TYPE_CODE_INT
+      && TYPE_LENGTH (true_type) == 1)
+    {
+      int name_len;
+      
+      /* All integer types should have names.  */
+      gdb_assert (TYPE_NAME (type));
+
+      name_len = strlen (TYPE_NAME (type));
+
+      if (strcmp (TYPE_NAME (type), "char") == 0
+          || (name_len > 5
+              && strcmp (TYPE_NAME (type) + name_len - 5, " char") == 0)
+          || (name_len >= 6
+              && strcmp (TYPE_NAME (type) + name_len - 6, " char_t") == 0))
+        return 1;
+    }
+
+  return 0;
+}
+
+
 /* Print data of type TYPE located at VALADDR (within GDB), which came from
    the inferior at address ADDRESS, onto stdio stream STREAM according to
    FORMAT (a letter or 0 for natural format).  The data at VALADDR is in
@@ -94,11 +154,9 @@
 	    {
 	      print_spaces_filtered (2 + 2 * recurse, stream);
 	    }
-	  /* For an array of chars, print with string syntax.  */
-	  if (eltlen == 1 &&
-	      ((TYPE_CODE (elttype) == TYPE_CODE_INT && TYPE_NOSIGN (elttype))
-	       || ((current_language->la_language == language_m2)
-		   && (TYPE_CODE (elttype) == TYPE_CODE_CHAR)))
+
+	  /* Print arrays of textual chars with a string syntax.  */
+          if (textual_element_type (TYPE_TARGET_TYPE (type))
 	      && (format == 0 || format == 's'))
 	    {
 	      /* If requested, look for the first null char and only print
@@ -191,12 +249,11 @@
 	      deprecated_print_address_numeric (addr, 1, stream);
 	    }
 
-	  /* For a pointer to char or unsigned char, also print the string
+	  /* For a pointer to a textual type, also print the string
 	     pointed to, unless pointer is null.  */
 	  /* FIXME: need to handle wchar_t here... */
 
-	  if (TYPE_LENGTH (elttype) == 1
-	      && TYPE_CODE (elttype) == TYPE_CODE_INT
+	  if (textual_element_type (TYPE_TARGET_TYPE (type))
 	      && (format == 0 || format == 's')
 	      && addr != 0)
 	    {
@@ -398,7 +455,7 @@
 	     Since we don't know whether the value is really intended to
 	     be used as an integer or a character, print the character
 	     equivalent as well. */
-	  if (TYPE_LENGTH (type) == 1)
+	  if (textual_element_type (type))
 	    {
 	      fputs_filtered (" ", stream);
 	      LA_PRINT_CHAR ((unsigned char) unpack_long (type, valaddr + embedded_offset),
@@ -500,7 +557,9 @@
       || TYPE_CODE (type) == TYPE_CODE_REF)
     {
       /* Hack:  remove (char *) for char strings.  Their
-         type is indicated by the quoted string anyway. */
+         type is indicated by the quoted string anyway.
+         (Don't use textual_element_type here; quoted strings
+         are always exactly (char *).  */
       if (TYPE_CODE (type) == TYPE_CODE_PTR
 	  && TYPE_NAME (type) == NULL
 	  && TYPE_NAME (TYPE_TARGET_TYPE (type)) != NULL


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]