This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.



Character set support in GDB


Jim Blandy has added character set support to GDB.  On his behalf, I
have posted a patch to gdb-patches:

    http://sources.redhat.com/ml/gdb-patches/2002-09/msg00228.html

There has already been some discussion regarding this patch.  Below is
the documentation that Jim has written for this new support.  (You'll
find this same documentation in the patch that I posted, but IMO, the
texinfo markup makes it somewhat harder to read.)

     _________________________________________________________________

8.15 Character Sets

   If the program you are debugging uses a different character set to
   represent characters and strings than the one GDB uses itself, GDB can
   automatically translate between the character sets for you. We call
   the character set GDB uses the host character set, and the one the
   inferior program uses the target character set.
   
   For example, if you are running GDB on a Linux system that uses the
   ISO Latin 1 character set, but you are using GDB's remote protocol
   (see section Remote Debugging) to debug a program running on an IBM
   mainframe that uses the EBCDIC character set, then the host character
   set is Latin 1 and the target character set is EBCDIC. If you give
   GDB the command `set target-charset ebcdic-us', then GDB translates
   between EBCDIC and Latin 1 as you print character or string values,
   or use character and string literals in expressions.
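   To make the idea of translation concrete, here is a minimal C sketch,
   assuming (hypothetically) that translation is a byte-for-byte table
   lookup from target codes to host codes. The table and the names
   target_to_host and translate_string are illustrative inventions, not
   GDB's internals, and the table covers only the characters used in the
   example program later in this section.

```c
#include <stddef.h>

/* Hypothetical, partial IBM1047 -> ASCII table: only the characters
   used in this section's example.  A real table would cover all 256
   target codes.  Each entry is {target byte, host byte}.  */
static const unsigned char ibm1047_pairs[][2] = {
  {200, 'H'}, {133, 'e'}, {147, 'l'}, {150, 'o'}, {107, ','},
  {64, ' '}, {166, 'w'}, {153, 'r'}, {132, 'd'}, {90, '!'},
  {37, '\n'}, {78, '+'}
};

/* Translate one target byte to the host character set.
   Returns -1 if the byte is not in the (partial) table.  */
int
target_to_host (unsigned char target)
{
  size_t i;

  for (i = 0; i < sizeof ibm1047_pairs / sizeof ibm1047_pairs[0]; i++)
    if (ibm1047_pairs[i][0] == target)
      return ibm1047_pairs[i][1];
  return -1;
}

/* Translate the NUL-terminated target string TARGET into HOST_BUF,
   which must be at least as large as TARGET.  NUL is 0 in both
   ASCII and EBCDIC, so it serves as the terminator on both sides.
   Bytes with no known host equivalent become '?'.  Returns the
   number of bytes that could not be translated.  */
size_t
translate_string (const unsigned char *target, char *host_buf)
{
  size_t untranslated = 0;

  for (; *target != 0; target++, host_buf++)
    {
      int c = target_to_host (*target);

      if (c < 0)
        {
          *host_buf = '?';
          untranslated++;
        }
      else
        *host_buf = (char) c;
    }
  *host_buf = '\0';
  return untranslated;
}
```

   Passing the ibm1047_hello bytes from the example program to
   translate_string yields "Hello, world!\n" with nothing left
   untranslated. The linear search keeps the sketch short; a more
   practical design would index a full 256-entry array directly by the
   target byte. Note that GDB itself shows untranslatable bytes as octal
   escapes rather than '?'.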
   
   GDB has no way to automatically recognize which character set the
   inferior program uses; you must tell it, using the
   `set target-charset' command, described below.
     
   Here are the commands for controlling GDB's character set support:
   
   set target-charset charset
          Set the current target character set to charset. We list the
          character set names GDB recognizes below, but if you invoke the
          `set target-charset' command with no argument, GDB lists the
          character sets it supports.
          
   set host-charset charset
          Set the current host character set to charset. By default, GDB
          uses a host character set appropriate to the system it is
          running on; you can override that default using the
          `set host-charset' command. GDB can only use certain character
          sets as its host character set. We list the character set
          names GDB recognizes below, and indicate which can be host
          character sets, but if you invoke the `set host-charset'
          command with no argument, GDB lists the character sets it
          supports, placing an asterisk (`*') after those it can use as
          a host character set.

   set charset charset
          Set the current host and target character sets to charset. If
          you invoke the `set charset' command with no argument, it lists
          the character sets it supports. GDB can only use certain
          character sets as its host character set; it marks those in the
          list with an asterisk (`*').

   show charset
   show host-charset
   show target-charset
          Show the current host and target character sets. The
          `show host-charset' and `show target-charset' commands are
          synonyms for `show charset'.
          
   GDB currently includes support for the following character sets:
     
   ASCII
          Seven-bit U.S. ASCII. GDB can use this as its host character
          set.

   ISO-8859-1
          The ISO Latin 1 character set. This extends ASCII with accented
          characters needed for French, German, and Spanish. GDB can use
          this as its host character set.

   EBCDIC-US
   IBM1047
          Variants of the EBCDIC character set, used on some of IBM's
          mainframe operating systems. (Linux on the S/390 uses U.S.
          ASCII.) GDB cannot use these as its host character set.
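   The split between character sets GDB can use on the host side and
   those it can only use on the target side can be modeled as a small
   registry. The C sketch below is hypothetical: charset_info,
   lookup_charset, check_host_charset and list_charsets are illustrative
   names rather than GDB's own code, and names are compared exactly, in
   the lower-case form GDB displays.

```c
#include <stdio.h>
#include <string.h>

/* One entry per supported character set.  CAN_BE_HOST corresponds
   to the asterisk GDB prints next to host-capable sets.  */
struct charset_info
{
  const char *name;
  int can_be_host;
};

static const struct charset_info charsets[] = {
  { "ascii",      1 },
  { "iso-8859-1", 1 },
  { "ebcdic-us",  0 },
  { "ibm1047",    0 },
};

#define N_CHARSETS (sizeof charsets / sizeof charsets[0])

/* Return the entry named NAME, or NULL if it is unknown.  */
const struct charset_info *
lookup_charset (const char *name)
{
  size_t i;

  for (i = 0; i < N_CHARSETS; i++)
    if (strcmp (charsets[i].name, name) == 0)
      return &charsets[i];
  return NULL;
}

/* The check a `set host-charset NAME' command needs: NAME must be
   known and usable as a host character set.  Returns 0 on success,
   -1 otherwise.  */
int
check_host_charset (const char *name)
{
  const struct charset_info *cs = lookup_charset (name);

  return (cs != NULL && cs->can_be_host) ? 0 : -1;
}

/* Print the supported sets, marking host-capable ones with `*'.  */
void
list_charsets (FILE *out)
{
  size_t i;

  fprintf (out, "Valid character sets are:\n");
  for (i = 0; i < N_CHARSETS; i++)
    fprintf (out, "  %s%s\n", charsets[i].name,
             charsets[i].can_be_host ? " *" : "");
  fprintf (out, "* - can be used as a host character set\n");
}
```

   Here check_host_charset ("ibm1047") fails, because the EBCDIC
   variants cannot serve as the host character set, while
   check_host_charset ("ascii") succeeds. list_charsets (stdout) prints
   a list in the same format GDB uses when `set target-charset' is given
   no argument.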
          
   Here is an example of GDB's character set support in action. Assume
   that the following source code has been placed in the file
   `charset-test.c':
   
     
#include <stdio.h>

/* "Hello, world!\n" encoded in ASCII.  */
char ascii_hello[]
  = {72, 101, 108, 108, 111, 44, 32, 119,
     111, 114, 108, 100, 33, 10, 0};
/* "Hello, world!\n" encoded in IBM1047 (EBCDIC).  */
char ibm1047_hello[]
  = {200, 133, 147, 147, 150, 107, 64, 166,
     150, 153, 147, 132, 90, 37, 0};

int
main (void)
{
  printf ("Hello, world!\n");
  return 0;
}
   
   In this program, ascii_hello and ibm1047_hello are arrays containing
   the string `Hello, world!' followed by a newline, encoded in the ASCII
   and IBM1047 character sets.
     
   We compile the program, and invoke the debugger on it:
     
       
$ gcc -g charset-test.c -o charset-test
$ gdb -nw charset-test
GNU gdb 2001-12-19-cvs
Copyright 2001 Free Software Foundation, Inc.
...
(gdb)
     
   We can use the `show charset' command to see what character sets GDB
   is currently using to interpret and display characters and strings:
     
   
(gdb) show charset
The current host and target character set is `iso-8859-1'.
(gdb)
     
   For the sake of printing this manual, let's use ASCII as our initial
   character set:
   
   
(gdb) set charset ascii
(gdb) show charset
The current host and target character set is `ascii'.
(gdb)

   Let's assume that ASCII is indeed the correct character set for our
   host system; in other words, let's assume that if GDB prints
   characters using the ASCII character set, our terminal will display
   them properly. Since our current target character set is also ASCII,
   the contents of ascii_hello print legibly:


(gdb) print ascii_hello
$1 = 0x401698 "Hello, world!\n"
(gdb) print ascii_hello[0]
$2 = 72 'H'
(gdb)

   GDB uses the target character set for character and string literals
   you use in expressions:


(gdb) print '+'
$3 = 43 '+'
(gdb)

   The ASCII character set uses the number 43 to encode the `+'
   character.

   GDB relies on the user to tell it which character set the target
   program uses. If we print ibm1047_hello while our target character set
   is still ASCII, we get gibberish:


(gdb) print ibm1047_hello
$4 = 0x4016a8 "\310\205\223\223\226k@\246\226\231\223\204Z%"
(gdb) print ibm1047_hello[0]
$5 = 200 '\310'
(gdb)
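   GDB shows a character value as its number followed by the character
   in quotes, falling back to an octal escape when the byte has no
   printable counterpart, which is why byte 200 appears as '\310'. The C
   sketch below imitates that display, assuming both host and target are
   ASCII. format_char_value and format_string are hypothetical names, and
   the escaping rules are a simplification inferred from the output
   above; GDB handles further cases, such as other named escapes.

```c
#include <stdio.h>

/* Format one char value into BUF as "NUMBER 'CHAR'", assuming host
   and target are both ASCII.  */
void
format_char_value (unsigned char c, char *buf, size_t size)
{
  if (c == '\'' || c == '\\')
    /* A quote or backslash is shown with a backslash in front.  */
    snprintf (buf, size, "%u '\\%c'", (unsigned) c, c);
  else if (c >= 32 && c <= 126)
    /* Printable ASCII: show the character itself.  */
    snprintf (buf, size, "%u '%c'", (unsigned) c, c);
  else
    /* Anything else: a three-digit octal escape.  */
    snprintf (buf, size, "%u '\\%03o'", (unsigned) c, (unsigned) c);
}

/* Format the NUL-terminated byte string S in double quotes, using
   the same rules plus \n for newline.  BUF must hold at least
   4 * strlen (S) + 3 bytes.  */
void
format_string (const unsigned char *s, char *buf)
{
  char *p = buf;

  *p++ = '"';
  for (; *s != 0; s++)
    {
      if (*s == '\n')
        p += sprintf (p, "\\n");
      else if (*s == '"' || *s == '\\')
        p += sprintf (p, "\\%c", *s);
      else if (*s >= 32 && *s <= 126)
        *p++ = (char) *s;
      else
        p += sprintf (p, "\\%03o", (unsigned) *s);
    }
  *p++ = '"';
  *p = '\0';
}
```

   Applied to the ibm1047_hello bytes, format_string produces
   "\310\205\223\223\226k@\246\226\231\223\204Z%", matching GDB's output
   above apart from the address. Bytes 107, 64, 90 and 37 happen to be
   printable ASCII ('k', '@', 'Z', '%'), so they appear as themselves.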

   If we invoke the `set target-charset' command without an argument,
   GDB tells us the character sets it supports:


(gdb) set target-charset
Valid character sets are:
  ascii *
  iso-8859-1 *
  ebcdic-us
  ibm1047
* - can be used as a host character set

   We can select IBM1047 as our target character set, and examine the
   program's strings again. Now the ASCII string displays incorrectly,
   but GDB translates the contents of ibm1047_hello from the target
   character set, IBM1047, to the host character set, ASCII, and they
   display correctly:


(gdb) set target-charset ibm1047
(gdb) show charset
The current host character set is `ascii'.
The current target character set is `ibm1047'.
(gdb) print ascii_hello
$6 = 0x401698 "\110\145%%?\054\040\167?\162%\144\041\012"
(gdb) print ascii_hello[0]
$7 = 72 '\110'
(gdb) print ibm1047_hello
$8 = 0x4016a8 "Hello, world!\n"
(gdb) print ibm1047_hello[0]
$9 = 200 'H'
(gdb)

   As above, GDB uses the target character set for character and string
   literals you use in expressions:


(gdb) print '+'
$10 = 78 '+'
(gdb)

   The IBM1047 character set uses the number 78 to encode the `+'
   character.
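   Translating a literal runs in the opposite direction: GDB must turn
   the host character you typed into its code in the target character
   set. Below is a minimal C sketch for IBM1047, assuming an ASCII host;
   host_to_ibm1047 is a hypothetical name. It relies on the EBCDIC
   layout, in which each case of the alphabet falls into three runs
   (a-i, j-r, s-z) and the digits occupy 0xF0-0xF9, and it takes a few
   punctuation characters from a small table, so it is deliberately
   incomplete.

```c
#include <stddef.h>

/* Return the IBM1047 code for host (ASCII) character C, or -1 if
   this sketch does not cover C.  The range arithmetic assumes an
   ASCII host, where 'a'..'z', 'A'..'Z' and '0'..'9' are contiguous.  */
int
host_to_ibm1047 (char c)
{
  /* Partial punctuation table: {host character, IBM1047 code}.  */
  static const struct { char host; unsigned char target; } punct[] = {
    { ' ', 64 }, { '+', 78 }, { '!', 90 }, { ',', 107 }, { '\n', 37 },
  };
  size_t i;

  /* EBCDIC splits each case of the alphabet into three runs.  */
  if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
  if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
  if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
  if (c >= 'A' && c <= 'I') return 0xC1 + (c - 'A');
  if (c >= 'J' && c <= 'R') return 0xD1 + (c - 'J');
  if (c >= 'S' && c <= 'Z') return 0xE2 + (c - 'S');
  if (c >= '0' && c <= '9') return 0xF0 + (c - '0');

  for (i = 0; i < sizeof punct / sizeof punct[0]; i++)
    if (punct[i].host == c)
      return punct[i].target;
  return -1;
}
```

   With it, host_to_ibm1047 ('+') returns 78, the value GDB printed
   above, and translating each character of "Hello, world!\n" in turn
   reproduces the ibm1047_hello array from the example program.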
     _________________________________________________________________

