This is the mail archive of the gdb@sources.redhat.com mailing list for the GDB project.



Re: suggestion for dictionary representation


David Carlton <carlton@math.stanford.edu> writes:
> > I'm tempted to whack the block special case for function arguments.  It
> > may make name lookup a little more complicated but I think it will make
> > everything clearer.  We could, of course, try this on the branch and
> > see if we like the results :)
> 
> Would it be reasonable to break up function blocks into two separate
> blocks: a linear block that only defines the parameters for the
> function and a non-linear block that contains the actual local
> variables?  Not that I think Jim's scheme is a bad one - I agree that
> it's better than the current scheme - but given the possibility of
> local variables shadowing function parameters, it seems to me to be
> conceptually cleaner to have two separate blocks appear anyways, and
> it also solves this problem.

The issue is a bit more tangled than you think, I think.  Splitting
the function's body and its formals into two separate blocks is a good
idea, but it isn't going to get rid of all your duplicates.  A single
formal parameter can have two symbols in a function's block that
describe it.  Try this out on a Pentium.  (The `-O2' and `-gstabs+'
are required.)

  $ cat func.c
  #include <stdio.h>

  int
  main (int argc, char **argv)
  {
    static int local = 3;
    printf ("%d\n", argc * local);
  }
  $ gcc -O2 -gstabs+ func.c -o func

Then start up GDB on a GDB that is itself debugging `func':

  (top-gdb) run
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y

  Starting program: gdb -nw func
  GNU gdb 2002-09-16-cvs
  Copyright 2002 Free Software Foundation, Inc.
  GDB is free software, covered by the GNU General Public License, and you are
  welcome to change it and/or distribute copies of it under certain conditions.
  Type "show copying" to see the conditions.
  There is absolutely no warranty for GDB.  Type "show warranty" for details.
  This GDB was configured as "i686-pc-linux-gnu"...
  (gdb)

Set a breakpoint in main, just to get the symbols read:

  (gdb) break main
  Breakpoint 1 at 0x804834c: file func.c, line 7.
  (gdb)

Drop out to the enclosing GDB:

  (gdb) info
  (top-gdb)

It just so happens that `func.c' is the first compilation unit of the
first executable file in GDB's list:

  (top-gdb) print object_files->symtabs->filename
  $177 = 0x82fdbf8 "func.c"
  (top-gdb)

If that's not so for you, you'll need to walk `symtabs' to find the
right symtab.  Anyway, let's check out this symtab's blockvector.  I'm
just using [0] as a postfix dereferencing operator here:

  (top-gdb) print object_files->symtabs->blockvector[0]
  $178 = {nblocks = 3, block = {0x82f8b74}}
  (top-gdb)

The first and second blocks are the global and static blocks, so the
third one is probably for `main':

  (top-gdb) print object_files->symtabs->blockvector->block[2]
  $179 = (struct block *) 0x82f8ab4
  (top-gdb) p *$179
  $180 = {startaddr = 134513472, endaddr = 134513513, function = 0x82f8988, 
    superblock = 0x82f8ae4, gcc_compile_flag = 2 '\002', hashtable = 0 '\0', 
    nsyms = 4, sym = {0x82f89c4}}
  (top-gdb) p *$179->function
  $181 = {ginfo = {name = 0x82f89bc "main", value = {ivalue = 137333428, 
        block = 0x82f8ab4, 
        bytes = 0x82f8ab4 "@\203\004\bi\203\004\b\210\211/\bä\212/\b\002", 
        address = 137333428, chain = 0x82f8ab4}, language_specific = {
        cplus_specific = {demangled_name = 0x0}}, language = language_c, 
      section = 11, bfd_section = 0x82d4fc0}, type = 0x82faaa8, 
    namespace = VAR_NAMESPACE, aclass = LOC_BLOCK, line = 5, aux_value = {
      basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
  (top-gdb)

And it was!  Let's look at those four symbols:

  (top-gdb) p *$179->sym[0]
  $182 = {ginfo = {name = 0x82f89f8 "argc", value = {ivalue = 8, block = 0x8, 
        bytes = 0x8 <Address 0x8 out of bounds>, address = 8, chain = 0x8}, 
      language_specific = {cplus_specific = {demangled_name = 0x0}}, 
      language = language_c, section = 0, bfd_section = 0x0}, type = 0x82df828,
    namespace = VAR_NAMESPACE, aclass = LOC_ARG, line = 4, aux_value = {
      basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
  (top-gdb) p *$179->sym[1]
  $183 = {ginfo = {name = 0x82f8a34 "argv", value = {ivalue = 12, block = 0xc, 
        bytes = 0xc <Address 0xc out of bounds>, address = 12, chain = 0xc}, 
      language_specific = {cplus_specific = {demangled_name = 0x0}}, 
      language = language_c, section = 0, bfd_section = 0x0}, type = 0x82faaf4,
    namespace = VAR_NAMESPACE, aclass = LOC_ARG, line = 4, aux_value = {
      basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
  (top-gdb) p *$179->sym[2]
  $184 = {ginfo = {name = 0x82f8a70 "argc", value = {ivalue = 0, block = 0x0, 
        bytes = 0x0, address = 0, chain = 0x0}, language_specific = {
        cplus_specific = {demangled_name = 0x0}}, language = language_c, 
      section = 0, bfd_section = 0x0}, type = 0x82df828, 
    namespace = VAR_NAMESPACE, aclass = LOC_REGISTER, line = 4, aux_value = {
      basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
  (top-gdb) p *$179->sym[3]
  $185 = {ginfo = {name = 0x82f8aac "local", value = {ivalue = 134517720, 
        block = 0x80493d8, bytes = 0x80493d8 "É\f", address = 134517720, 
        chain = 0x80493d8}, language_specific = {cplus_specific = {
          demangled_name = 0x0}}, language = language_c, section = 14, 
      bfd_section = 0x0}, type = 0x82df828, namespace = VAR_NAMESPACE, 
    aclass = LOC_STATIC, line = 6, aux_value = {basereg = 0}, aliases = 0x0, 
    ranges = 0x0, hash_next = 0x0}
  (top-gdb) 

Hey!  Why are there two entries for argc?  (This is the extra tangle I
was referring to.  If you know all about this, you can stop reading
now.)

The two `argc' symbols have different address classes: one is
LOC_ARG, which marks it as an argument, and the other is LOC_REGISTER.
The LOC_ARG symbol describes where the variable is passed on the
stack (eight bytes after %ebp), whereas the LOC_REGISTER symbol
describes where the variable lives within the body of the function:
register zero, or %eax.
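
To make the distinction concrete, here is a minimal sketch in C of a
block that holds both entries.  The names `toy_symbol', `toy_aclass',
`count_by_name' and `find_symbol' are made up for illustration and are
not GDB's real structures or functions; only the address-class idea
and the values (offset 8, offset 12, register 0, address 0x80493d8)
come from the dump above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified address classes (not GDB's real enum).  */
enum toy_aclass { TOY_LOC_ARG, TOY_LOC_REGISTER, TOY_LOC_STATIC };

/* Hypothetical, simplified symbol.  The meaning of VALUE depends on
   ACLASS: a frame offset, a register number, or a static address.  */
struct toy_symbol {
  const char *name;
  enum toy_aclass aclass;
  long value;
};

/* The four symbols from main's block, in the order GDB printed them.  */
const struct toy_symbol main_block[] = {
  { "argc",  TOY_LOC_ARG,      8 },
  { "argv",  TOY_LOC_ARG,      12 },
  { "argc",  TOY_LOC_REGISTER, 0 },
  { "local", TOY_LOC_STATIC,   0x80493d8 },
};
#define MAIN_BLOCK_NSYMS (sizeof main_block / sizeof main_block[0])

/* Count how many symbols in the block are called NAME.  */
size_t count_by_name(const struct toy_symbol *syms, size_t nsyms,
                     const char *name)
{
  size_t n = 0;
  for (size_t i = 0; i < nsyms; i++)
    if (strcmp(syms[i].name, name) == 0)
      n++;
  return n;
}

/* Return the first symbol matching both NAME and ACLASS, or NULL.
   A lookup by name alone cannot tell the two `argc' entries apart.  */
const struct toy_symbol *find_symbol(const struct toy_symbol *syms,
                                     size_t nsyms, const char *name,
                                     enum toy_aclass aclass)
{
  for (size_t i = 0; i < nsyms; i++)
    if (syms[i].aclass == aclass && strcmp(syms[i].name, name) == 0)
      return &syms[i];
  return NULL;
}
```

In this toy model, counting the symbols named "argc" gives two, and a
caller has to say which address class it wants in order to get a
single answer.  Whether GDB's own lookup resolves the ambiguity this
way is not something the sketch claims.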

As a sanity check, let's look at the IA-32 code for main:

    (top-gdb) c
    Continuing.
    (gdb) disass main
    Dump of assembler code for function main:
    0x8048340 <main>:	push   %ebp
    0x8048341 <main+1>:	mov    %esp,%ebp
    0x8048343 <main+3>:	sub    $0x8,%esp
    0x8048346 <main+6>:	mov    0x8(%ebp),%eax
    0x8048349 <main+9>:	and    $0xfffffff0,%esp
    0x804834c <main+12>:	mov    0x80493d8,%edx
    0x8048352 <main+18>:	movl   $0x80483c8,(%esp,1)
    0x8048359 <main+25>:	imul   %edx,%eax
    0x804835c <main+28>:	mov    %eax,0x4(%esp,1)
    0x8048360 <main+32>:	call   0x8048268 <printf>
    0x8048365 <main+37>:	mov    %ebp,%esp
    0x8048367 <main+39>:	pop    %ebp
    0x8048368 <main+40>:	ret    
    End of assembler dump.
    (gdb) 

So, yes, the compiler did copy `argc' from the stack into %eax.
Check.

But *why* does GDB do this?  I have no idea.  It seems to me that,
with prologue skipping et al, simply having a single LOC_REGPARM would
be the Right Thing.  I don't really know when GDB will prefer the
argument entry, and when it'll prefer the non-argument entry.

I suspect it's historical.  If you look at the stabs spec, you'll see
that the compiler actually emits two stabs for an argument that is
passed in one place but gets moved somewhere else:

  $ objdump --stabs func
  ...
  329    FUN    0      5      08048340 12145  main:F(0,1)
  330    PSYM   0      4      00000008 12157  argc:p(0,1)
  331    PSYM   0      4      0000000c 12169  argv:p(1,1)=*(7,36)
  332    SLINE  0      5      00000000 0      
  333    SLINE  0      7      0000000c 0      
  334    SLINE  0      8      00000025 0      
  335    RSYM   0      4      00000000 12189  argc:r(0,1)
  336    STSYM  0      6      080493d8 12201  local:V(0,1)
  337    LBRAC  0      0      0000000c 0      
  338    RBRAC  0      0      00000029 0      
  339    FUN    0      0      00000029 0      
  ...
  $ 

The PSYM accounts for the argument symbol, and the RSYM accounts for
the internal symbol.  A lot of GDB's data structures very closely
match what's provided in STABS.  (The partial symbol tables are a
good example of this: they correspond exactly to the EXCL links.)
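
As a rough illustration of that correspondence, the sketch below turns
each symbol-bearing stab from the dump into one block symbol.  The
types and functions (`ex_stab', `ex_class_for_stab',
`ex_block_symbol_count', `ex_symbols_named') are invented for this
example, and the mapping is a simplification under my own assumptions,
not GDB's actual stabs reader.  It only shows why one PSYM plus one
RSYM for `argc' ends up as two symbols.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stab kinds, named after the types in the objdump output.  */
enum ex_stab_kind { EX_FUN, EX_PSYM, EX_RSYM, EX_STSYM,
                    EX_SLINE, EX_LBRAC, EX_RBRAC };

/* Hypothetical address classes; EX_LOC_NONE means "makes no block symbol".  */
enum ex_loc { EX_LOC_NONE, EX_LOC_ARG, EX_LOC_REGISTER, EX_LOC_STATIC };

struct ex_stab {
  enum ex_stab_kind kind;
  const char *name;       /* NULL for stabs that carry no name */
  unsigned long value;
};

/* The stabs for main, entries 329 to 338 of the dump above.  */
const struct ex_stab main_stabs[] = {
  { EX_FUN,   "main",  0x08048340 },
  { EX_PSYM,  "argc",  0x8 },
  { EX_PSYM,  "argv",  0xc },
  { EX_SLINE, NULL,    0x0 },
  { EX_SLINE, NULL,    0xc },
  { EX_SLINE, NULL,    0x25 },
  { EX_RSYM,  "argc",  0x0 },
  { EX_STSYM, "local", 0x080493d8 },
  { EX_LBRAC, NULL,    0xc },
  { EX_RBRAC, NULL,    0x29 },
};
#define MAIN_NSTABS (sizeof main_stabs / sizeof main_stabs[0])

/* Simplified mapping: each PSYM, RSYM, or STSYM yields one symbol.
   FUN is left out because it names the block's function rather than
   adding an entry to the block's symbol list.  */
enum ex_loc ex_class_for_stab(enum ex_stab_kind kind)
{
  switch (kind) {
  case EX_PSYM:  return EX_LOC_ARG;
  case EX_RSYM:  return EX_LOC_REGISTER;
  case EX_STSYM: return EX_LOC_STATIC;
  default:       return EX_LOC_NONE;
  }
}

/* Number of block symbols the stabs would produce.  */
size_t ex_block_symbol_count(const struct ex_stab *stabs, size_t n)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (ex_class_for_stab(stabs[i].kind) != EX_LOC_NONE)
      count++;
  return count;
}

/* Number of block symbols that would be called NAME.  */
size_t ex_symbols_named(const struct ex_stab *stabs, size_t n,
                        const char *name)
{
  size_t count = 0;
  for (size_t i = 0; i < n; i++)
    if (ex_class_for_stab(stabs[i].kind) != EX_LOC_NONE
        && stabs[i].name != NULL && strcmp(stabs[i].name, name) == 0)
      count++;
  return count;
}
```

Under these assumptions the ten stabs produce four block symbols,
which matches the `nsyms = 4' seen earlier, and two of them are named
`argc'.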

But anyway, all this could be handled much better nowadays using Dwarf
2 CFI and location lists.  I've been saying that for years, but it
hasn't happened yet.  Andrew has the CFI done now (I think?), and
Daniel B. has submitted a patch for location expressions (but not
location lists, though they would be easy to add), but it's awaiting
revision while he works on law school.

