This is the mail archive of the
gdb@sources.redhat.com
mailing list for the GDB project.
Re: suggestion for dictionary representation
David Carlton <carlton@math.stanford.edu> writes:
> > I'm tempted to whack the block special case for function arguments. It
> > may make name lookup a little more complicated but I think it will make
> > everything clearer. We could, of course, try this on the branch and
> > see if we like the results :)
>
> Would it be reasonable to break up function blocks into two separate
> blocks: a linear block that only defines the parameters for the
> function and a non-linear block that contains the actual local
> variables? Not that I think Jim's scheme is a bad one - I agree that
> it's better than the current scheme - but given the possibility of
> local variables shadowing function parameters, it seems to me to be
> conceptually cleaner to have two separate blocks appear anyways, and
> it also solves this problem.
The issue is a bit more tangled than that, I think. Splitting
the function's body and its formals into two separate blocks is a good
idea, but it isn't going to get rid of all your duplicates. A single
formal parameter can have two symbols in a function's block that
describe it. Try this out on a Pentium. (The `-O2' and `-gstabs+'
are required.)
$ cat func.c
#include <stdio.h>

int
main (int argc, char **argv)
{
  static int local = 3;
  printf ("%d\n", argc * local);
}
$ gcc -O2 -gstabs+ func.c -o func
Then start up GDB under GDB, with the inner GDB debugging `func':
(top-gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: gdb -nw func
GNU gdb 2002-09-16-cvs
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...
(gdb)
Set a breakpoint in main, just to get the symbols read:
(gdb) break main
Breakpoint 1 at 0x804834c: file func.c, line 7.
(gdb)
Drop out to the enclosing GDB:
(gdb) info
(top-gdb)
It just so happens that `func.c' is the first compilation unit of the
first executable file in GDB's list:
(top-gdb) print object_files->symtabs->filename
$177 = 0x82fdbf8 "func.c"
(top-gdb)
If that's not so for you, you'll need to walk `symtabs' to find the
right symtab. Anyway, let's check out this symtab's blockvector. I'm
just using [0] as a postfix dereferencing operator here:
(top-gdb) print object_files->symtabs->blockvector[0]
$178 = {nblocks = 3, block = {0x82f8b74}}
(top-gdb)
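If `func.c' isn't first in the list, the walk mentioned above amounts to the loop below. This is only a sketch: `toy_symtab' and `find_symtab' are made-up names, and I'm assuming the symtabs are chained through a `next' pointer and carry a `filename' field, which is all the loop needs. The real structure has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a symtab: only the two fields the walk
   needs.  Both the type name and the `next' link are assumptions
   made for this illustration.  */
struct toy_symtab
{
  struct toy_symtab *next;   /* next symtab in the same object file */
  const char *filename;      /* source file this symtab describes */
};

/* Return the first symtab in the list starting at HEAD whose filename
   is exactly NAME, or NULL if there is none.  */
struct toy_symtab *
find_symtab (struct toy_symtab *head, const char *name)
{
  struct toy_symtab *s;

  assert (name != NULL);
  for (s = head; s != NULL; s = s->next)
    if (s->filename != NULL && strcmp (s->filename, name) == 0)
      return s;
  return NULL;
}
```

In the debugger session you would do the same thing by hand, following the link from one symtab to the next and printing each filename until `func.c' turns up.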
The first and second blocks are the global and static blocks, so the
third one is probably for `main':
(top-gdb) print object_files->symtabs->blockvector->block[2]
$179 = (struct block *) 0x82f8ab4
(top-gdb) p *$179
$180 = {startaddr = 134513472, endaddr = 134513513, function = 0x82f8988,
superblock = 0x82f8ae4, gcc_compile_flag = 2 '\002', hashtable = 0 '\0',
nsyms = 4, sym = {0x82f89c4}}
(top-gdb) p *$179->function
$181 = {ginfo = {name = 0x82f89bc "main", value = {ivalue = 137333428,
block = 0x82f8ab4,
bytes = 0x82f8ab4 "@\203\004\bi\203\004\b\210\211/\bä\212/\b\002",
address = 137333428, chain = 0x82f8ab4}, language_specific = {
cplus_specific = {demangled_name = 0x0}}, language = language_c,
section = 11, bfd_section = 0x82d4fc0}, type = 0x82faaa8,
namespace = VAR_NAMESPACE, aclass = LOC_BLOCK, line = 5, aux_value = {
basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
(top-gdb)
And it was! Let's look at those four symbols:
(top-gdb) p *$179->sym[0]
$182 = {ginfo = {name = 0x82f89f8 "argc", value = {ivalue = 8, block = 0x8,
bytes = 0x8 <Address 0x8 out of bounds>, address = 8, chain = 0x8},
language_specific = {cplus_specific = {demangled_name = 0x0}},
language = language_c, section = 0, bfd_section = 0x0}, type = 0x82df828,
namespace = VAR_NAMESPACE, aclass = LOC_ARG, line = 4, aux_value = {
basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
(top-gdb) p *$179->sym[1]
$183 = {ginfo = {name = 0x82f8a34 "argv", value = {ivalue = 12, block = 0xc,
bytes = 0xc <Address 0xc out of bounds>, address = 12, chain = 0xc},
language_specific = {cplus_specific = {demangled_name = 0x0}},
language = language_c, section = 0, bfd_section = 0x0}, type = 0x82faaf4,
namespace = VAR_NAMESPACE, aclass = LOC_ARG, line = 4, aux_value = {
basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
(top-gdb) p *$179->sym[2]
$184 = {ginfo = {name = 0x82f8a70 "argc", value = {ivalue = 0, block = 0x0,
bytes = 0x0, address = 0, chain = 0x0}, language_specific = {
cplus_specific = {demangled_name = 0x0}}, language = language_c,
section = 0, bfd_section = 0x0}, type = 0x82df828,
namespace = VAR_NAMESPACE, aclass = LOC_REGISTER, line = 4, aux_value = {
basereg = 0}, aliases = 0x0, ranges = 0x0, hash_next = 0x0}
(top-gdb) p *$179->sym[3]
$185 = {ginfo = {name = 0x82f8aac "local", value = {ivalue = 134517720,
block = 0x80493d8, bytes = 0x80493d8 "É\f", address = 134517720,
chain = 0x80493d8}, language_specific = {cplus_specific = {
demangled_name = 0x0}}, language = language_c, section = 14,
bfd_section = 0x0}, type = 0x82df828, namespace = VAR_NAMESPACE,
aclass = LOC_STATIC, line = 6, aux_value = {basereg = 0}, aliases = 0x0,
ranges = 0x0, hash_next = 0x0}
(top-gdb)
Hey! Why are there two entries for argc? (This is the extra tangle I
was referring to. If you know all about this, you can stop reading
now.)
The two `argc' symbols have different address classes: one has an
address class that indicates it's an argument, and the other doesn't.
The argument symbol describes where the variable is passed on the
stack (eight bytes after %ebp), whereas the non-argument symbol
describes where the variable lives in the block of the function:
register zero, or %eax.
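To make the duplication concrete, here is a toy model of that block in C. The struct and function names are invented for the example and are much simpler than GDB's real symbol and block structures; only the address class names and the values are taken from the dump above. It shows that a lookup by name alone finds two `argc' entries, so a caller has to say whether it wants the argument view or the in-body view.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Address classes seen in the dump above; the enum values themselves
   are arbitrary.  */
enum toy_aclass { LOC_ARG, LOC_REGISTER, LOC_STATIC };

/* Hypothetical, cut-down symbol: just a name, a class, and a value.  */
struct toy_symbol
{
  const char *name;
  enum toy_aclass aclass;
  long value;   /* frame offset, register number, or address */
};

/* The four symbols of main's block, in the order GDB printed them.  */
const struct toy_symbol main_block[] = {
  { "argc",  LOC_ARG,      8 },          /* passed at 8(%ebp) */
  { "argv",  LOC_ARG,      12 },         /* passed at 12(%ebp) */
  { "argc",  LOC_REGISTER, 0 },          /* lives in register 0, %eax */
  { "local", LOC_STATIC,   0x80493d8 },  /* static storage */
};
#define MAIN_BLOCK_NSYMS (sizeof main_block / sizeof main_block[0])

/* Count the symbols named NAME in a linear block.  */
size_t
count_named (const struct toy_symbol *syms, size_t nsyms, const char *name)
{
  size_t i, n = 0;

  assert (name != NULL);
  for (i = 0; i < nsyms; i++)
    if (strcmp (syms[i].name, name) == 0)
      n++;
  return n;
}

/* Return the first symbol named NAME that is an argument (WANT_ARG
   nonzero) or is not an argument (WANT_ARG zero), or NULL.  */
const struct toy_symbol *
lookup (const struct toy_symbol *syms, size_t nsyms, const char *name,
        int want_arg)
{
  size_t i;

  assert (name != NULL);
  for (i = 0; i < nsyms; i++)
    if (strcmp (syms[i].name, name) == 0
        && (syms[i].aclass == LOC_ARG) == (want_arg != 0))
      return &syms[i];
  return NULL;
}
```

With this model, `count_named (main_block, MAIN_BLOCK_NSYMS, "argc")' reports two entries, and `lookup' hands back the stack slot or the register depending on the flag. Whether GDB should choose between them this way is exactly the open question; the sketch only shows that some choice has to be made.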
As a sanity check, let's look at the IA-32 code for main:
(top-gdb) c
Continuing.
(gdb) disass main
Dump of assembler code for function main:
0x8048340 <main>: push %ebp
0x8048341 <main+1>: mov %esp,%ebp
0x8048343 <main+3>: sub $0x8,%esp
0x8048346 <main+6>: mov 0x8(%ebp),%eax
0x8048349 <main+9>: and $0xfffffff0,%esp
0x804834c <main+12>: mov 0x80493d8,%edx
0x8048352 <main+18>: movl $0x80483c8,(%esp,1)
0x8048359 <main+25>: imul %edx,%eax
0x804835c <main+28>: mov %eax,0x4(%esp,1)
0x8048360 <main+32>: call 0x8048268 <printf>
0x8048365 <main+37>: mov %ebp,%esp
0x8048367 <main+39>: pop %ebp
0x8048368 <main+40>: ret
End of assembler dump.
(gdb)
So, yes, the compiler did copy `argc' from the stack into %eax.
Check.
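The block and symbol dumps print addresses in decimal while the disassembly uses hex, so it takes a moment to see that they agree. The helpers below are a small sketch for doing that conversion; the function and macro names are mine, and the constants are copied from the `p *$179' and `p *$179->sym[3]' output above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Values copied from the block and symbol dumps above.  */
#define MAIN_STARTADDR 134513472UL
#define MAIN_ENDADDR   134513513UL
#define LOCAL_ADDRESS  134517720UL

/* Format ADDR the way GDB prints addresses ("0x" plus lowercase hex)
   into BUF, which holds SIZE bytes.  Returns BUF.  */
char *
hex_address (unsigned long addr, char *buf, size_t size)
{
  assert (buf != NULL && size > 0);
  snprintf (buf, size, "0x%lx", addr);
  return buf;
}

/* Offset of ADDR from the start of main, i.e. the N in <main+N>.  */
unsigned long
offset_in_main (unsigned long addr)
{
  assert (addr >= MAIN_STARTADDR);
  return addr - MAIN_STARTADDR;
}
```

Running the numbers: `startaddr' comes out as 0x8048340, the address of the `push %ebp'; `endaddr' is 41 bytes later, one past the single-byte `ret' at <main+40>; and the address of `local' comes out as 0x80493d8, the operand of the `mov' at <main+12>.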
But *why* does GDB do this? I have no idea. It seems to me that,
with prologue skipping et al, simply having a single LOC_REGPARM would
be the Right Thing. I don't really know when GDB will prefer the
argument entry, and when it'll prefer the non-argument entry.
I suspect it's historical. If you look at the stabs spec, you'll see
that the compiler actually emits two stabs for an argument that is
passed in one place but gets moved somewhere else:
$ objdump --stabs func
...
329 FUN 0 5 08048340 12145 main:F(0,1)
330 PSYM 0 4 00000008 12157 argc:p(0,1)
331 PSYM 0 4 0000000c 12169 argv:p(1,1)=*(7,36)
332 SLINE 0 5 00000000 0
333 SLINE 0 7 0000000c 0
334 SLINE 0 8 00000025 0
335 RSYM 0 4 00000000 12189 argc:r(0,1)
336 STSYM 0 6 080493d8 12201 local:V(0,1)
337 LBRAC 0 0 0000000c 0
338 RBRAC 0 0 00000029 0
339 FUN 0 0 00000029 0
...
$
The PSYM accounts for the argument symbol, and the RSYM accounts for
the internal symbol. A lot of GDB's data structures very closely
match what's provided in STABS. (The partial symbol tables are a
good example of this: they correspond exactly to the EXCL links.)
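The correspondence between the stabs above and the address classes in the block dump can be sketched as a table lookup on the symbol descriptor, the letter that follows the colon in each stab string. This is not GDB's stabs reader, just an illustration with invented function names, and it covers only the four descriptors that show up in this dump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Address classes from the dumps above, plus a catch-all that exists
   only in this sketch.  */
enum toy_aclass { LOC_UNKNOWN, LOC_BLOCK, LOC_ARG, LOC_REGISTER, LOC_STATIC };

/* Return the symbol descriptor of a stab string such as "argc:p(0,1)":
   the character right after the first colon, or '\0' if there is no
   colon or nothing follows it.  */
char
stab_descriptor (const char *stab)
{
  const char *colon;

  assert (stab != NULL);
  colon = strchr (stab, ':');
  return colon != NULL ? colon[1] : '\0';
}

/* Map the descriptors seen in this dump to the address classes seen
   in main's block.  Anything else is reported as LOC_UNKNOWN.  */
enum toy_aclass
aclass_for_descriptor (char descriptor)
{
  switch (descriptor)
    {
    case 'F': return LOC_BLOCK;     /* function, e.g. main:F(0,1) */
    case 'p': return LOC_ARG;       /* parameter passed on the stack */
    case 'r': return LOC_REGISTER;  /* variable living in a register */
    case 'V': return LOC_STATIC;    /* function-scope static */
    default:  return LOC_UNKNOWN;
    }
}
```

Feeding it the two `argc' strings gives LOC_ARG for `argc:p(0,1)' and LOC_REGISTER for `argc:r(0,1)', which is the pair of symbols found in the block.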
But anyway, all this could be handled much better nowadays using Dwarf
2 CFI and location lists. I've been saying that for years, but it
hasn't happened yet. Andrew has the CFI done now (I think?), and
Daniel B. has submitted a patch for location expressions (but not
location lists, tho they would be easy to add), but it's awaiting
revision while he works on law school.