This is the mail archive of the gdb@sourceware.org mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Regression handling linkage vs source name

From: Doug Evans <dje at google dot com>
To: Michael Eager <eager at eagercon dot com>
Cc: gdb at sourceware dot org, Jan Kratochvil <jan dot kratochvil at redhat dot com>
Date: Wed, 22 Feb 2012 19:07:34 -0800
Subject: Re: Regression handling linkage vs source name
Authentication-results: mr.google.com; spf=pass (google.com: domain of dje@google.com designates 10.52.65.100 as permitted sender) smtp.mail=dje@google.com; dkim=pass header.i=dje@google.com
References: <4F4439B2.70101@eagercon.com> <CADPb22QP16WgMffDAwdCNBjjZ7+OUmfd5M4qDZeRAaE=YZo6-Q@mail.gmail.com> <4F454188.7000904@eagercon.com>

On Wed, Feb 22, 2012 at 11:27 AM, Michael Eager <eager@eagercon.com> wrote:
> On 02/22/2012 08:40 AM, Doug Evans wrote:
>>
>> On Tue, Feb 21, 2012 at 4:41 PM, Michael Eager<eager@eagercon.com> ?wrote:
>>>
>>> There were changes to dwarf2_physname() around July 1, 2011, which
>>> cause it to return a different value. ?Arguably, the changes make
>>> the function work better, returning the linkage name for a symbol where
>>> it previously returned the source name.
>>
>>
>> Tangential data point: Outside of dwarf2read.c, gdb generally uses
>> "physname" to mean linkage name.
>
>
> It doesn't appear to be the case for the value returned by
> dwarf2_compute_name(). ?I haven't looked at other places.

I'm not sure what your point is here, so I'm going to set it aside for now.
[My comment (tangential data point) still stands though.]

>
>> I'm not sure dwarf2_physname ever returned the linkage name (for c/c++).
>> I know there's been some changes in how it computes the source name.
>
>
> Before the patch, it didn't for C. ?I didn't look at C++.

Ah.  Indeed.
Fun fun fun.

>>> But since the source name
>>> is overwritten by the linkage name in the symbol entry, gdb's
>>> behavior is different, and, IMO, incorrect.
>>
>>
>> Can you elaborate on what you mean by overwritten?
>
>
> In new_symbol_full():
> ? ? ?/* Cache this symbol's name and the name's demangled form (if any). ?*/
> ? ? ?SYMBOL_SET_LANGUAGE (sym, cu->language);
> ? ? ?linkagename = dwarf2_physname (name, die, cu);
> ? ? ?SYMBOL_SET_NAMES (sym, linkagename, strlen (linkagename), 0, objfile);
>
> Clearly the comment is incorrect or misleading: ?only one name is stored in
> the symbol.

Well, yes and no.
Only one name is passed to SYMBOL_SET_NAMES, but it does at least try
to set both the mangled(/linkage) and demangled(/source-with-caveats)
names.

>
> Before the patch, dwarf2_physname() would call dwarf2_compute_name() which
> would return 'name' unmodified.
>
> After the patch, dwarf2_physname() returns the linkage name or a mangled
> name.

Right.  This was done to address potential shortcomings in c++-land.
IIUC, We totally missed gcc's asm(name) feature.  Blech.

> (I initially misread the code to save the source name in the symbol,
> then overwrite it with the linkage name.)
>
>
>> [It's not what I see.]
>>
>>> Here is the test case:
>>>
>>> $ cat t.c
>>> extern int foo(void) asm ("xfoo");
>>>
>>> int foo(void) {}
>>>
>>> int main (void)
>>> {
>>> ?foo ();
>>> }
>>>
>>> $ gdb-before a.out
>>> ...
>>> (gdb) b foo
>>> Breakpoint 1 at ...
>>>
>>> $ gdb-after a.out
>>> ...
>>> (gdb) b foo
>>> Make breakpoint pending on future shared library load? (y or [n])? n
>>> (gdb) b xfoo
>>> Breakpoint 1 at ...
>>>
>>>
>>> The symbol "foo" is no longer present in gdb symbol table
>>> so trying to use the name specified in the source fails.
>>> Before the change, breakpoint, backtrace, and other commands
>>> display the symbol name as in the source (and in the DWARF DIE).
>>> After the change, the linkage name is displayed instead of the
>>> source name.
>>
>>
>> Even pre-dwarf2_physname gdb's have this problem.
>> Maybe you tried a 7.x variant that I didn't (I think I tried 6.8, 7.0,
>> 7.1, and 7.4 - at the time I didn't have easy access to 7.2,7.3 and
>> didn't want to go to trouble of building them).
>
>
> This works correctly in 7.2. ?Maybe this was an "accident". ?:-)

I'm not sure, though I think it was at least partly accidental.

>
>> The problem (or at least one of the problems), as I see it, is that
>> the API for creating symbols is broken.
>
>
> I agree.
>
>
>> gdb does (or at least can) record both the linkage and source names
>> (it does this for "minsyms", e.g. ELF symbols). ?But the way to set a
>> symbol's name is via symbol_set_names and it only takes a linkage name
>> (though the dwarf2read.c further compounds the problem by passing it
>> the source name - or more accurately the "search" name).
>> symbol_set_names then tries to demangle the name and will record that
>> name as well if the demangling succeeds.
>
>
> Well, the minisym only stores a single name, not both linkage and source
> names. ?Despite what the comments preceding symbol_set_names() says,
> there's only one name saved.

Sorry, this is another c-vs-c++ mixup.
Minsyms don't come from the debug info so there's no way to obtain the
source name for C.  However, for c++ the linkage name can be demangled
and that name is indeed stored for minsyms (in addition to the linkage
name). [Setting aside the case where the program uses asm(name).]

> The code in symbol_set_names() seems to be doing way more than it needs
> to. ?If both the source and linkage names are available, then there
> is no need to mangle, demangle, add a prefix or do anything but save
> the name.

I think part of the intent here is to have the demangled name
available even if there is no debug info (e.g. for c++).  [As for how
one goes about achieving that, I wouldn't disagree with the claim that
there is room for improvement.]

>
> (symbol_set_demangled_name() does save a demangled name for C++.)
>
>
>>> Before the change, dwarf2_physname() calls dwarf2_compute_name()
>>> which returns the symbol name if the language is not C++, Java,
>>> or Fortran, even if the DIE has a DW_AT_linkage_name attribute.
>>> After the change, dwarf2_physname() returns the DW_AT_linkage_name.
>>>
>>> Since gdb doesn't keep both the source name and the linkage
>>> name in the symbol table (a design flaw, IMO) what is the
>>> right way to get the previous behavior, where gdb recognizes
>>> the symbol names from the source file?
>>
>>
>> We need to have dwarf2read.c store both names (linkage name and dwarf
>> name). [More changes may be required beyond that, but I think that's a
>> start.] ?Your test program shows that we can't assume we can simply
>> demangle the linkage name to get the source name.
>
>
> Yes, this was my conclusion as well.
>
> When debug info is not available, mangling a source name or demangling
> a linkage name may be necessary. ?I don't know that this should be done
> when reading the DWARF info. ?When the debug info contains both names,
> or the linkage name can be inferred from the DIE, doing anything
> other than saving their values should not be necessary.

In general I agree.  I think part of the concern here is if there is a
problem in what gcc (or whatever) provides, and how to best cope.

> Understanding what the symbol routines are trying to do with all of
> the mangling and demangling is a challenge.

You're preaching to the choir here ... :-)

> One thing which is not
> clear is whether there is agreement whether the name stored for a symbol
> is supposed to be the source name or the linkagename.

This is part of what I was referring to in my tangential data point.
dwarf2read.c stores the demangled name in symbol.ginfo.name which is
defined to contain the mangled name (if it wasn't, why does
symbol.ginfo.language_specific.cplus_specific.demangled_name exist
:-)).

> symbol_set_names()
> and symbol_set_demangled_name() suggest that symtab.c thinks it is the
> linkage name. ?Commands which look up a symbol assume that it is the
> source name.

re: commands: minsym lookup will make two passes first trying the
passed in name as the linkage name, and then trying it as the
source(/search) name, whereas fullsym lookup will just look at one
(SYMBOL_SEARCH_NAME).

> Should there be two symbol entries, one for "foo" and one for "xfoo"?
> A single entry which is searched for a match on either name or linkage
> name? ?Something else?

I think we do need to store both.  Otherwise lookup is going to be too painful.
Beyond that, the devil is in the details.

Note for reference sake: What kind of demangled name to store is a bit
more complex than simply storing the fully demangled name.  gdb stores
the "search" name which is not necessarily equivalent to the full
source name (e.g. for c++ templates that encode the result type in the
linkage name).

References:
- Regression handling linkage vs source name
  - From: Michael Eager
- Re: Regression handling linkage vs source name
  - From: Doug Evans
- Re: Regression handling linkage vs source name
  - From: Michael Eager

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]