Bug 32616

Summary: Support .debug_names generated by clang
Product: gdb
Reporter: Simon Marchi <simon.marchi>
Component: gdb
Assignee: Not yet assigned to anyone <unassigned>
Status: NEW
Severity: normal
CC: simark, tromey
Priority: P2
Version: HEAD
Target Milestone: ---
See Also: https://sourceware.org/bugzilla/show_bug.cgi?id=24820
          https://sourceware.org/bugzilla/show_bug.cgi?id=31010
Bug Blocks: 26909

Description Simon Marchi 2025-01-29 20:24:54 UTC
I'm currently investigating the possibility of supporting .debug_names indices as generated by clang.  Here's a dump of what I know for now.

---

A major problem in the past seemed to be the lack of DW_IDX_parent in the indices generated by clang.  See commit message here:

  https://gitlab.com/gnutools/binutils-gdb/-/commit/b371f07c47c73d9597f74f87bc6e22ba04db1963

However, starting with clang 18, they are included.

Relevant commit: https://github.com/llvm/llvm-project/commit/b6677835fed3a204fa043e079a135c4a225d2c0e
Discussion that led to adding it: https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151

---

By default, linkers just concatenate the .debug_names sections coming from the various compilation units (like they do for other sections).  This results, in the linked file, in a .debug_names section containing multiple individual indices.

GDB rejects this outright:

https://gitlab.com/gnutools/binutils-gdb/-/blob/7a3e81eaa4a6ab84b6b477f88f382b7e65fcfcee/gdb/dwarf2/read-debug-names.c#L455-462

lld, starting with version 19, has the --debug-names option, instructing it to merge the input .debug_names sections into a single index.

However, if needed, could we add support for reading multiple indices from a .debug_names section?  Is there any technical reason preventing that?
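
To illustrate what reading multiple indices would involve at the lowest level, here is a minimal sketch (not GDB code) that walks a concatenated .debug_names section and records where each index starts.  It relies only on the fact that each DWARF 5 name index begins with a unit_length field (4 bytes, or 0xffffffff followed by an 8-byte length for 64-bit DWARF).  The names index_span, read_le and split_debug_names are made up for this example, and it assumes little-endian data with no padding between contributions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One name index found inside a .debug_names section (hypothetical type).
struct index_span
{
  std::size_t offset;  // where the unit_length field starts
  std::size_t size;    // total size, including the unit_length field
};

// Read a little-endian unsigned integer of WIDTH bytes at POS.
static std::uint64_t
read_le (const std::vector<std::uint8_t> &buf, std::size_t pos,
         std::size_t width)
{
  assert (width <= 8);
  if (pos > buf.size () || width > buf.size () - pos)
    throw std::runtime_error ("truncated .debug_names section");

  std::uint64_t value = 0;
  for (std::size_t i = 0; i < width; ++i)
    value |= static_cast<std::uint64_t> (buf[pos + i]) << (8 * i);
  return value;
}

// Split SECTION into its individual indices.  Assumes the indices are
// laid out back to back, which is what plain concatenation produces.
std::vector<index_span>
split_debug_names (const std::vector<std::uint8_t> &section)
{
  std::vector<index_span> result;
  std::size_t pos = 0;

  while (pos < section.size ())
    {
      std::uint64_t length = read_le (section, pos, 4);
      std::size_t header = 4;

      if (length == 0xffffffff)
        {
          // 64-bit DWARF: the real length follows the escape value.
          length = read_le (section, pos + 4, 8);
          header = 12;
        }

      if (length > section.size () - pos - header)
        throw std::runtime_error ("index extends past end of section");

      std::size_t total = header + static_cast<std::size_t> (length);
      result.push_back ({pos, total});
      pos += total;
    }

  return result;
}
```

Each returned span could then be handed to the existing single-index reader, one at a time.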

---

GDB expects the presence of some extra, non-standard index attributes (GDB adds those attributes when it itself writes a .debug_names section).  They are documented in the manual, but I'm copying them here for clarity.

 - DW_IDX_GNU_internal: when set, indicates that the associated entry has static linkage.
 - DW_IDX_GNU_main: when set, indicates that the associated entry is the program’s main.
 - DW_IDX_GNU_language: It is a ‘DW_LANG_’ constant, indicating the language of the associated entry.
 - DW_IDX_GNU_linkage_name: It is a flag that, when set, indicates that the associated entry is a linkage name, and not a source name. 

It seems to me GDB could work around the fact that these attributes are missing by poking at the DIEs, but not too much (for instance by reading a single attribute from the DIE).

Comment 1 Tom Tromey 2025-01-30 03:46:55 UTC
(In reply to Simon Marchi from comment #0)


> However, if needed, could we add support for reading multiple
> indices from a .debug_names section?  Is there any technical reason preventing that?

No reason at all, this shouldn't be too hard to do.
We could even have the reader use shards like the indexer.

In this setup, is it possible for some CUs to be missing an index and
need to be indexed?  Supporting this would be more involved.

>  - DW_IDX_GNU_main: when set, indicates that the associated entry is the
> program’s main.
>  - DW_IDX_GNU_language: It is a ‘DW_LANG_’ constant, indicating the language
> of the associated entry.
>  - DW_IDX_GNU_linkage_name: It is a flag that, when set, indicates that the
> associated entry is a linkage name, and not a source name. 

> It seems to me GDB could workaround the fact that these attributes are
> missing by poking at the DIEs, but not too much (for instance by reading a
> single attribute from the DIE).

It's not really possible to read a single attribute from a DIE.
Reading a single DIE in isolation is kind of possible, though
in gdb it would mean some new API to look up the abbrev,
since right now it reads the entire abbrev table into a hash table.

Finding the language could be done ok this way.  It might be possible
to remove gdb's need to know the entry's language.  I am not
really sure but I do wonder if lookup_name_info should capture
the desired language at creation time and then remove some of
these weird "loop over languages" things gdb does.

The linkage name seems a little troublesome.  If linkage names are
in the index, this would amount to examining every DIE in the index
to see where the name came from.  That sounds expensive.

The "main" name can't really be handled this way unless, again, you
want to revisit every DIE mentioned in the index.


Some of this stuff was discussed over in bug#24820.  There's some
clang discussion there as well.


I think you also have to consider what exactly are the contents of
the index.  For example:

gdb assumes that nested names will be in there, so for example a
nested function will be mentioned.

gdb treats enumerator constant scoping specially, matching the language
semantics (not the DWARF structure).  There's also a special
case for DW_TAG_entry_point.

An inline function in a partial unit should be attributed to each
outermost full unit that imports (directly or indirectly) that
PU.  gdb currently does not do this, there's a bug open about it.

Any definition coming from an imported PU is currently attributed
to some "canonical" includer.  The reason for this is that a PU
can't be read in isolation, the process has to start at a CU --
and .debug_names doesn't record inclusion information.  (The inline
case mentioned above is a special case, since there we need to
read every possible such CU looking for inlining locations.)

Anyway the point here is that if the index isn't complete then some
things will not work as expected.


One thing to note here is that gdb's new indexer was written to basically
be "the same as" the .debug_names process as described in the DWARF
standard.  However, gdb had to fix various bugs in the spec.  I know
I had a list of these at one point, but I can't find it... probably
buried in some bug or email somewhere :(

Comment 2 Tom Tromey 2025-01-30 03:52:57 UTC
Oh, gdb I think omits the parentage info from a linkage-name entry.
That's not in the spec but it didn't really make sense to me for a
linkage name to have a parent.

Comment 3 Tom Tromey 2025-01-30 14:36:12 UTC
It occurred to me afterward that the partial unit stuff probably
isn't relevant here, since pretty much only dwz generates that.

Comment 4 Simon Marchi 2025-01-30 18:18:34 UTC
> In this setup, is it possible for some CUs to be missing an index and
> need to be indexed?  Supporting this would be more involved.

Yes, I suppose, but it could also happen with a merged index, if some input .o files had an index and some didn't.

> It's not really possible to read a single attribute from a DIE.
> Reading a single DIE in isolation is kind of possible, though
> in gdb it would mean some new API to look up the abbrev,
> since right now it reads the entire abbrev table into a hash table.

Yes, sorry, that's what I meant.  You can scan/skip the attributes of the DIE until you find the one you are looking for.
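
As a rough illustration of the scan/skip idea (a sketch, not GDB code), the function below walks the attribute values of one DIE in abbrev order and stops at the wanted attribute.  It assumes the abbrev for the DIE has already been decoded into a list of (attribute, form) pairs, that the data is little-endian 32-bit DWARF, and that only the handful of forms listed appear; a real reader would need every form.  The names abbrev_attr, read_fixed, read_uleb128 and find_constant_attr are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// A small subset of the DWARF constants, with their standard values.
constexpr std::uint64_t DW_AT_name = 0x03;
constexpr std::uint64_t DW_AT_language = 0x13;
constexpr std::uint64_t DW_FORM_data2 = 0x05;
constexpr std::uint64_t DW_FORM_data4 = 0x06;
constexpr std::uint64_t DW_FORM_data1 = 0x0b;
constexpr std::uint64_t DW_FORM_strp = 0x0e;
constexpr std::uint64_t DW_FORM_udata = 0x0f;

// One attribute specification from an already-decoded abbrev.
struct abbrev_attr
{
  std::uint64_t name;
  std::uint64_t form;
};

// Read WIDTH little-endian bytes at POS and advance POS.
static std::optional<std::uint64_t>
read_fixed (const std::vector<std::uint8_t> &buf, std::size_t &pos,
            std::size_t width)
{
  assert (width <= 8);
  if (pos > buf.size () || width > buf.size () - pos)
    return std::nullopt;
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < width; ++i)
    value |= static_cast<std::uint64_t> (buf[pos + i]) << (8 * i);
  pos += width;
  return value;
}

// Decode an unsigned LEB128 number at POS and advance POS.
static std::optional<std::uint64_t>
read_uleb128 (const std::vector<std::uint8_t> &buf, std::size_t &pos)
{
  std::uint64_t result = 0;
  unsigned shift = 0;
  while (pos < buf.size ())
    {
      std::uint8_t byte = buf[pos++];
      if (shift < 64)
        result |= static_cast<std::uint64_t> (byte & 0x7f) << shift;
      if ((byte & 0x80) == 0)
        return result;
      shift += 7;
    }
  return std::nullopt;
}

// DIE_BYTES holds the attribute values that follow the abbrev code.
// Return the raw value of attribute WANTED, skipping everything that
// precedes it.  Returns nullopt if the attribute is absent, the data is
// truncated, or an unsupported form is encountered.
std::optional<std::uint64_t>
find_constant_attr (const std::vector<std::uint8_t> &die_bytes,
                    const std::vector<abbrev_attr> &abbrev,
                    std::uint64_t wanted)
{
  std::size_t pos = 0;
  for (const abbrev_attr &attr : abbrev)
    {
      std::optional<std::uint64_t> value;
      switch (attr.form)
        {
        case DW_FORM_data1:
          value = read_fixed (die_bytes, pos, 1);
          break;
        case DW_FORM_data2:
          value = read_fixed (die_bytes, pos, 2);
          break;
        case DW_FORM_data4:
        case DW_FORM_strp:  // 4-byte offset in 32-bit DWARF
          value = read_fixed (die_bytes, pos, 4);
          break;
        case DW_FORM_udata:
          value = read_uleb128 (die_bytes, pos);
          break;
        default:
          return std::nullopt;  // form not handled by this sketch
        }

      if (!value.has_value ())
        return std::nullopt;
      if (attr.name == wanted)
        return value;
    }
  return std::nullopt;
}
```

For a DW_FORM_strp attribute the returned value is the offset into the string section, not the string itself.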

> Finding the language could be done ok this way.  It might be possible
> to remove gdb's need to know the entry's language.  I am not
> really sure but I do wonder if lookup_name_info should capture
> the desired language at creation time and then remove some of
> these weird "loop over languages" things gdb does.

The language is a CU-level thing, right?  Having to poke the DIE of each CU to get the language sounds not too bad.

> The linkage name seems a little troublesome.  If linkage names are
> in the index, this would amount to examining every DIE in the index
> to see where the name came from.  That sounds expensive.

In my mind, to know if an index entry is a name or linkage name, we could fetch the DW_AT_name and DW_AT_linkage_name of the DIE, and see which of the two matches the name associated with the index entry.
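
A minimal sketch of that comparison, assuming the two attributes have already been fetched from the DIE (the types and names here are made up for illustration, not GDB's):

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class name_kind { source, linkage, unknown };

// The two name attributes of a DIE, either of which may be absent.
struct die_names
{
  std::optional<std::string> name;          // DW_AT_name
  std::optional<std::string> linkage_name;  // DW_AT_linkage_name
};

// Decide which DIE attribute INDEX_NAME came from.  When both
// attributes hold the same string, the entry is treated as a source
// name; that tie-break is a choice made for this sketch.
name_kind
classify_index_name (const std::string &index_name, const die_names &die)
{
  // An index entry is assumed to always carry a non-empty name.
  assert (!index_name.empty ());

  if (die.name.has_value () && *die.name == index_name)
    return name_kind::source;
  if (die.linkage_name.has_value () && *die.linkage_name == index_name)
    return name_kind::linkage;
  return name_kind::unknown;
}
```

The cost concern raised in comment #1 still applies: this check needs the DIE's attributes for every index entry it is run on.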

> 
> The "main" name can't really be handled this way unless, again, you
> want to revisit every DIE mentioned in the index.

I suppose we would only need to visit the subprogram DIEs.

I guess we can do that, to make it work.  But then, if it adds significant processing time, we can use that experience to push for clang to implement it, and then for standardization.


> Some of this stuff was discussed over in bug#24820.  There's some
> clang discussion there as well.

Thanks, I'll read it.

> I think you also have to consider what exactly are the contents of
> the index.  For example:
> 
> gdb assumes that nested names will be in there, so for example a
> nested function will be mentioned.
> 
> gdb treats enumerator constant scoping specially, matching the language
> semantics (not the DWARF structure).  There's also a special
> case for DW_TAG_entry_point.

If we find some differences like that, we can either accept that the behavior of GDB will differ with vs without an index, or implement workarounds like those mentioned above.

I looked up DW_TAG_entry_point in DWARF5.pdf, and I must say I don't really understand it: it doesn't describe what an "alternate entry point" is, other than saying it exists.

> An inline function in a partial unit should be attributed to each
> outermost full unit that imports (directly or indirectly) that
> PU.  gdb currently does not do this, there's a bug open about it.
> 
> Any definition coming from an imported PU is currently attributed
> to some "canonical" includer.  The reason for this is that a PU
> can't be read in isolation, the process has to start at a CU --
> and .debug_names doesn't record inclusion information.  (The inline
> case mentioned above is a special case, since there we need to
> read every possible such CU looking for inlining locations.)
> 
> Anyway the point here is that if the index isn't complete then some
> things will not work as expected.

I can't comment on the PU thing because I don't know that area enough.  I'll focus on the standard CU case first.

> One thing to note here is that gdb's new indexer was written to basically
> be "the same as" the .debug_names process as described in the DWARF
> standard.  However, gdb had to fix various bugs in the spec.  I know
> I had a list of these at one point, but I can't find it... probably
> buried in some bug or email somewhere :(

If you find them again, I could work on getting some fixes upstream to the DWARF committee.

Comment 5 Tom Tromey 2025-03-08 16:08:34 UTC
(In reply to Tom Tromey from comment #1)

> Finding the language could be done ok this way.  It might be possible
> to remove gdb's need to know the entry's language.  I am not
> really sure but I do wonder if lookup_name_info should capture
> the desired language at creation time and then remove some of
> these weird "loop over languages" things gdb does.

For a different series I'm working on, I needed something like this.
It turns out there's already a function to find the CU's language on
demand.  Hacking this into expand_symtabs_matching is as easy as:

	  enum language entry_lang = entry->lang;
	  if (entry_lang == language_unknown)
	    {
	      entry->per_cu->ensure_lang (per_objfile);
	      entry_lang = entry->per_cu->lang ();
	    }

So, if DW_IDX_GNU_language is missing, just using language_unknown should
be ok.
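
The same fallback pattern, reduced to a standalone mock for illustration.  None of these types are GDB's; mock_cu, mock_entry and effective_lang are hypothetical, and the caching inside the mock's ensure_lang is an assumption of the sketch, not a statement about how GDB's function behaves.

```cpp
#include <cassert>

enum class lang { unknown, c, cplus };

// Stand-in for a per-CU object that can find its language on demand.
struct mock_cu
{
  lang die_lang = lang::unknown;  // what the CU DIE would say
  lang cached = lang::unknown;    // language once looked up
  int lookups = 0;                // how often the DIE was consulted

  void ensure_lang ()
  {
    if (cached == lang::unknown)
      {
        ++lookups;
        cached = die_lang;
      }
  }
};

// Stand-in for an index entry; entry_lang is unknown when the index
// has no language attribute.
struct mock_entry
{
  lang entry_lang;
  mock_cu *cu;
};

// Use the entry's own language if present, else ask the CU.
lang
effective_lang (const mock_entry &entry)
{
  if (entry.entry_lang != lang::unknown)
    return entry.entry_lang;

  assert (entry.cu != nullptr);
  entry.cu->ensure_lang ();
  return entry.cu->cached;
}
```

In this mock, two entries from the same CU trigger only one lookup, which is why the cost lands at search time rather than scan time.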

No claims about performance of this but OTOH it's at search time, not
scan time.

We do need to know the language to avoid excessive CU expansion.  This
was bug#31010.