| Summary: | Support .debug_names generated by clang | ||
|---|---|---|---|
| Product: | gdb | Reporter: | Simon Marchi <simon.marchi> |
| Component: | gdb | Assignee: | Not yet assigned to anyone <unassigned> |
| Status: | NEW --- | ||
| Severity: | normal | CC: | simark, tromey |
| Priority: | P2 | ||
| Version: | HEAD | ||
| Target Milestone: | --- | ||
| See Also: | https://sourceware.org/bugzilla/show_bug.cgi?id=24820 https://sourceware.org/bugzilla/show_bug.cgi?id=31010 | ||
| Host: | Target: | ||
| Build: | Last reconfirmed: | ||
| Project(s) to access: | ssh public key: | ||
| Bug Depends on: | |||
| Bug Blocks: | 26909 | ||
|
Description
Simon Marchi
2025-01-29 20:24:54 UTC
(In reply to Simon Marchi from comment #0)
> However, if needed, could we add support for reading multiple
> indices from a .debug_names section? Any technical reason preventing that?

No reason at all, this shouldn't be too hard to do. We could even have the reader use shards like the indexer.

In this setup, is it possible for some CUs to be missing an index and need to be indexed? Supporting this would be more involved.

> - DW_IDX_GNU_main: when set, indicates that the associated entry is the
> program’s main.
> - DW_IDX_GNU_language: It is a ‘DW_LANG_’ constant, indicating the language
> of the associated entry.
> - DW_IDX_GNU_linkage_name: It is a flag that, when set, indicates that the
> associated entry is a linkage name, and not a source name.
> It seems to me GDB could work around the fact that these attributes are
> missing by poking at the DIEs, but not too much (for instance by reading a
> single attribute from the DIE).

It's not really possible to read a single attribute from a DIE. Reading a single DIE in isolation is kind of possible, though in gdb it would mean some new API for looking up the abbrev, since right now it reads the entire abbrev table into a hash table.

Finding the language could be done ok this way. It might be possible to remove gdb's need to know the entry's language. I am not really sure but I do wonder if lookup_name_info should capture the desired language at creation time and then remove some of these weird "loop over languages" things gdb does.

The linkage name seems a little troublesome. If linkage names are in the index, this would amount to examining every DIE in the index to see where the name came from. That sounds expensive.

The "main" name can't really be handled this way unless, again, you want to revisit every DIE mentioned in the index.

Some of this stuff was discussed over in bug#24820. There's some clang discussion there as well.

I think you also have to consider what exactly are the contents of the index.
For example:

gdb assumes that nested names will be in there, so for example a nested function will be mentioned.

gdb treats enumerator constant scoping specially, matching the language semantics (not the DWARF structure). There's also a special case for DW_TAG_entry_point.

An inline function in a partial unit should be attributed to each outermost full unit that imports (directly or indirectly) that PU. gdb currently does not do this, there's a bug open about it.

Any definition coming from an imported PU is currently attributed to some "canonical" includer. The reason for this is that a PU can't be read in isolation, the process has to start at a CU -- and .debug_names doesn't record inclusion information. (The inline case mentioned above is a special case, since there we need to read every possible such CU looking for inlining locations.)

Anyway the point here is that if the index isn't complete then some things will not work as expected.

One thing to note here is that gdb's new indexer was written to basically be "the same as" the .debug_names process as described in the DWARF standard. However, gdb had to fix various bugs in the spec. I know I had a list of these at one point, but I can't find it... probably buried in some bug or email somewhere :(

Oh, gdb I think omits the parentage info from a linkage-name entry. That's not in the spec but it didn't really make sense to me for a linkage name to have a parent.

It occurred to me afterward that the partial unit stuff probably isn't relevant here, since pretty much only dwz generates that.

> In this setup, is it possible for some CUs to be missing an index and
> need to be indexed? Supporting this would be more involved.

Yes, I suppose, but it could happen also with a merged index, if some input .o files had an index and some didn't.

> It's not really possible to read a single attribute from a DIE.
> Reading a single DIE in isolation is kind of possible, though
> in gdb it would mean some new API for looking up the abbrev,
> since right now it reads the entire abbrev table into a hash table.

Yes, sorry, that's what I meant. You can scan/skip the attributes of the DIE until you find the one you are looking for.

> Finding the language could be done ok this way. It might be possible
> to remove gdb's need to know the entry's language. I am not
> really sure but I do wonder if lookup_name_info should capture
> the desired language at creation time and then remove some of
> these weird "loop over languages" things gdb does.

The language is a CU-level thing, right? Having to poke the DIE of each CU to get the language sounds not too bad.

> The linkage name seems a little troublesome. If linkage names are
> in the index, this would amount to examining every DIE in the index
> to see where the name came from. That sounds expensive.

In my mind, to know if an index entry is a name or linkage name, we could fetch the DW_AT_name and DW_AT_linkage_name of the DIE, and see which of the two matches the name associated with the index entry.

> The "main" name can't really be handled this way unless, again, you
> want to revisit every DIE mentioned in the index.

I suppose we would only need to visit the subprogram DIEs. I guess we can do that, to make it work. But then, if it adds significant processing time, we can use that experience to push for clang to implement it, and then for standardization.

> Some of this stuff was discussed over in bug#24820. There's some
> clang discussion there as well.

Thanks, I'll read it.

> I think you also have to consider what exactly are the contents of
> the index. For example:
>
> gdb assumes that nested names will be in there, so for example a
> nested function will be mentioned.
>
> gdb treats enumerator constant scoping specially, matching the language
> semantics (not the DWARF structure).
> There's also a special case for DW_TAG_entry_point.

If we find some differences like that, we can either accept that the behavior of GDB will differ with vs without an index, or implement workarounds like those mentioned above.

I looked up DW_TAG_entry_point in DWARF5.pdf, and I must say I don't really understand it: the spec doesn't describe what an "alternate entry point" is other than saying it exists.

> An inline function in a partial unit should be attributed to each
> outermost full unit that imports (directly or indirectly) that
> PU. gdb currently does not do this, there's a bug open about it.
>
> Any definition coming from an imported PU is currently attributed
> to some "canonical" includer. The reason for this is that a PU
> can't be read in isolation, the process has to start at a CU --
> and .debug_names doesn't record inclusion information. (The inline
> case mentioned above is a special case, since there we need to
> read every possible such CU looking for inlining locations.)
>
> Anyway the point here is that if the index isn't complete then some
> things will not work as expected.

I can't comment on the PU thing because I don't know that area well enough. I'll focus on the standard CU case first.

> One thing to note here is that gdb's new indexer was written to basically
> be "the same as" the .debug_names process as described in the DWARF
> standard. However, gdb had to fix various bugs in the spec. I know
> I had a list of these at one point, but I can't find it... probably
> buried in some bug or email somewhere :(

If you find them again, I could work on getting some fixes upstream to the DWARF committee.

(In reply to Tom Tromey from comment #1)
> Finding the language could be done ok this way. It might be possible
> to remove gdb's need to know the entry's language. I am not
> really sure but I do wonder if lookup_name_info should capture
> the desired language at creation time and then remove some of
> these weird "loop over languages" things gdb does.
For a different series I'm working on, I needed something like this. It turns out there's already a function to find the CU's language on demand. Hacking this into expand_symtabs_matching is as easy as:

    enum language entry_lang = entry->lang;
    if (entry_lang == language_unknown)
      {
        entry->per_cu->ensure_lang (per_objfile);
        entry_lang = entry->per_cu->lang ();
      }

So, if DW_IDX_GNU_language is missing, just using language_unknown should be ok. No claims about performance of this but OTOH it's at search time, not scan time.

We do need to know the language to avoid excessive CU expansion. This was bug#31010. |
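For anyone wanting to play with the fallback idea outside of gdb, here is a minimal standalone sketch. Everything in it is a hypothetical stand-in: `per_cu_sketch`, `index_entry_sketch`, `effective_language` and the `die_reads` counter are made-up names for illustration, not gdb's real types or API, and the "DIE read" is only simulated. It just shows the shape of the logic: an entry without DW_IDX_GNU_language carries language_unknown, and the CU is consulted lazily, at search time, at most once.

```cpp
#include <cassert>

// Hypothetical stand-ins; only the shape of the logic matters here.
enum language { language_unknown, language_c, language_cplus };

struct per_cu_sketch
{
  language lang_in_die;                 // what reading the CU DIE would yield
  language cached = language_unknown;   // filled in on first demand
  bool lang_read = false;               // true once the "DIE" was consulted
  int die_reads = 0;                    // counts simulated DIE reads

  explicit per_cu_sketch (language l) : lang_in_die (l) {}

  // Simulates fetching the CU's language from its DIE, at most once.
  void ensure_lang ()
  {
    if (!lang_read)
      {
        ++die_reads;
        cached = lang_in_die;
        lang_read = true;
      }
  }

  // Only valid after ensure_lang has been called.
  language lang () const
  {
    assert (lang_read);
    return cached;
  }
};

struct index_entry_sketch
{
  language lang;           // language_unknown if the index did not record it
  per_cu_sketch *per_cu;   // CU that owns the entry
};

// Return the language to use for ENTRY, asking the CU only when the
// index entry itself does not know.
language
effective_language (const index_entry_sketch &entry)
{
  language entry_lang = entry.lang;
  if (entry_lang == language_unknown)
    {
      entry.per_cu->ensure_lang ();
      entry_lang = entry.per_cu->lang ();
    }
  return entry_lang;
}
```

With this arrangement, entries that already carry a language never touch the CU, and repeated lookups against the same CU pay for the simulated DIE read only once. Whether the real cost is acceptable in gdb is a separate question that this sketch does not measure.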