== DWARF with inter-object references ==

This is a proposal to extend the ELF formats used for DWARF so that DW_FORM_ref* can be resolved to a DIE in a CU in another object file.

==== DWARF relocs & .debug_symtab ====

In ET_REL .debug files, the existing .rel.debug_info relocs are split. The relocs referring to symbols in allocated sections stay as they were. Any relocs resolved to .debug_* sections are moved to new .debug_rel.* sections. In other .debug files, compression can generate new relocs in .debug_rel.* sections. These are nonallocated sections that apply to the nonallocated sections in the .debug file.

.debug_rel* sections are SHT_REL(A) and follow the usual rules: sh_info points to the .debug_* section needing relocation, and sh_link points to the symbol table. However, all .debug_rel* sections point to a new .debug_symtab instead of the existing symtab.

.debug_symtab is SHT_SYMTAB but contains only the symbols needed for .debug_rel.debug_* relocs, disjoint from real program symbols. These can be SHN_UNDEF in one object if the same-named symbol is defined in another object. (???) Maybe don't support SHN_UNDEF at all, and only support the archive convention hack (below)?

DW_FORM_ref_addr, and similar uses in DWARF that encode a relocatable offset into a .debug_* section, are subject to inter-object refs. That means .debug_rel.debug_* has a reloc at the offset in the DWARF section where this relocatable offset appears. The decoder has to check for such relocs when consuming a file with .debug_rel.debug_* reloc sections. When resolving a DW_FORM_ref* value, or an equivalent header field, the reloc's symbol takes you to another object that defines the symbol, and to the DIE (or other entity) at that symbol's value in that other object.

==== DWARF archive convention ====

Normally inter-object refs would only be supported when all the objects referring to each other are put together into an archive (debug.a).
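As a rough illustration of how a reader might follow such an inter-object ref among the objects packed together, here is a minimal Python sketch. The `DebugSymbol` and `DebugObject` types, the symbol name `die_42`, and all offsets are hypothetical stand-ins for parsed .debug_symtab and .debug_rel.* contents; nothing here is an existing library API. It ignores reloc addends and uses only the symbol's value, as described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DebugSymbol:
    name: str
    section: Optional[str]  # None models an SHN_UNDEF symbol
    value: int              # offset within `section` when defined


@dataclass
class DebugObject:
    name: str
    debug_symtab: List[DebugSymbol] = field(default_factory=list)
    # (section name, offset in that section) -> index into debug_symtab
    debug_relocs: Dict[Tuple[str, int], int] = field(default_factory=dict)


def resolve_ref(objects, obj, section, offset, raw_value):
    """Return (object name, section, offset) for the ref stored at `offset`."""
    sym_index = obj.debug_relocs.get((section, offset))
    if sym_index is None:
        # No reloc at this position: an ordinary intra-object reference.
        return (obj.name, ".debug_info", raw_value)
    sym = obj.debug_symtab[sym_index]
    if sym.section is not None:
        # Symbol is defined in this same object.
        return (obj.name, sym.section, sym.value)
    # Undefined here: find the same-named definition in another object.
    for other in objects:
        if other is obj:
            continue
        for cand in other.debug_symtab:
            if cand.name == sym.name and cand.section is not None:
                return (other.name, cand.section, cand.value)
    raise LookupError(f"no definition found for {sym.name!r}")


# Hypothetical example: foobar.debug refers to a DIE defined in libfoo.
libfoo = DebugObject(
    "usr/lib/libfoo.so.1.debug",
    debug_symtab=[DebugSymbol("die_42", ".debug_info", 0x1C0)],
)
foobar = DebugObject(
    "usr/bin/foobar.debug",
    debug_symtab=[DebugSymbol("die_42", None, 0)],
    debug_relocs={(".debug_info", 0x58): 0},
)
objects = [foobar, libfoo]
target = resolve_ref(objects, foobar, ".debug_info", 0x58, 0)
```

The name-matching loop is exactly the SHN_UNDEF resolution logic that the proposal considers dropping in favor of symbols resolved at archive packing time.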
The archive members with normal names are the ELF .debug files (as from eu-strip -f), by convention named with what the full path name below the location of debug.a would be (usr/bin/foobar.debug, usr/lib/libfoo.so.1.debug, etc) if the old-style .debug files were separately installed. If there are ELF files with .debug_symtab sections, the archive symtab would refer to those (?).

====== consolidated sections ======

Any section that would normally be SHT_STRTAB can be SHT_NOBITS in an ELF file inside the archive. That means there is a file in the archive with the name of the section (.strtab) that can be used instead. In this way, all the string tables in all the ELF files can be consolidated and uniquified in the common .strtab file. The .strtab (or whatever name) archive member has exactly the contents that the ELF section would have. Maybe even permit this for .shstrtab. Perhaps instead always have a fixed-named file for all strtabs, being SHT_STRTAB (.strtab, .shstrtab) and .debug_str, merging all strings.

The same can be done with .debug_{abbrev,str,loc,ranges}. There could be one archive member for each. Or maybe there could be a single fixed name for the special archive member that is the total merge of all the nonallocated SHT_NOBITS sections. All offsets (loclistptr, abbrev offset, rangesptr?, etc) would then be relative to this member's contents instead of the normal section.

====== consolidated .debug_symtab ======

A similar treatment could be done for all the ELF files' .debug_symtab sections, used for the inter-object ref relocs. That is, consolidate them into a single big symtab that all .debug_rel.* relocs refer to, kept in a special archive member. This member has contents like an SHT_SYMTAB section's contents, but st_value is an absolute position in the whole archive (lying in the middle of some ELF file, or of some special archive member).

If all inter-object relocs point to a .debug_symtab that is SHT_NOBITS and shared in the consolidated .debug_symtab, then:

 1. No SHN_UNDEF symbols, all resolved at debug.a packing time.
 2. No need for symbol names, all st_name can be 0.

This makes it attractive to mandate this, and so not implement any SHN_UNDEF symbol resolution logic at all in the reader support for the debug.a format.

====== consolidated partial_unit .debug_info ======

If using a consolidated .debug_symtab, inter-object DW_FORM_ref_addr relocs can resolve to inside a consolidated .debug_info archive member. This can contain all the partial_unit DIEs that the ELF files' imported_unit DIEs point to. That keeps those DIEs out of any real file's top-level tree, so they don't have to be iterated over.

====== archive quick-lookup ======

Could have some more special archive members containing lookup tables for quick access.

 1. Look up by build ID. Tie in to DebugInfo finding conventions.
 1. Consolidated .debug_pubnames? .debug_pubtypes? Seems handy for !DSOs in a many-libraries package's archive.
 1. Source file names -> CU
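Returning to the consolidated .debug_symtab, where st_value is an absolute position in the whole archive: a reader needs to map that position back to an archive member and an offset inside it. The following is a minimal Python sketch of that step, assuming the reader has already built a table of member data offsets and sizes, sorted by offset. The `Member` type, the `locate` helper, and the layout numbers are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple


class Member(NamedTuple):
    name: str
    data_offset: int  # where this member's contents start within debug.a
    size: int         # length of the member's contents in bytes


def locate(members: List[Member], position: int) -> Tuple[str, int]:
    """Map an absolute archive position to (member name, offset in member).

    `members` must be sorted by data_offset.
    """
    starts = [m.data_offset for m in members]
    # Index of the last member whose contents start at or before `position`.
    i = bisect_right(starts, position) - 1
    if i < 0 or position >= members[i].data_offset + members[i].size:
        # Position falls in a member header or padding, not in any contents.
        raise LookupError(f"position {position} is not inside any member")
    m = members[i]
    return (m.name, position - m.data_offset)


# Hypothetical layout: made-up offsets and sizes for two .debug members.
members = [
    Member("usr/bin/foobar.debug", 68, 4096),
    Member("usr/lib/libfoo.so.1.debug", 4224, 8192),
]
hit = locate(members, 4224 + 0x1C0)
```

With such a table, resolving a reloc against the consolidated symtab needs no symbol names at all, which matches the point above that every st_name can be 0.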