== libdw DWARF writing support ==

The current thinking is to have a DWARF writer entirely in C++ that is wholly independent of the libdw (C) reader code. That is, DIEs being constructed for the writer can't be used with Dwarf_Die et al. Only the C++ interfaces for handling DIE trees and attributes will be compatible (via templates? or just source-compatible?) between the C++ front end to the existing libdw reader code and the C++ writer code.

To make the libdw reader code work with writer-constructed DIEs, either the reader would need special internal hooks to distinguish Dwarf_Die's taken from writer DIEs, or the writer would need to keep the DWARF file format mocked up in memory so you can use the reader on it. That means constructing abbrev tables for DIEs whose attributes are being composed on the fly, etc.

Conversely, by keeping it to the C++ interface level only, we can have writer data structures that use straightforward STL types for attributes, child lists, etc., while collecting the data. The writer need only lay out abbrevs for the file format when it actually goes to produce the output file format.

=== abbrev generation ===

The main task of the output generation is creating the optimal .debug_abbrev table to keep the .debug_info as small as possible.

 1. Collect all the DIE shapes used: haschildren + set of attr (name, form*) pairs (ignore DW_AT_sibling). Note form* should be "form flavor", independent of exact encoding: block, udata, sdata, string, addr, flag, ref. Use shape as a key, and note all users of each shape. (Note attrs are unordered; canonicalize order for key comparison.)
 1. On each shape:
   * For each block/*data/ref used, calculate the smallest flavor that covers all users:
     * big enough for the value
     * a relocatable form if the value must be
   * Assign the next abbrev code, and store it in the DIE objects of the users.
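
The two abbrev steps above can be sketched in C++17 roughly as follows. This is only an illustration under stated assumptions: `Die`, `Attr`, `FormFlavor`, `Shape`, `AbbrevInfo` and `assign_abbrevs` are hypothetical names, not part of libdw or of any planned interface. Only the udata flavor is sized here (as a fixed width of 1, 2, 4 or 8 bytes), and the relocatable-form case is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical writer-side types, for illustration only.
enum class FormFlavor { block, udata, sdata, string, addr, flag, ref };

struct Attr {
    unsigned name;        // DW_AT_* code
    FormFlavor flavor;    // form flavor, independent of exact encoding
    std::uint64_t value;  // only inspected for udata in this sketch
};

struct Die {
    bool has_children = false;
    std::vector<Attr> attrs;
    unsigned abbrev_code = 0;  // 0 means "not assigned yet"
};

constexpr unsigned kAtSibling = 0x01;  // DW_AT_sibling

// Shape key: haschildren plus the set of (name, flavor) pairs.
// std::set keeps the pairs sorted, which canonicalizes attribute order.
using Shape = std::pair<bool, std::set<std::pair<unsigned, FormFlavor>>>;

Shape shape_of(const Die& die) {
    Shape shape{die.has_children, {}};
    for (const Attr& a : die.attrs)
        if (a.name != kAtSibling)  // sibling is ignored for shape purposes
            shape.second.insert({a.name, a.flavor});
    return shape;
}

// Smallest fixed width in bytes that can hold v.
unsigned bytes_needed(std::uint64_t v) {
    if (v <= 0xff) return 1;
    if (v <= 0xffff) return 2;
    if (v <= 0xffffffffu) return 4;
    return 8;
}

struct AbbrevInfo {
    unsigned code;
    std::map<unsigned, unsigned> udata_width;  // attr name -> bytes covering all users
};

std::map<Shape, AbbrevInfo> assign_abbrevs(std::vector<Die>& dies) {
    // Step 1: collect the shapes and note all users of each shape.
    std::map<Shape, std::vector<Die*>> users;
    for (Die& d : dies) users[shape_of(d)].push_back(&d);

    // Step 2: per shape, size each udata attr for its largest user,
    // then hand out the next abbrev code and store it in the users.
    std::map<Shape, AbbrevInfo> table;
    unsigned next_code = 1;
    for (auto& [shape, list] : users) {
        AbbrevInfo info{next_code++, {}};
        for (Die* d : list) {
            for (const Attr& a : d->attrs) {
                if (a.name == kAtSibling || a.flavor != FormFlavor::udata) continue;
                unsigned& width = info.udata_width[a.name];
                width = std::max(width, bytes_needed(a.value));
            }
            assert(d->abbrev_code == 0);  // each DIE belongs to exactly one shape
            d->abbrev_code = info.code;
        }
        table.emplace(shape, info);
    }
    return table;
}
```

Using a `std::set` of pairs as the key means two DIEs that list the same attributes in a different order, or that differ only by DW_AT_sibling, land on the same abbrev. A real writer would also have to choose between fixed-width and LEB128 encodings, which this sketch does not attempt.
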
=== relocation writing ===

DwarfRelocs will discuss smart reloc handling on the reader side, to replace the libdwfl relocate-en-masse code for .ko.debug files. On the writer side we need the same level of sophistication. At the C++ layer both sides should have compatible ways of describing relocatable forms such as target address. The writer can also generate internal relocs for ref_addr forms and fixed-size offset/address fields in headers and .debug_* sections.

=== format feature compatibility ===

The writer will have knobs for which DWARF features can be used in the output. We'll use a common argp child for parsing a list of feature names into a flag-set, along with canonical aliases for "set that gdb version N groks", etc. An option to DwarfLint can complain about using features outside a given set. The writer can be set to flag them, and/or have ways to transform them, e.g. ref_addr, DwarfInterObject, imported_unit, etc.

=== compressor/exploder ===

The compressor or exploder logic will consist largely of transformations to exploit or get rid of constructs in the feature flag set. Compressing will also do pure consolidation of duplicates within a CU, which exploding won't reverse.

==== inter-CU refs ====

When the imported_unit/partial_unit format feature is enabled, duplicate reduction can look across CUs for matching subtrees.

 1. When two CUs contain identical subtrees, generate a potential partial_unit containing the subtree and replace each original copy with an imported_unit referring to it.
 2. Later, coalesce each partial_unit with all other partial_units pointed to by the same set of CUs.
 3. Finally, if any CU has been reduced to nothing but one imported_unit, coalesce the referenced partial_unit back into being that CU itself, with other CUs' imported_units referring to it.

==== multi-object compression ====

DwarfInterObject will discuss compressing multiple separate .debug files together so the resultant files can refer to each other's CUs.
The method would be the same as for inter-CU refs, stretched across CUs in many objects.
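
As a rough illustration of steps 1 and 2 of the inter-CU method, the C++17 sketch below groups duplicated subtrees by the set of CUs that contain them, so that each distinct set of importers ends up with a single coalesced partial_unit. Everything in it is an assumption made for the example: `CuId`, `SubtreeKey`, `PartialUnit` and `plan_partial_units` are hypothetical names, and a subtree is stood in for by an opaque canonical key (for instance a hash or serialization) rather than a real DIE tree. Step 3 is not covered.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, for illustration only.
using CuId = unsigned;
using SubtreeKey = std::string;  // canonical key identifying a DIE subtree

struct PartialUnit {
    std::set<CuId> importers;          // CUs that would get an imported_unit
    std::vector<SubtreeKey> subtrees;  // subtrees moved into this partial_unit
};

std::vector<PartialUnit> plan_partial_units(
    const std::map<CuId, std::set<SubtreeKey>>& cus) {
    // Step 1: find which CUs contain each subtree.
    std::map<SubtreeKey, std::set<CuId>> owners;
    for (const auto& [cu, subtrees] : cus)
        for (const SubtreeKey& s : subtrees) owners[s].insert(cu);

    // Step 2: subtrees shared by the same set of CUs share one partial_unit.
    std::map<std::set<CuId>, PartialUnit> merged;
    for (const auto& [subtree, who] : owners) {
        if (who.size() < 2) continue;  // not duplicated across CUs
        PartialUnit& pu = merged[who];
        pu.importers = who;
        pu.subtrees.push_back(subtree);
    }

    std::vector<PartialUnit> plan;
    for (auto& [importers, pu] : merged) {
        assert(!pu.subtrees.empty());  // every planned unit holds a subtree
        plan.push_back(std::move(pu));
    }
    return plan;
}
```

For the multi-object case the same grouping would presumably apply with `CuId` widened to identify a CU within a particular object file, though how such references are encoded is left to DwarfInterObject.
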