== DWARF attribute values ==

The DWARF specification (2.2) describes attribute values and their encodings in terms of '''forms''' and '''classes'''. We refine this taxonomy with the notion of '''value spaces'''.

==== form ====

The `DW_FORM_*` constants are the form encodings that appear in the actual DWARF data. There are only a few forms, and some have several particular encodings. For example, the `data` form comes in `data[1248]`, `udata`, and `sdata` encodings, the `string` form comes in `string` and `strp` encodings, etc. The libdw reading interfaces `dwarf_form*` gloss over these encoding differences, so users need not care about them.

The `block` and `data` forms are ambiguous. They indicate the encoding, so you can read the value from the file, but not what kind of value it really is. To disambiguate, you have to know the '''class''' you are looking for.

==== class ====

DWARF (7.5.4) describes ''classes'' of attribute value. These do not exist in the format itself, only in the consumer's perspective. Some classes unify several forms, and some forms multiplex several classes. The consumer determines what kind of value an attribute has from the value's form as written and the set of classes expected for that particular attribute (based on the known `DW_AT_*` values).

==== value space ====

The '''class''' delineation is mostly adequate to disambiguate attribute forms, but not entirely; conversely, in some cases it is more specific than the consumer's abstract view needs. We therefore refine DWARF's '''class''' into a slightly different concept that we call a '''value space'''. The value spaces are the categories of attribute value that a consumer really wants to think about.

 reference:: class reference, a pointer to another DIE. Pointers outside the containing CU can be relocatable (`DW_FORM_ref_addr`).
 CU-reference:: class reference, but the CU-relative offset forms are invalid, because this must point to a different CU's top-level DIE.
 address:: class address, `DW_FORM_addr`. Relocatable.
 flag:: class flag, `DW_FORM_flag`: a simple Boolean.
 rangelistptr:: class rangelistptr, `data` forms: an offset in `.debug_ranges`.
 lineptr:: class lineptr, `data` forms: an offset in `.debug_line`.
 macptr:: class macptr, `data` forms: an offset in `.debug_macinfo`.
 location:: class loclistptr or block.
 identifier:: class string, an identifier in the CU's language.
 filename:: class string, the name of a source file or directory.
 fileidx:: class constant, an index into the CU's file table (e.g. `DW_AT_decl_file`).
 lineidx:: class constant, a line number (e.g. `DW_AT_decl_line`).
 string:: class string, a string that is neither an identifier nor a filename.
 enum-constant:: class constant, with a known set of values `DW_FOO_*`.[[br]] This is actually numerous value spaces, all treated similarly.
 constant:: class constant, block, or string: a target value.[[br]] With some `data` forms the value might be relocatable; a `block` form might contain relocatable portions.

To interpret an attribute's value, you must know which value space that attribute is in. This comes from fixed knowledge of the known attribute names (`DW_AT_*`). For the most part, the attribute name alone tells you the value space. However, `DW_AT_name`, for example, is overloaded as both filename and identifier. So in the fully general case you need to know both the tag and the attribute name (`DW_TAG_*`, `DW_AT_*`). This pair maps to the set of value spaces expected for that attribute. If the attribute has a form that cannot belong to any of those value spaces, the consumer rejects it as invalid.

When a transformation (such as compression) comes across an attribute whose name is unrecognized and whose form is ambiguous (`string`, `data`), it cannot always complete the transformation safely.
For example, any `data` form might be a `loclistptr`, so you cannot rewrite the `.debug_loc` section, in case the unknown attribute encodes an offset into it; any `string` form might be a file name, so you cannot rewrite file names; and so on.

Some combinations of value spaces create new ambiguities. For example, if an attribute can be either a location or a constant, then a `data` form is either an integer constant or a `loclistptr`. If any such combinations do exist among the known attributes, some priority has to be chosen to disambiguate them.

==== relocatable values ====

Values in the address and constant value spaces can be determined by relocations against the allocated sections. A consumer wants either implicitly relocated values (libdwfl) or explicit relocation information (compression and other transformations), e.g. a `GElf_Rela` together with the `GElf_Sym` it refers to and that symbol's name.

There can also be relocations against the `.debug_*` sections themselves, in `DW_FORM_ref_addr`, `DW_FORM_strp`, and all the `*ptr` classes. These are not interesting to a consumer or producer application, and libdw can handle (and generate) them entirely under the covers. In "final DWARF" (i.e. the output of final links, plus `.ko` files), all of these can be applied in place and the relocations dropped.

==== C++ interfaces ====

How an `attr_value` object is used will depend on its value space, which in turn depends on the attribute and the tag. We want simple methods that extract the value in the expected value space and throw if the form does not match. We also want methods to ask which value space a value is in, and some polymorphic methods, such as generating printable strings.

Reference values are a separate can of worms, unlike the others. TBD.
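The interface goals above can be sketched in C++. This is a minimal, hypothetical sketch, not the elfutils API: apart from the name `attr_value`, every type and function here (`form`, `value_space`, `form_allows`, `expected_spaces`, `resolve`, and the accessor names) is an assumption for illustration. Forms are collapsed into families rather than real `DW_FORM_*` codes, block payloads and references are not modeled, and the (tag, attribute) table holds only a few entries. The order of the expected value spaces doubles as the disambiguation priority discussed above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not libdw types: form families as written in the
// file (each covers several DW_FORM_* encodings) and a subset of value spaces.
enum class form { addr, flag, data, block, string, ref };
enum class value_space {
  reference, address, flag, location, identifier, filename,
  lineidx, string, enum_constant, constant
};

// Could a value written in form f belong to value space s?
bool form_allows(form f, value_space s) {
  switch (s) {
  case value_space::reference:     return f == form::ref;
  case value_space::address:       return f == form::addr;
  case value_space::flag:          return f == form::flag;
  case value_space::lineidx:
  case value_space::enum_constant: return f == form::data;
  case value_space::location:      return f == form::data || f == form::block;
  case value_space::identifier:
  case value_space::filename:
  case value_space::string:        return f == form::string;
  case value_space::constant:
    return f == form::data || f == form::block || f == form::string;
  }
  return false;
}

// Hypothetical excerpt of the (DW_TAG_*, DW_AT_*) -> value spaces table,
// in priority order. DW_AT_name shows why the tag matters.
std::vector<value_space> expected_spaces(const std::string& tag,
                                         const std::string& attr) {
  if (attr == "DW_AT_name")
    return {tag == "DW_TAG_compile_unit" ? value_space::filename
                                         : value_space::identifier};
  if (attr == "DW_AT_low_pc")    return {value_space::address};
  if (attr == "DW_AT_external")  return {value_space::flag};
  if (attr == "DW_AT_decl_line") return {value_space::lineidx};
  return {};  // unknown attribute: no value space is expected
}

// First expected space the form allows wins; a form fitting none is rejected.
value_space resolve(form f, const std::vector<value_space>& expected) {
  for (value_space s : expected)
    if (form_allows(f, s)) return s;
  throw std::invalid_argument("form does not fit any expected value space");
}

// attr_value whose accessors check the value space (integer/string payloads only).
class attr_value {
public:
  attr_value(form f, const std::vector<value_space>& expected,
             std::uint64_t num, std::string str = {})
      : form_(f), space_(resolve(f, expected)), num_(num), str_(std::move(str)) {}

  value_space space() const { return space_; }

  // Checked extraction: each throws unless the value is in that space.
  std::uint64_t address() const { require(value_space::address); return num_; }
  bool flag() const { require(value_space::flag); return num_ != 0; }
  std::uint64_t line() const { require(value_space::lineidx); return num_; }
  const std::string& identifier() const { require(value_space::identifier); return str_; }
  const std::string& filename() const { require(value_space::filename); return str_; }

  // Polymorphic rendering, driven by the value space rather than the form.
  std::string to_string() const {
    std::ostringstream os;
    switch (space_) {
    case value_space::address: os << "0x" << std::hex << num_; break;
    case value_space::flag:    os << (num_ != 0 ? "true" : "false"); break;
    case value_space::filename:
    case value_space::string:  os << '"' << str_ << '"'; break;
    default:  // identifiers print bare; other spaces print their payload
      if (form_ == form::string) os << str_; else os << num_;
      break;
    }
    return os.str();
  }

private:
  void require(value_space want) const {
    if (space_ != want)
      throw std::domain_error("attribute is not in the requested value space");
  }
  form form_;
  value_space space_;
  std::uint64_t num_;
  std::string str_;
};
```

Because `DW_AT_name` resolves to `filename` under `DW_TAG_compile_unit` and to `identifier` elsewhere, calling `identifier()` on a compile unit's name throws. Listing `constant` before `location` in an expected set would make a `data` form resolve to an integer constant rather than a `loclistptr`; which priority is right for real attributes is the open question noted above, so any order used with `resolve` here is only illustrative.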