== XML representation of DWARF ==

The XML/DOM view of data is a pretty good fit for DWARF trees. We use it informally all the time to describe fragments of DWARF in discussion. On the dwarf branch, tests/dwarf-print produces an almost-XML format of DWARF data. It would be valuable in several ways to define a rigorous and proper mapping of DWARF to XML.

=== implementation ===

If the world were built of ideal modular components, it would be very easy to write adapters that map between DOM interfaces and the C++ dwarf interfaces in both directions. I still think this is the right way to approach it. But when I looked at some XML implementation libraries, I didn't find any that seemed modular enough to make this easy or clean.

=== output uses ===

The obvious use for XML output of DWARF is just to look at it, as we do with dwarf-print. Having it be true XML (or rather, a true DOM, which can be printed as XML) has several advantages.

 * You can use fancy XML-based viewers on it.
 * You can apply XML-based technologies like XPath to it. This could be an interesting way to do lots of prototyping work, using stock XML tools to do queries, subsets, and transformations on real DWARF data and see what you get out of it. It's even possible that using something like XPath expressions on a DOM implementation that is backed by DWARF rather than XML could be a worthwhile way to implement something for real.

=== input uses ===

One thing that's long been needed is a way to hand-write (or hand-modify) DWARF data. This would be great for things like test cases for elfutils and gdb. You could maintain XML source files and then use elfutils tools to produce DWARF output files from them.

=== schema ===

What seems worthwhile to express in XML is the "semantic view" of DWARF trees. The encoding details like abbrevs and forms are not very interesting.
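
As a rough illustration of the semantic view, and of the stock-tool queries mentioned under output uses, here is a minimal Python sketch. Everything about the sample document is an assumption made for illustration: this page does not fix element or attribute names, so the sketch simply uses DWARF tag and attribute names with the DW_TAG_ and DW_AT_ prefixes dropped. Python's standard xml.etree.ElementTree supports only a limited subset of XPath, which is enough for this kind of prototyping.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic-view document. The element and attribute names are
# assumptions (DWARF tag/attribute names without their DW_TAG_/DW_AT_ prefix),
# not a schema defined anywhere on this page.
SAMPLE = """
<compile_unit name="hello.c">
  <base_type name="int" byte_size="4"/>
  <subprogram name="main">
    <variable name="argc"/>
    <lexical_block>
      <variable name="i"/>
    </lexical_block>
  </subprogram>
  <subprogram name="helper"/>
</compile_unit>
"""

root = ET.fromstring(SAMPLE)

# Query: every subprogram DIE anywhere below the compile unit.
functions = [die.get("name") for die in root.findall(".//subprogram")]

# Query: variables at any depth inside the subprogram named "main".
main_vars = [
    die.get("name")
    for die in root.findall(".//subprogram[@name='main']//variable")
]

print(functions)
print(main_vars)
```

A DOM backed directly by DWARF would answer the same queries without the intermediate XML text; the sketch only shows the shape of the queries.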
What maps extremely well is the basic DIE tree structure, which is in essence just like the XML DOM: a tree where each node has an element type (tag), an unordered dictionary of key/value attribute pairs, and an arbitrary number of ordered child nodes.

In DWARF, some attribute values are complex or indirect things such as constant blocks, location expressions, and line information tables. These can't be represented simply in XML as attribute values. Instead, there needs to be an additional family of XML elements outside the ones directly representing the DIE tree, and attributes that point to those.

==== attribute values ====

In DWARF, each attribute value is encoded in a form, and the combination of the form and the tag/attribute where it appears indicates a ''value space''. The values of XML attributes are not distinguished this way. So a thorough XML representation would need to use some text encoding to indicate the DWARF value space. Here are some examples:

 * unadorned integers mean a '''constant'''
 * ''addr:0x123'' means a literal '''address'''
 * ''addr:foo+5'' means a symbolic '''address''' formed with a symbol reference and an addend
 * ''addr:foo'' means a symbolic '''address''' with an addend of zero
 * ''addr:(.foo)0x234'' means a section-relative '''address'''
 * ''#123'' means a '''reference'''. The XML representation of the referent DIE would use a fake attribute like ''id=#123'' to match.
 * ''#loc_123'' means a location expression. There would be an additional tree of XML elements outside the DIE tree that defines referent location expressions.
 * ''#loclist_123'' means a location list. Another outside tree defines these.
 * ''#rangelist_123'' for a range list, same story.
 * ''#const_123'' for a constant block, similar again.
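
The text encodings listed above could be decoded along the following lines. This is only a sketch in Python under stated assumptions: the function name `classify`, the helper `_to_int`, the returned value-space labels, the detail keys, and the exact characters accepted in symbol and section names are all hypothetical choices, since the examples above do not pin down a grammar.

```python
import re

# Decimal or 0x-prefixed hexadecimal number (an assumed number syntax).
_NUM = r"(?:0[xX][0-9a-fA-F]+|[0-9]+)"

# Hypothetical labels for the '#kind_N' families from the list above.
_KINDS = {
    "loc": "location expression",
    "loclist": "location list",
    "rangelist": "range list",
    "const": "constant block",
}


def _to_int(text):
    """Convert a decimal or 0x-prefixed string to an int."""
    is_hex = text.lower().lstrip("-").startswith("0x")
    return int(text, 16 if is_hex else 10)


def classify(text):
    """Return (value_space, details) for one XML attribute value string."""
    # '#123' is a DIE reference; '#loc_123' etc. point into an outside tree.
    m = re.fullmatch(r"#(?:(loc|loclist|rangelist|const)_)?([0-9]+)", text)
    if m:
        space = _KINDS.get(m.group(1), "reference")
        return space, {"id": int(m.group(2))}

    if text.startswith("addr:"):
        body = text[len("addr:"):]
        # Section-relative address: addr:(.foo)0x234
        m = re.fullmatch(r"\((\.[A-Za-z0-9_.]+)\)(" + _NUM + ")", body)
        if m:
            return "address", {"section": m.group(1), "offset": _to_int(m.group(2))}
        # Literal address: addr:0x123
        if re.fullmatch(_NUM, body):
            return "address", {"literal": _to_int(body)}
        # Symbolic address: addr:foo or addr:foo+5 (missing addend means zero)
        m = re.fullmatch(r"([A-Za-z_.$][A-Za-z0-9_.$]*)(?:\+(" + _NUM + "))?", body)
        if m:
            return "address", {"symbol": m.group(1), "addend": _to_int(m.group(2) or "0")}
        raise ValueError("unrecognized address: " + text)

    # Unadorned integers are constants.
    if re.fullmatch("-?" + _NUM, text):
        return "constant", {"value": _to_int(text)}

    raise ValueError("unrecognized attribute value: " + text)
```

For instance, `classify("addr:foo+5")` yields the address space with symbol `foo` and addend 5, while `classify("#loclist_123")` yields a location list with id 123. A real mapping would also need value spaces not covered by the examples above, such as strings and flags.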