This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: comparing die trees/graphs


That's the basics, and that's indeed how dwarf_comparator works (depth first).

> - The attribute values can also be references to DIEs. This might be new
>   DIEs, but also DIEs "up" in the DIE children tree. This is why the DIE
>   tree is actually a (pointed) graph.

This is indeed the only reason the whole thing is hairy at all.

> - For two DIEs to be equal the attribute values of the attributes with
>   the same key names that have references to DIEs as value should:
>     a) Be both not yet seen DIEs in the respective trees so far, and
>        be equal DIEs (in this case we can just treat the attribute
>        reference as we would a child DIE, compare it in order).
>   Or:
>     b) Be both already seen DIEs in the respective trees so far, and be
>        in the "same location".

I don't understand the "already seen" or not distinction here.  Whether
they are forward references or not doesn't matter in any way I can think
of (except in how you'd implement it, of course).  Regardless, you need
the subtrees to be equal and their locations to be equivalent.

> The "same location" means that the references to the DIEs in the trees
> are at the same place if we would walk the tree in the order chosen
> above.

Approximately, yes.  I've called this "equivalent context" rather than
"same location" to abstract it a little.  

The definition for "exact equivalence" I mean is that each parent DIE
has matching attributes and equivalent parents.  That's a recursive
definition, so it iterates on up to the root of the tree (the CU).

For the overall goal of maximal sharing constrained only by the real
high-level semantic needs to avoid wrong conflations, we'll want some
flexibility to ignore some kinds of attribute differences.

This decision is encoded in dwarf_comparator::equal_enough.  The current
definition of "equal enough" is that we ignore the reference attributes.
That's as much as anything else just because another vector of complex
equality checking with references would just make things even harder to
think about.  In practice, I'm pretty sure the differences between
"context" DIEs that matter semantically are not in their references.

But, in the abstract, if it were easy to implement, then the ideal place
to start would be "exact equivalence", references and all, and whittle
down from there.

> To keep the order of the nodes I would keep two stacks of DIEs that
> represent the path traveled so far. That way whether or not a DIE has
> already been seen is just whether it is already on the stack, and the
> "location" can just be expressed as the index of that DIE into the
> stack.

The comparator does a depth-first recursive traversal.  So it has a live
call frame associated with each DIE currently under consideration.
Remembering anything beyond that is done by the tracker.

The comparison constructs a tracker::step object when it considers a
pair of DIEs.  This object and its destructor are what trigger the
tracker code to record where we are right now.  It tracks each of the
two parallel walks with a dwarf_path_finder.  Each of these holds a a
stack of DIEs, called a die_path in the code.

A step on the walk (tracker::step -> dwarf_path_finder::step) pushes the
new DIE on the stack.  It also records the whole current stack in a map
keyed by that DIE's identity.  Looking these up are how we can consider
the "context" when we have an attribute reference to this DIE later on.

> Does any of the above make sense? Is this what the comparator/tracker
> logic does? Are there wrinkles in the above logic that make it more
> involved? What kind of caching should be added to make the above
> easier/better/faster?

You've asked a lot and we'll have to get to it a bit at a time.

I'm not sure I followed exactly what you meant about a stack or how it
relates to the tracking we have going on now.

The definition of "equal" is recursive, in that for two DIEs to be equal
their references have to be to equal DIEs.  Yet the referenced DIEs may
have references of their own that have to be compared.  After one step
or many steps, references could lead back to the first DIE.  Not only do
you have to terminate, but two DIEs with congruently circular references
must be considered equal.

If you follow forward references immediately to answer the equality
question, you will cover lots of the tree out of order relative to the
main comparison walk.  If you did no caching, you would repeat many
subtree comparisons many many times.

So the tracker also records equivalences.  If you compare two subtrees
once in the main walk, you want to record that they were or weren't
equal, so that you know that answer immediately when you later come
across a need to compare reference attributes pointing to each.
Likewise, if you follow a forward reference and compare referenced
subtrees, you want to record that result so that when you come across
that subtree later in the main walk (or again via another reference)
then you don't repeat that comparison.

tracker::reference_match is an untidy mashup of both the attempts at
handling circularities and the caching of comparison results.  It isn't
doing either correctly.


Thanks,
Roland

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]