This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: comparing die trees/graphs

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Fri, 16 Jul 2010 12:16:11 +0200
Subject: Re: comparing die trees/graphs

On Fri, 2010-07-16 at 02:17 -0700, Roland McGrath wrote:
> > - For two DIEs to be equal the attribute values of the attributes with
> >   the same key names that have references to DIEs as value should:
> >     a) Be both not yet seen DIEs in the respective trees so far, and
> >        be equal DIEs (in this case we can just treat the attribute
> >        reference as we would a child DIE, compare it in order).
> >   Or:
> >     b) Be both already seen DIEs in the respective trees so far, and be
> >        in the "same location".
> 
> I don't understand the "already seen" or not distinction here.  Whether
> they are forward references or not doesn't matter in any way I can think
> of (except in how you'd implement it, of course).  Regardless, you need
> the subtrees to be equal and their locations to be equivalent.

I make that distinction because I really would like to see the
equivalence as comparing two lists (the depth-first search paths) of
nodes (DIEs). This might confuse matters a little since we go from
concept to comparison algorithm immediately.

To me we are comparing paths, and "already seen" nodes/dies are just
things that have to be the same on both paths.

> > The "same location" means that the references to the DIEs in the trees
> > are at the same place if we would walk the tree in the order chosen
> > above.
> 
> Approximately, yes.  I've called this "equivalent context" rather than
> "same location" to abstract it a little.  
> 
> The definition for "exact equivalence" I mean is that each parent DIE
> has matching attributes and equivalent parents.  That's a recursive
> definition, so it iterates on up to the root of the tree (the CU).

Why the CU? Isn't the start the starting DIE(s) of the comparison? The
CU seems to be just another DIE that might be in the (flattened) path
(both CU DIEs should of course be equivalent if the appear in a
depth-first walk of the tree).

> For the overall goal of maximal sharing constrained only by the real
> high-level semantic needs to avoid wrong conflations, we'll want some
> flexibility to ignore some kinds of attribute differences.
>
> This decision is encoded in dwarf_comparator::equal_enough.  The current
> definition of "equal enough" is that we ignore the reference attributes.
> That's as much as anything else just because another vector of complex
> equality checking with references would just make things even harder to
> think about.  In practice, I'm pretty sure the differences between
> "context" DIEs that matter semantically are not in their references.

I am not sure I am following that. Isn't what they reference precisely
what makes them similar or not?

> > To keep the order of the nodes I would keep two stacks of DIEs that
> > represent the path traveled so far. That way whether or not a DIE has
> > already been seen is just whether it is already on the stack, and the
> > "location" can just be expressed as the index of that DIE into the
> > stack.
> 
> The comparator does a depth-first recursive traversal.  So it has a live
> call frame associated with each DIE currently under consideration.
> Remembering anything beyond that is done by the tracker.

And that is indeed the part I find somewhat hard to understand. In my
(simplistic) view all you need is the depth-first recursive traversal
(keep track of circularity, mark them, stop recursing, and check it is
the same circular reference in both walks). And if the full walk in both
trees finds the exact same path with the exact same nodes in both you
are done, they are similar.

> I'm not sure I followed exactly what you meant about a stack or how it
> relates to the tracking we have going on now.

The stack is just my way of expressing we keep the path of DIEs traveled
so far. It is just keeping track of when we saw which DIE. And if any
deeper reference points to a DIE we already seen it will be on that
stack (path) and for the other DIE tree to be equivalent it needs to
have a DIE reference to exactly the same point in the stack (path) we
are constructing while walking the other DIE tree.

> If you follow forward references immediately to answer the equality
> question, you will cover lots of the tree out of order relative to the
> main comparison walk.  If you did no caching, you would repeat many
> subtree comparisons many many times.

I don't doubt that without caching things might be somewhat inefficient.
But the caching confuses me somewhat, so for now I am trying to ignore
it. Wouldn't we see every subtree only once, since we are just walking
the tree from starting DIE in a specific order (breaking cycles when we
notice we already seen a DIE node earlier in the walk)?

Cheers,

Mark

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]