This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: dwarf_edit_output testcase sanity checking

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Tue, 09 Nov 2010 17:18:21 +0100
Subject: Re: dwarf_edit_output testcase sanity checking

On Tue, 2010-11-09 at 00:41 -0800, Roland McGrath wrote:
> > So the problem comes from including the hash of all reference attributes
> > in the construction of the "unique identifier object" for a attribute
> > reference that points to a die. That means that references inside
> > circular structures cannot really be completed since during the
> > construction of the "unique identifier object" of reference to that
> > circular structure itself it can never be fully resolved. 
> 
> Here you mean the references that are themselves steps along the cyclic
> reference chain, right?  That is, not each and every reference that
> appears inside a DIE tree that is part of a cycle.  For example, a
> reference whose referent is some <base_type .../> DIE can always be
> completed, because that referent has no other references.  Likewise, a
> <const_type type=.../> can be completed if its type=... referent is a
> base_type.  Likewise, a <pointer_type type=.../> whose referent is such
> a const_type, etc.  i.e., any reference that is not itself part of a
> cyclic reference chain.  I think that's what you meant, but I want
> always to be sure we are being precise in understanding each other.

Yes that is what I meant, but you don't really know which reference
attributes those are till you resolve them (fully).

> > So it seems that we should declare two "resolved" states for DIEs. One
> > "tree resolved" that takes into account just the basic attributes plus
> > all the children "tree resolved". And another "fully resolved" that
> > all attributes, including reference values into account, plus "fully
> > resolved" children.
> 
> That sounds familiar.  In some previous iterations of the 'struct entry'
> thing with pending counts, I had multiple different kinds of pending
> counts.  It all got very confusing and I went back and forth confounding
> myself several times before settling on the apparently simpler approach
> we have now.  That doesn't mean this isn't the right way to go.  But it
> will take great care, and much better reasoning and commenting than I
> did before, to make sure it all makes sense.  I'm sure what I did before
> is not exactly what you are proposing, but it certainly was a way to
> make all the code even harder to follow.

It seems to become very messy if we want to do anything clever so we
don't just scan over the DIE tree twice. At least I get quickly lost
unless I just think of it as a two-pass algorithm (scan over the whole
tree once to calculate hashes just using the children, then scan over
the whole tree again to calculate the hashes taking the reference
attributes into account using the die->hash map of the first pass).

> > A reference value then only needs its referent DIE to be "tree
> > resolved" to create the "unique identifier object" for it.  That
> > "breaks the circle" at the start of a reference attribute value so you
> > don't need to track circularity anymore.
> 
> Ok, I think that makes sense.  But then we come back to the case of
> <pointer_type type=.../>.  That has no children and no non-reference
> attributes, so there is nothing but DW_TAG_pointer_type to contribute to
> its hash value, and so all of them hash the same.

Yes, but that isn't a problem I believe. Those DIEs are hash equal in
the "first pass", but we don't use the result of that for any actual
comparison. They get their hash "augmented" by the second pass which
takes the reference attributes into account (using the hash of the first
pass for the DIE the reference attribute points to).

> > The disadvantage is that you end up with two "pending queues" which
> > seems to effectively mean you will scan the whole DIE tree twice (or at
> > least calculate the hash over a DIE twice, without and then with taking
> > reference values into account). 
> 
> This might not be as repetitive as it sounds.  You don't need to visit
> the non-reference attributes a second time.  In "tree resolved" state,
> you can take the complete hash of all the non-reference attributes, and
> also hash just the names of the reference attributes, and combine those
> together as a single hash value.  Then the second pass in "fully
> resolved" state will just take the final hashes of the referent DIEs
> (pointer-hashes of their .identity () values) and combine those with the
> hash saved in the first pass--this yields the hash of an "attrs_pair".
> You do then need to visit each child DIE again to combine their final
> hashes into the "brood hash", but since each child is fully resolved by
> then, that is just a (reduce 'identity children) iteration on the vector
> (std::accumulate(c.begin(),c.end(),0,identity_hash) or something).

Yeah, if you look at it as a fully two pass algorithm it might actually
work out pretty simple after all. It just takes "a bit" of extra storage
for all DIEs.

> > And since some "tree resolved" DIEs might have the same hash, but are
> > not equal if you take the value references into account, you don't
> > have perfect hashes for the attribute reference values. But they only
> > collide when referring to the same structural trees with all basic
> > attribute in all included DIEs matching completely.
> 
> That sounds nice, but the "structural tree" <singleton_tag/> with "all
> basic attributes" being the empty set is a common case.

Maybe I am missing the actual example you have in mind. Could you post
an example tree that shows it?

Thanks,

Mrk

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]