This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: mjw/dwarf_output branch

From: Roland McGrath <roland at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Thu, 27 Jan 2011 18:38:25 -0800
Subject: Re: mjw/dwarf_output branch
> So what the code does at the moment is add a local_hash function to both
> dwarf_output::debug_info_die and copier::pending_entry. These hash
> functions should be in sync because a not fully resolved attribute
> reference could be pointing to a pending_entry. Both calculate the local
> hash based on the local hash of the attributes_type and the
> children_type. Then the hash of a value_reference is the local hash of
> the debug_info_die hash it is pointing at.

Ultimately dwarf_output::debug_info_entry should probably not recompute the
local hash at all.  That object only comes into existence when a
pending_entry is ready to be finalized in the collector.  So when a new one
is being created in the collector, it can just store the local hash value
already computed in the pending_entry.

> For now the really tricky case inside the local hash function of the
> attribute_types containing a reference attribute is side stepped by just
> using the tag of the reference attribute. This is enough to get the
> small testcases in dwarf_edit_output working that failed before. Because
> this makes the hashes of similar structures always the same (before they
> would have hash slightly differently based on where a attribute
> reference cycle was detected).

So I see you decided to go with computing a hash value for attributes_type
and a hash value for children_type, so the debug_info_entry hash value is
just the hash_combine of those with the tag.  That is fine, but the
write-up of the hashing algorithm should be precise about that.  It's not
exactly the same (arithmetically) as combining the tag, the hashes of each
attribute, and the hashes of each child, which is how you described it.  As
I said before, I don't think it matters a lot to the quality of the hash
which way it's done.  But we're confused enough already that having every
detail of the canonical write-up match what the code actually does is a
good idea.  (Perhaps that write-up should live on the wiki rather than just
in the list archive.)

> Ideally this would just be ref_attr.reference()->local_hash (), but
> since we calculate the hash of the attributes_type just before it is
> inserted into the collector this doesn't work (the way this aborts is
> somewhat obscure though - it ends up in in the dwarf_data::variant
> constructor that throws a "can't happen!"). I am a little confused atm
> about how the instantiation of the attributes_type gets triggered
> precisely. It is probably necessary to do this in two phases by lifting
> parts of pending_entry::final () method out in a second initialization
> step.

As I did things originally, the dwarf_output::attr_value::reference method
(really dwarf_data::attr_value::reference) is not meant to be used on
something that isn't really final and placed.  The way you got to
std::logic_error ("can't happen!") was because you used this method on an
attr_value whose _m_value pointer was a copier::entry rather than a
value_reference.  If you look at pending_dwarf::attr_value::reference,
you'll see that it checks for this case specially:

	  const entry *ref = dynamic_cast<const entry *> (_m_value);

This returns NULL if _m_value is not a copier::entry.  In that case, it
really is a finalized reference attribute, or else it's a special
circular_reference object.

	  if (ref == NULL)
	    // Either really a final reference, or a circular reference.
	    return typename debug_info_entry::const_pointer
	      (dynamic_cast<const value::value_reference *> (_m_value));

That's pending_dwarf::debug_info_entry::const_pointer, which is another
name for pending_dwarf::debug_info_entry::children_type::const_iterator.
This has overloaded constructor methods, one of which takes a
value_reference pointer.  That's the one that calls init_from_ref.

When in fact it is a copier::entry, we instead get here:

	  /* This is an attribute comparison inside the attrs_match
	     comparator.  The attribute sets passed to attrs_match
	     directly don't hit this--they've already been finalized.
	     But following those references we got to another
	     pending_entry and its attributes that are not yet
	     finalized.  If attrs_match winds up returning true, these
	     will never be finalized because they are duplicates.  */
	  return typename debug_info_entry::const_pointer
	    (const_cast<entry *> (ref));

That calls the different overload for that constructor, the one that takes
an entry pointer.

Either way, we get a pending_dwarf::debug_info_entry::const_pointer object.
This is not the same thing as a dwarf_output::debug_info_entry::const_pointer.

> Just to clear this up for my self. The way this works is through the
> following call in copier<input>::pending_entry::final ():
> 
> const debug_info_entry::attributes_type *attrs
>   = &co->add_attributes (_m_tag, _m_attributes, equalator)->first;
> 
> Where _m_attributes is defined as:
> 
> typedef dwarf_data::attributes_type<dwarf_output, value> attr_map;
> attr_map _m_attributes;
> 
> while add_attributes () in elfutils::dwarf_output_collector is defined
> as:
> 
> template<typename match_type>
> const attrs_pair *
> add_attributes (int tag, const attrs_type &candidate, match_type &matcher)
> {
>   return _m_attr_sets.add (std::make_pair (candidate, tag), matcher);
> }

The important thing here is that matcher, which is the equalator object in
the caller above.  _m_attr_sets is a subr::dynamic_equality_set, and
equalator is a pending_entry::attrs_matcher object.  That's a closure
wrapper to contain the copier pointer, and provide:

	inline bool operator () (const attrs_pair &a, const attrs_pair &b)
	{
	  return (a.second == b.second
		  && _m_copier->attrs_match (a.first, b.first));
	}

So what happens in the collector is that copier::attrs_match is called to
compare the candidate new dwarf_output::debug_info_entry::attributes_type
against existing ones in the _m_attr_sets hash table.

An attrs_pair is a pair<int, attributes_type>, where the int is the DIE
tag, so that .second comparison is checking that the DIE tags match before
comparing the actual attribute maps.  This is another layer of distinction
that is probably not really worthwhile.  IIRC it was only motivated by
cutting down the number of attrs_match comparisons among hash-colliding
attributes_type's.  In the old scheme, in <pointer_type type=.../> and
<reference_type type=.../> each has an attributes_type whose hash value is
the same when ... is part of any circularity (which they often are),
because circular references are hashed as 0.  So, at least this cuts in
half the number of hash-colliding attribute_type's to compare against.  But
that was a feeble hack to mitigate the utter inadequacy of the hashing in
the presence of circular references.

> So here I am really a bit stuck. Although I figured out (see previous
> email) how the dwarf_output::attributes_type gets constructed from the
> dwarf_data::attributes_type, I don't fully understand how to "recapture"
> the original references that the copier so diligently kept track of.
> What/where/how would be the best point to extract this info, so that we
> have it when we want to calculate a local hash for a reference attribute
> (which is the local hash of the die pointed to by the attribute)?

Well, this is something that should probably change from how it is now.
When you get as far as add_attributes in the collector, then the attributes
map is already final.  That means that all the attribute values are
"final", but they might still not be "reified".  

So, recall that attributes_type is a map<int, attr_value>.  An attr_value
is nothing but its _m_value pointer, so you can think of this as being a
map<int, value_dispatch *>.  The object pointed to by that rhs is never a
value_dispatch, which is the abstract type, it's a subclass.  For a
reference, that's some kind of value_reference.  When they are non-circular
references, then they are really final and so it's a plain value_reference,
which contains the dwarf_output::debug_info_entry::const_pointer (aka
dwarf_output::debug_info_entry::children_type::const_iterator) that is
really just an iterator into the std::vector<die_info_pair *> that is the
.children () of the referent's (first) parent DIE.  So that's all there is
to know there, it's all done.

For a circular reference, it's actually a circular_reference object, which
is a subclass of value_reference.  That object can be in two states: either
final or not.  When it's final, then it no longer matters that it's a
circular_reference, it is just a value_reference and used no different.
When it's not final, then its base value_reference is not really
initialized.  Instead, its _m_entry is a singleton vector containing a
copier::entry pointer.  This is used when we have a
pending_dwarf::debug_info_entry::children_type::const_iterator (also called
pending_dwarf::debug_info_entry::const_pointer) that wants to point to what
this circular_reference points to.  It's "final", but it's not "placed", so
there is no parent children_type vector with a die_info_pair for it to
point to yet.  Here we get to init_from_ref, which I mentioned way back
above when talking about the pending_dwarf maze.  That sees that the
value_reference in question is really a circular_reference, and that this
circular_reference is not final, and so it uses the iterator into that
singleton vector<entry *>.  Hence, the pending_dwarf::...::const_iterator
wrapper object refers to the copier::entry object rather than to a DIE in
the collector, since it's not placed in the collector yet.

So I don't think that really answered the question of how it should be now,
but that is the pieces of how it is now that I think you were missing.


Thanks,
Roland
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]