This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: dwarf_edit_output testcase sanity checking


> Yes that is what I meant, but you don't really know which reference
> attributes those are till you resolve them (fully).

Right.  But in the absence of cycles, full resolution is simple.

> Yes, but that isn't a problem I believe. Those DIEs are hash equal in
> the "first pass", but we don't use the result of that for any actual
> comparison. They get their hash "augmented" by the second pass which
> takes the reference attributes into account (using the hash of the first
> pass for the DIE the reference attribute points to).

Ok.  I don't think I really understand your algorithm.  But it's still
early in the morning, and I still have an infection close to my brain.
Perhaps you can write up a clean description/pseudo-code of the whole
algorithm that we can all consider from the ground up.

> Yeah, if you look at it as a fully two pass algorithm it might actually
> work out pretty simple after all. It just takes "a bit" of extra storage
> for all DIEs.

We already have 'struct entry' that lives forever.  Anything that just
increases the constant factor for O(n) in n input DIEs is probably fine.
The algorithmic and storage issues that I suspect matter most are:

* touch each non-reference attribute value only once 
  (aside from hash collisions, and tweak hash functions to keep those rare)
* store each unique non-reference attribute value only once
  (i.e. avoid the memory explosion that dwarf_edit has)
* minimize the temporary life of any kind of children vectors
** ideally never allocate a "pending" children vector that we later
   decide is a duplicate
** when do we, minimize the lifetime so we don't have lots of them
   live at once during the copying
* minimize the temporary life of any attribute maps
** similar

I previously concentrated mostly on trying to collapse pending subtrees
into final form as quickly as possible, so that we would not have a lot of
memory live in "pending" data structures at the same time.  But that may
have been a false economy.  Two passes over the input tree structure where
the first one does not allocate a lot of parallel data tree structures at
all could be far better.

There are different approaches to minimizing the temporary allocation of
pending data structures.  For example, we might split up the "final"
(i.e. collector) storage of attribute maps so that in the first pass we
finalize a data structure that holds all the attribute names and
non-reference values, and then store the references separately.  

e.g., an attributes_type in the collector would be a pointer to a
"terminals map" and an array of DIE pointers.  The "terminals map" would be
a map where the rhs for reference-valued attributes is an "index stub"
object.  An "index stub" is a value_dispatch subclass that just holds an
integer index.  The collector would only need one such object for each
index value, for a total of n objects where n is the largest number of
reference attributes any DIE has.  To follow a reference attribute, you'd
take that index and fetch that slot from the corresponding array.  That
would require some reworking of the C++ hooey, and I haven't thought it all
the way through.  Perhaps we don't need that split in the final storage.
Instead, we could have a hash table of "terminals maps" only in the copier,
and reify them to final attribute maps in the collector (as they are
currently stored) in the second pass.

But I'm getting ahead of myself.  We can think through the storage
organization more concretely after we have worked out the copying algorithm.

> > That sounds nice, but the "structural tree" <singleton_tag/> with "all
> > basic attributes" being the empty set is a common case.
> 
> Maybe I am missing the actual example you have in mind. Could you post
> an example tree that shows it?

I just meant <pointer_type type=.../> and the like (no children,
no non-reference attributes).


Thanks,
Roland

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]