This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Add C++ wrappers for iterating through various libdw and libdwfl objects


Mark Wielaard <mjw@redhat.com> writes:

> On Wed, 2015-03-18 at 00:55 +0100, Petr Machata wrote:
>> these iterators originated a couple years back on the dwarflint branch.
>> Back then Roland added some C++ wrappers and we started writing stuff in
>> C++.  I've been lugging them around ever since, first to dwlocstat,
>> which was peeled off dwarflint (and should be coming for review after
>> this bunch), and then to dwgrep.  They are header-only library.
>
> When they are header-only how do we check ABI? I assume users might
> expose something like cu_iterator or cu_info from their own code. Is
> there an easy way to check we won't break ABI between elfutils releases
> for such usage? How or against what would we run libabigail for example?
>
> I do find it a little messy to be honest. It makes it a bit hard to
> quickly get an overview of the interface exposed by the header because
> you have to skip through all the implementation stuff.

The iterators are essentially glorified structs.  The usual C++ ABI
concerns don't apply (reordering, adding virtuals).  They only use
single inheritance off system classes, and those only serve for tagging
(so that algorithms can dispatch based on iterator type).

>> Error conditions are communicated through exceptions.  For better or
>> worse, that's how things are done in C++, and there's simply no way to
>> communicate failures out of constructors or the likes of operator++.
>> Currently I'm simply throwing an std::runtime_error, but maybe it would
>> make sense to have a custom family of libelf, libdw, libdwfl exceptions,
>> all rooted in a general elfutils exception type.
>
> Can we add those later, inheriting from std::runtime_error for example
> to keep ABI?

Sure.

>> One non-iterator-like concept that I included as well is a wrapper
>> around dwfl_report.  It's an RAII class that ends the reporting when it
>> goes out of scope.  As soon as one calls dwfl_report_begin, they assume
>> responsibility to call dwfl_report_end eventually.  But in C++ many
>> error conditions are communicated through exceptions, and making sure
>> these cleanups are not skipped is messy.  RAII as a way to express these
>> responsibilities in C++.
>
> The example here can be extended a little. It isn't immediately clear
> what the advantage is from the two lines. It feels like a wrapper for
> creating a Dwfl is missing. But maybe it is natural to mix-and-match the
> C libdwfl.h and c++/libdwfl calls. Expanding the example might show why
> it is really cool. Maybe one that wraps everything in a function or
> block to show the RAII and cleanup triggering?

How about these two?

  //
  // Example usage:
  // {
  //   Dwfl_Module *mod;
  //   {
  //     elfutils::dwfl_report r (dwfl);
  //     mod = r.report_offline (fn, fn, fd);
  //     assert (mod != NULL);
  //   }
  //   work_with (mod);
  // }
  //
  // Or:
  // {
  //   elfutils::dwfl_report r (dwfl);
  //   for (...; it; ...)
  //     {
  //       // open file at *it, potentially throw exception
  //       r.report_offline (fn, fn, fd);
  //     }
  // }

The value added by the wrapper is not great.  It abstracts away one NULL
check, and makes sure that dwfl_report_end is called even if you get
exceptions from wherever (e.g. std::bad_alloc).

But since it needs a file descriptor, which can't be RAII'd (because
close can signal errors), there would be enough silly dances around the
interface anyway.  So it's questionable whether it's as useful as I
thought when extracting it from dwlocstat.

>> It's all C++98.  We have had "the new C++" out there on the streets
>> since 2009 or so, and the iterators presented here were all written with
>> varying degrees of C++11 in them.  But that is still sufficiently new
>> that I felt adhering to the C++98 subset is the prudent way forward, and
>> backported the code.
>
> Are there any issues building this with C++11? Does that change ABI?

You mean e.g. a third-party C++98 library embedding these classes in
public interfaces, and another C++11 application relying on that?

Hmm, I think there shouldn't be.  The GCC folks will move (moved) C++98
to the new ABI as well, and GCC should include backward compatible
symbols for the old ABI.  I think.

>> The headers are installed unconditionally, but a test for availability
>> of a C++ compiler is made anyway, and build of the corresponding test is
>> skipped if there is none.  The testing is rather light, including only a
>> couple cases that I wanted to make sure about as I reviewed the code.
>> Dwgrep has a fairly comprehensive test suite that covers proper function
>> of these iterators, if that's any consolation.
>
> I think I would prefer to only install them when we are able to test
> them. And if people don't want them we could have a --disable-c++
> configure option that skips them. Should packagers be encouraged to
> package them in a separate elfutils-c++-devel package?

That's what I would do.

>> +namespace elfutils
>> +{
>> +  // Helper functions.  Not for client consumption.
>> +  namespace libdw_impl
>> +  {
>> +    inline void throw_libdw (int dwerr = 0);
>> +    inline bool dwpp_child (Dwarf_Die &die, Dwarf_Die &result);
>> +    inline bool dwpp_siblingof (Dwarf_Die &die, Dwarf_Die &result);
>> +    inline Dwarf_Die dwpp_offdie (Dwarf *dbg, Dwarf_Off offset);
>> +  }
>
> So this is just a convention?
> Any namespace ending in _impl isn't really part of the API?

It is a convention in these header files.  It could be named just
"impl", or maybe "aux", or something else still, I don't think there is
convention to speak of.  Point is somehow to signal to the client that
there be lions.

>> +  // Helper structure with data about each compile unit.
>> +  struct cu_info
>> +  {
>> +    Dwarf_Die cudie;
>> +    Dwarf_Off abbrev_offset;
>> +    uint8_t address_size;
>> +    uint8_t offset_size;
>> +  };
>
> This is all useful information that we get from dwarf_nextcu () anyway,
> so it is cheap to provide. But don't we also want to provide any type
> unit information? I see the code uses dwarf_nextcu () and dwarf_offdie
> () so it cannot really handle type units (as long as they are in
> separate sections, DWARF5 proposes to merge them into the
> main .debug_info instead of the separate .debug_type). Was it a
> deliberate choice to ignore type units?

No.  The CU iterator was written when there were no type units around.
dwgrep currently doesn't type units either.  Then again, the name makes
it clear that this goes through CU's.  I imagine tu_iterator would be
similarly useful, or all-encompassing unit_iterator.  But I haven't
written it.

>> +  class cu_iterator
>> +    : public std::iterator <std::input_iterator_tag, cu_info>
>> +  {
>> +    friend class die_tree_iterator;
>> +    Dwarf *m_dw;
>> +    Dwarf_Off m_offset;
>> +    Dwarf_Off m_old_offset;
>> +    cu_info m_info;
>> +
>> +    explicit cu_iterator (Dwarf_Off off)
>> +      : m_dw (NULL)
>> +      , m_offset (off)
>> +      , m_old_offset (0)
>> +    {}
>
> A NULL m_dw seems dangerous/sloppy.

This is for constructing end iterator.  The constructor is private.

>> +  public:
>> +    explicit cu_iterator (Dwarf *dw)
>> +      : m_dw (dw)
>> +      , m_offset (0)
>> +      , m_old_offset (0)
>> +    {
>> +      move ();
>> +    }
>
> Just a simple c++ question. What kind of stuff would convert as argument
> if we don't use explicit here?

Dwarf CV-* only I think.  And things that have operator Dwarf CV-*.
nullptr_t in C++11.  The reason I add explicit is that I don't want
Dwarf *'s implicitly convert to iterators.  I want to see the iterator
being constructed in the code.

>> +    // Construct a CU iterator for DW such that it points to a compile
>> +    // unit represented by CUDIE.
>> +    cu_iterator (Dwarf *dw, Dwarf_Die cudie)
>> +      : m_dw (dw)
>> +      , m_offset (dwarf_dieoffset (&cudie) - dwarf_cuoffset (&cudie))
>> +      , m_old_offset (0)
>> +    {
>> +      move ();
>> +    }
>
> And why don't we need explicit here?

It's two-argument constructor.  Technically we could add explicit if we
want to disable the brace-initialization in C++11.

>> +    bool
>> +    operator== (cu_iterator const &that) const
>> +    {
>> +      return m_offset == that.m_offset;
>> +    }
>
> Don't you also need to compare m_dw?
> hmmm, that doesn't work with end () then.

Well, you are not supposed to compare iterators coming from different
sources, so this doesn't need to work.

Though there could be something like this for paranoia's sake:

  if (m_dw != NULL && that.m_dw != NULL)
    assert (m_dw == that.m_dw);

>> +    cu_iterator
>> +    operator++ (int)
>> +    {
>> +      cu_iterator tmp = *this;
>> +      ++*this;
>> +      return tmp;
>> +    }
>
> This is just idiom for implementing it++?

Yeah.

> While the above is the ++it implementation?
> (I don't understand the (int))

As discussed on IRC, it's language "hack" to distinguish between prefix
and postfix operator++'s.

>> +    Dwarf_Off
>> +    offset () const
>> +    {
>> +      return m_old_offset;
>> +    }
>
> Why do we explicitly expose this? Is this more convenient than having it
> as part of cu_info? And does the user really care about this offset? It
> could also just be queried with dwarf_cu_die for example.

The bare offset does come up in eu-readelf.  But you are right that so
does the CU DIE offset, so this is not a key piece of information.  We
can drop the interface.  (dwgrep uses it, but it's not essential.)

>
>> +    // N.B. see top of the file for explanation of non-constness of
>> +    // operators * and ->.
>
> That holds for Dwarf_Die. But can't/shouldn't we return a const cu_info?
> Or does that not make sense if some fields are non-const anyway?

const cu_info would imply const cu_info::cudie.  Same situation.

>> +    Dwarf_Die &
>> +    operator* ()
>> +    {
>> +      assert (*this != end ());
>> +      return m_die;
>> +    }
>
> Is assert the best idiom to use here?
> Isn't there a runtime exception for this?

Dereferencing end iterator is contract violation.  Anything goes.  Here
it's easy to detect, so we choose to fail loud and fast.

>> +    Dwarf_Die *
>> +    operator-> ()
>> +    {
>> +      return &**this;
>> +    }
>> +  };
>
> Just thinking of naming here. I assume we would also want to provide a
> child iterator that transparently handles import_unit/partial_units.
> Given that often that is the one people might want to use instead of
> having to deal with "raw" DIE trees. How would we call that one?

cooked_child_iterator of course! (Just kidding.)

This sort of stuff is usually handled by policy classes.  Policies are
template arguments (often template template arguments) that specify some
detail of behavior of the primary template.  The child_iterator would
then be configured by a policy class that would or would not do the
inlining.

But that's an overkill, since there are only two variants.

So, um... importing_child_iterator?

It could also be a flag in template argument that defaults to raw mode.
So elfutils::child_iterator<elfutils::import_partial_units>.

We could future-proof it by doing something like:

namespace elfutils
{
  enum partial_import_policy
  {
    raw,
  };

  template <partial_import_policy ImportPol = elfutils::raw>
  class child_iterator /* ... */;
}

Then later you could add new policies to the enum as they are
implemented.

>> +    die_tree_iterator (die_tree_iterator const &that)
>> +      : m_cuit (that.m_cuit)
>> +      , m_stack (that.m_stack)
>> +      , m_die (that.m_die)
>> +    {}
>
> No way to walk the type units?

No.

>> +    die_tree_iterator
>> +    operator++ ()
>> +    {
>> +      Dwarf_Die child;
>> +      if (elfutils::libdw_impl::dwpp_child (m_die, child))
>> +	{
>> +	  m_stack.push_back (dwarf_dieoffset (&m_die));
>> +	  m_die = child;
>> +	  return *this;
>> +	}
>> +
>> +      do
>> +	if (elfutils::libdw_impl::dwpp_siblingof (m_die, m_die))
>> +	  return *this;
>> +	else
>> +	  // No sibling found.  Go a level up and retry, unless this
>> +	  // was a sole, childless CU DIE.
>> +	  if (! m_stack.empty ())
>> +	    {
>> +	      m_die = elfutils::libdw_impl::dwpp_offdie (m_cuit.m_dw,
>> +							 m_stack.back ());
>> +	      m_stack.pop_back ();
>> +	    }
>> +      while (! m_stack.empty ());
>> +
>> +      m_die = (++m_cuit)->cudie;
>> +      return *this;
>> +    }
>
> If we had a "logical cu child iterator" would there be a way to extend
> this to use either the logical child or low level DIE tree child
> iterator instead of direct dwpp_siblingof/dwpp_offdie calls?

I should check out boost iterators, this whole thing looks like ripe for
grand schemes like generic tree iterator specialized by iterator type.

In the mean time, we could either adopt the same approach as with
child_iterator (viz. the policy enum) that the iterator would use as
necessary.  Or go fully abstract and have child iterator as one of the
template parameters.

>> +    // Return a CU iterator representing a CU that the current DIE
>> +    // comes from.
>> +    cu_iterator
>> +    cu () const
>> +    {
>> +      return m_cuit;
>> +    }
>
> This interface might make it hard to also support "logical tree walking"
> if we want to also handle import_unit/partial_unit. Or we could just say
> "this always returns the low-level DIE tree CU".

It's not that important.  I use it for progress reports in dwlocstat,
but that can use some other trick.

>> +  };
>> +
>> +  // An attribute iterator goes through attributes of a given DIE.
>> +  class attr_iterator
>> +    : public std::iterator <std::input_iterator_tag, Dwarf_Attribute>
>> +  {
>> +    Dwarf_Die *m_die;
>> +    Dwarf_Attribute m_at;
>> +    ptrdiff_t m_offset;
>> +
>> +    struct cb_data
>> +    {
>> +      Dwarf_Attribute *at;
>> +      bool been;
>> +    };
>
> It would be nice to document these fields/structures a little to make
> reading the implementation easier for noobs like me.

How about:

diff --git a/libdw/c++/libdw b/libdw/c++/libdw
index f884d2e..43bdcd5 100644
--- a/libdw/c++/libdw
+++ b/libdw/c++/libdw
@@ -451,7 +451,10 @@ namespace elfutils
 
     struct cb_data
     {
+      // The visited attribute.
       Dwarf_Attribute *at;
+
+      // Whether this is second pass through the callback.
       bool been;
     };
 

>> +    bool
>> +    operator== (attr_iterator const &other) const
>> +    {
>> +      return m_offset == other.m_offset
>> +	&& m_at.code == other.m_at.code;
>> +    }
>
> Don't you need to compare m_die too?

As before, you are not supposed to compare totally different
attr_iterator's with each other, and this way comparison vs. the end
iterator (the usual case) is cheap.

>> +inline void
>> +elfutils::libdw_impl::throw_libdw (int dwerr)
>> +{
>> +  if (dwerr == 0)
>> +    dwerr = dwarf_errno ();
>> +  assert (dwerr != 0);
>> +  throw std::runtime_error (dwarf_errmsg (dwerr));
>> +}
>
> This seems expensive (it copies the string doesn't it?).
> Which might suggest having an explicit dw_error might be nicer.
> Then one can call (or not call) dwarf_errmsg when catching.

It does, but this is error path--you are already losing.

But the custom error class (inherited off std::exception) would make
sense.  We could simply expose the error code and do the translation to
char const * on-demand in what().

>> @@ -39,6 +39,9 @@ noinst_LIBRARIES += libdwfl_pic.a
>>  
>>  pkginclude_HEADERS = libdwfl.h
>>  
>> +cppdir = $(pkgincludedir)/c++
>> +cpp_HEADERS = c++/libdwfl
>
> Is installing in a c++ subdir common?
> Why not in the main include dir?
> The file names don't conflict.

I don't care either way.  I did it this way because that's what the
Roland's DWARF writer branch had back then.

>> +    bool
>> +    operator== (const dwfl_module_iterator &that) const
>> +    {
>> +      assert (m_dwfl == NULL || that.m_dwfl == NULL || m_dwfl == that.m_dwfl);
>> +      return m_offset == that.m_offset;
>> +    }
>
> Why that assert? Why not include it in the check?

The contract is that you only compare iterators that come from the same
collection.

>> +  public:
>> +    explicit dwfl_report (Dwfl *dwfl)
>> +      : m_dwfl ((assert (dwfl != NULL), dwfl))
>> +    {
>> +      dwfl_report_begin_add (m_dwfl);
>> +    }
>
> Is assert the best idiom for a NULL pointer check?

So, elsewhere in elfutils we would silently tolerate the NULL and just
return a -1 or something.  But in a constructor that's not an option.

We could throw an exception, I suppose.

Thanks,
Petr

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]