This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [PATCH] Add C++ wrappers for iterating through various libdw and libdwfl objects

From: Petr Machata <pmachata at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Tue, 31 Mar 2015 15:05:11 +0200
Subject: Re: [PATCH] Add C++ wrappers for iterating through various libdw and libdwfl objects
Mark Wielaard <mjw@redhat.com> writes:

> On Fri, 2015-03-27 at 21:32 +0100, Petr Machata wrote:
>> Mark Wielaard <mjw@redhat.com> writes:
>> > On Wed, 2015-03-18 at 00:55 +0100, Petr Machata wrote:
>> The iterators are essentially glorified structs.  The usual C++ ABI
>> concerns don't apply (reordering, adding virtuals).  They only use
>> single inheritance off system classes, and those only serve for tagging
>> (so that algorithms can dispatch based on iterator type).
>
> But won't you have issues when some program uses one version of the
> elfutils iterators and it also uses a library which was compiled against
> an elfutils version that uses a different one? Or can/should/will they
> never mix anyway (some of your answers below hint at that).

Depends on how they change, but in general yes, that would be a problem.
E.g. moving methods around won't be a problem, adding methods won't be a
problem.  Changing fields however would be a problem.

Unlike C, this can't be mitigated by adding a padding field, because the
user inlines both the layout information, and the behavior (all the code
is in headers as well).

The way it's written now, care must be taken that any DSO that uses
these iterators on the API boundary is built against version of the
iterator that's compatible with its clients.

If these types should be safely usable on API boundary of another
library, we would need to move most of them out of the header, and
expose only pure pimpl'd interface.  That would make the interface
somewhat more heavy-weight.

>> >> Error conditions are communicated through exceptions.  For better or
>> >> worse, that's how things are done in C++, and there's simply no way to
>> >> communicate failures out of constructors or the likes of operator++.
>> >> Currently I'm simply throwing an std::runtime_error, but maybe it would
>> >> make sense to have a custom family of libelf, libdw, libdwfl exceptions,
>> >> all rooted in a general elfutils exception type.
>> >
>> > Can we add those later, inheriting from std::runtime_error for example
>> > to keep ABI?
>> 
>> Sure.
>
> Great. I wouldn't mind if you prototyped how that might look in the
> future. But again, see below. I think my idea of exceptions is slightly
> skewed.
>

Something like:

namespace elfutils
{
  class exception
    : public std::exception
  {};

  class libdw_exception
    : public elfutils::exception
  {
    int m_error;
  public:
    explicit libdw_exception (int error) : m_error (error) {}
    virtual const char *what() const
    { return dwarf_errmsg (m_error); }
  };
}

Then constructing libdw_exception itself is super cheap (though actually
throwing and unwinding is of course still expensive), and what() returns
directly the static buffer from inside libdw.

You would similarly have libelf_exception and libdwfl_exception.

>> >> It's all C++98.  We have had "the new C++" out there on the streets
>> >> since 2009 or so, and the iterators presented here were all written with
>> >> varying degrees of C++11 in them.  But that is still sufficiently new
>> >> that I felt adhering to the C++98 subset is the prudent way forward, and
>> >> backported the code.
>> >
>> > Are there any issues building this with C++11? Does that change ABI?
>> 
>> You mean e.g. a third-party C++98 library embedding these classes in
>> public interfaces, and another C++11 application relying on that?
>
> Right. Or the other way around. I assume these iterators might get
> exposed by other libraries build upon elfutils that might get used by
> applications that might use that library and elfutils directly.
>
>> Hmm, I think there shouldn't be.  The GCC folks will move (moved) C++98
>> to the new ABI as well, and GCC should include backward compatible
>> symbols for the old ABI.  I think.
>
> It might not be an issue. Maybe we can just assume that a whole system
> is completely using the gcc c++89 abi or the gcc c++11 abi but never two
> at the same time.

I'm eyeing the list of ABI changes between C++98 and C++11.  I don't see
any that could affect the iterators.  There is "`vector::data()`'s
return type changes from `pointer` to `_Tp*`" but ::data itself is only
available in C++11, so I think this doesn't apply for cross-C++-version
compilation.  (It might maybe apply for C++11 as compiled by pre-4.6 GCC
vs. post-4.6 GCC, but _that_ I'm pretty sure we can safely disregard.)

>> It is a convention in these header files.  It could be named just
>> "impl", or maybe "aux", or something else still, I don't think there is
>> convention to speak of.  Point is somehow to signal to the client that
>> there be lions.
>
> It might be my own confusion. I am still thinking this is exposing an
> abi.

Yeah, in a way.  But if a DSO exposes these auxiliary symbols as public
symbols, that's that DSO's problem, not ours.

But as written above.  Since both behavior and layout are inlined in
clients, clients that end up binary-interacting with each other have to
make sure their understanding of these artifacts is compatible.

>> No.  The CU iterator was written when there were no type units around.
>> dwgrep currently doesn't type units either.  Then again, the name makes
>> it clear that this goes through CU's.  I imagine tu_iterator would be
>> similarly useful, or all-encompassing unit_iterator.  But I haven't
>> written it.
>
> It kind of depends what you call a CU. This goes through compile_units,
> and also through partial_units, but not type_units. DWARF describes
> these as three kinds of compilation units. Normal, partial and type.
>
> I think it is kind of inconsistent to have the iterator go through
> normal and partial units, but not type units. I think normally a user
> wants to go through the normal units and then have a tree iterator that
> follows DW_TAG_import_unit to virtually create the DIE trees. Or they
> might be interested in all type definitions (but don't want to have to
> follow DW_AT_type references) so they want to iterate through normal and
> type units (but not partial units, because they would be walked
> virtually in a tree iterator).
>
> It feels like the iterator is slightly too low-level and/or doesn't give
> enough ways to ask how/what you really want to iterate over. I think a
> unit_iterator where you can select which unit kinds you would like to
> iterate over would be very useful.

OK, I'll work on making this into a unit_iterator.  It will go through
both non-type units and type units.

>> >> +    bool
>> >> +    operator== (cu_iterator const &that) const
>> >> +    {
>> >> +      return m_offset == that.m_offset;
>> >> +    }
>> >
>> > Don't you also need to compare m_dw?
>> > hmmm, that doesn't work with end () then.
>> 
>> Well, you are not supposed to compare iterators coming from different
>> sources, so this doesn't need to work.
>
> Is that common C++ knowledge? How do you define "source"? I assume
> people will expect that if you compare two "wrong/incompatible"
> iterators anything can happen (including crashes)? I guess this just
> shows my inexperience with C++ iterators. Once you hit an "end" iterator
> you always stop processing and won't try comparing against anything.

E.g. libstdc++ has this mode where when you build with -D_GLIBCXX_DEBUG
the following would be diagnosed:

#include <vector>
int main () {
  std::vector <int> a;
  std::vector <int> b;
  a.begin () == b.begin ();
}

$ ./a.out 
/usr/include/c++/4.9.2/debug/safe_iterator.h:510:error: attempt to compare 
    iterators from different sequences.

I think C++11 has a definition of "same source" that goes something
like, given two iterators, they are from the same source if you can take
one of them, ++ it sufficient number of times, and get to the point
where both reference the same object.

>> diff --git a/libdw/c++/libdw b/libdw/c++/libdw
>> index f884d2e..43bdcd5 100644
>> --- a/libdw/c++/libdw
>> +++ b/libdw/c++/libdw
>> @@ -451,7 +451,10 @@ namespace elfutils
>>  
>>      struct cb_data
>>      {
>> +      // The visited attribute.
>>        Dwarf_Attribute *at;
>> +
>> +      // Whether this is second pass through the callback.
>>        bool been;
>>      };
>
> Yes, a quick note on why we need to keep track of the two passes would
> also be appreciated.

There's this:

      // Do a second iteration to find the next offset.

>> >> +inline void
>> >> +elfutils::libdw_impl::throw_libdw (int dwerr)
>> >> +{
>> >> +  if (dwerr == 0)
>> >> +    dwerr = dwarf_errno ();
>> >> +  assert (dwerr != 0);
>> >> +  throw std::runtime_error (dwarf_errmsg (dwerr));
>> >> +}
>> >
>> > This seems expensive (it copies the string doesn't it?).
>> > Which might suggest having an explicit dw_error might be nicer.
>> > Then one can call (or not call) dwarf_errmsg when catching.
>> 
>> It does, but this is error path--you are already losing.
>> 
>> But the custom error class (inherited off std::exception) would make
>> sense.  We could simply expose the error code and do the translation to
>> char const * on-demand in what().
>
> I would like the "lazy" on-demand what() approach.
> But this probably shows my ignorance on how exceptions are used in C++.
> In java exceptions (throw-catch-finally) are often used as normal part
> of control flow, so you want the creation of an exception as cheap as
> possible, because you might not use it except for type matching to get
> into the right catch clause. It looks like in C++ exceptions are much
> less common and normally only used for semi-fatal issues.

Well some people do abuse exceptions for control flow, but idiomatic C++
only uses them for signalling error cases.

>> >> @@ -39,6 +39,9 @@ noinst_LIBRARIES += libdwfl_pic.a
>> >>  
>> >>  pkginclude_HEADERS = libdwfl.h
>> >>  
>> >> +cppdir = $(pkgincludedir)/c++
>> >> +cpp_HEADERS = c++/libdwfl
>> >
>> > Is installing in a c++ subdir common?
>> > Why not in the main include dir?
>> > The file names don't conflict.
>> 
>> I don't care either way.  I did it this way because that's what the
>> Roland's DWARF writer branch had back then.
>
> Isn't there any convention where c++ headers should be installed?
> I just dislike the extra directory name. But if other packages also have
> explicit c++ directories for their c++ headers, then lets use that. If
> not, lets not and just install under $(pkgincludedir).

I'm not aware of a convention.  I'll put it to pkgincludedir.

>> >> +  public:
>> >> +    explicit dwfl_report (Dwfl *dwfl)
>> >> +      : m_dwfl ((assert (dwfl != NULL), dwfl))
>> >> +    {
>> >> +      dwfl_report_begin_add (m_dwfl);
>> >> +    }
>> >
>> > Is assert the best idiom for a NULL pointer check?
>> 
>> So, elsewhere in elfutils we would silently tolerate the NULL and just
>> return a -1 or something.  But in a constructor that's not an option.
>> 
>> We could throw an exception, I suppose.
>
> It might again be my limited C++ experience. If an exception is really
> like an assert in most cases anyway, I don't really care. Just trying to
> make sure I understand C++ idom around NULL pointers.

C++ has the same sort of relationship to NULL pointers as C.  But when
given one in a constructor that needs a valid pointer, what can you do?

This RAII object could have a m_dwfl != NULL check before each action,
but that's suboptimal.  If there's an error, you should fail loud and
clear, not muffle it for as long as you can and keep limping along.

Not that it matters, I'll be removing that class.

Thanks,
Petr
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]