This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [patch v7 3/5] x86* unwinder: libdwfl/

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Tue, 22 Oct 2013 23:31:50 +0200
Subject: Re: [patch v7 3/5] x86* unwinder: libdwfl/
On Mon, 2013-10-21 at 21:19 +0200, Jan Kratochvil wrote: 
> On Sat, 19 Oct 2013 03:05:23 +0200, Mark Wielaard wrote:

> > The only real issue I would like you to
> > think/comment on is around the dwfl_thread_next comments.
> 
> Maybe you want threads to be iterated through callbacks?
> Maybe we should make a separate mail thread for yet another API update.

OK, lets discuss this issue in its own thread. I think the rest is
easily agreed upon (or already known to be just style/nitpicks).

> > > libdwfl/
> > > 2013-10-13  Jan Kratochvil  <jan.kratochvil@redhat.com>
> > > 	    Mark Wielaard  <mjw@redhat.com>
> > 
> > That is a bit too much credit :) You wrote 99.9% of this.
> > I only nitpick.
> 
> I guess I can+should also your Signed-off-by line there.

I meant you wrote it all, I only nitpick. You really don't need to
credit me for the code, just for making your life difficult with all my
nitpicking. Nitpicking-by line would be fine of course :)

> > > 	* Makefile.am (AM_CPPFLAGS): Add ../libasm.
> > 
> > Why?
> 
> Because we need libeblP.h to check 'ebl->class == ELFCLASS32' as AFAIK there
> is no public ebl API to fetch 'ebl->class'.
> 
> We need to know ELFCLASS32 so that we can do:
> dwfl_frame_reg_set:
>   if (ebl->class == ELFCLASS32)
>     val &= 0xffffffff;
> 
> At least I believe if DWARF unwinds some register it should be cut to its real
> hardware width.

Yes, agreed.

> There are various ways how to solve it but none of them find as a clear win:
>  * As is.
>  * New backend ebl method like normalize_reg() which cuts b32..b63 on i386.
>    For example s390 backend later will have normalize_pc() to cut b31.
>  * New ebl_get_class().

We already have the last solution. ebl_get_elfclass ().
I would just call it when constructing the Dwfl_Process and store it
there since we are reusing it every time. Then the above simply becomes
if (process->class == ELFCLASS32)

> > > +/* Set STATE->pc_set from STATE->regs according to the backend.  Return true on
> > > +   success, false on error.  */
> > > +static bool
> > > +state_fetch_pc (Dwfl_Frame *state)
> > > +{
> > > +  switch (state->pc_state)
> > > +    {
> > > +    case DWFL_FRAME_STATE_PC_SET:
> > > +      return true;
> > > +    case DWFL_FRAME_STATE_PC_UNDEFINED:
> > > +      abort ();
> > > +    case DWFL_FRAME_STATE_ERROR:;
> > 
> > trailing ";"
> 
> dwfl_frame.c:45:7: error: a label can only be part of a statement and a declaration is not a statement
> 
> But I guess a { block } is more appropriate there anyway so I have changed it.

aha, missed that. Indeed a block is nicer.

> After this explanation if you still think *_UNINITIALIZED and *_ERROR should be
> separate then I sure can do so.

No, I just found the name confusing. I like the name _UNINITIALIZED
better if it is used for these two meanings, but don't feel too strongly
about it.

> > > +/* Do not call it on your own, to be used by thread_* functions only.  */
> > > +
> > > +static void
> > > +state_free (Dwfl_Frame *state)
> > > +{
> > > +  Dwfl_Thread *thread = state->thread;
> > > +  assert (thread->unwound == state);
> > > +  thread->unwound = state->unwound;
> > > +  free (state);
> > > +}
> > 
> > This is always used as:
> > 
> >   while (thread->unwound)
> >     state_free (thread->unwound);
> > 
> > So just fold that loop into the function and call it thread_state_free
> > or something.
> 
> I wanted to make each object type self-contained, this is destructor of
> Dwfl_Frame.  Mixing it with Dwfl_Thread object seems not right for OO design.
> 
> Do you still want that change?

Since that same pattern is used 5 times in the same file I do think it
makes sense to put it in its own function. I don't care much how it is
called, it just looks cleaner to not repeat the same chunk of code all
over the place. state_free on the other hand is never called on its own,
it always has to be called as in this while pattern. So I just saw it as
a local convenience function, not as a OO design/naming.

> > > +/* Do not call it on your own, to be used by thread_* functions only.  */
> > > +
> > > +static Dwfl_Frame *
> > > +state_alloc (Dwfl_Thread *thread)
> > > +{
> > > [...] 
> > > +}
> > 
> > Comment might be a bit more specific. Only used at the start of
> > dwfl_thread_getframes.
> 
> This is unrelated where this self-contained object is currenly used.
> 
> Do you still want that change?

If you don't like the more specific comment that is fine too.

> > And maybe rename it thread_state_alloc if you
> > rename the above to show they should match up.
> 
> I can rename those but I am lost what design/naming it tracks.
> 
> Maybe I could rather rename s/Dwfl_Frame *state/Dwfl_Frame *frame/ ?

I don't really have a better/great naming suggestion (maybe Dwfl_Frame
*frames might be slightly more accurate since it is a list of frames).
So keep it as you have if that feel better to you.

> > OK. Had to think for a sec why the while (process->thread) loop worked,
> > even after having just read thread_free...
> 
> Anything to change?

No, sorry, was just mumbling in myself about how slow I am comprehending
code :)

> > > +/* Allocate new Dwfl_Process for DWFL.  */
> > > +static void
> > > +process_alloc (Dwfl *dwfl)
> > > +{
> > > +  Dwfl_Process *process = malloc (sizeof (*process));
> > > +  if (process == NULL)
> > > +    return;
> > > +  process->dwfl = dwfl;
> > > +  process->thread = NULL;
> > > +  dwfl->process = process;
> > > +}
> > 
> > I think this would be slightly clearer if it returned the Dwfl_Process
> > pointer (or NULL on failure). But not a big deal.
> 
> Just the caller would have to assert (process == dwfl->process); anyway so
> I found it easier this way.

I would have written it by removing the last line and change the caller
to:

dwfl->process = process_alloc (dwfl);
if (dwfl->process == NULL)
  ...

But the above is fine too, just different styles.

> > > +pid_t
> > > +dwfl_pid (Dwfl *dwfl)
> > > +{
> > > +  if (dwfl->process == NULL)
> > > +    {
> > > +      __libdwfl_seterrno (DWFL_E_NO_ATTACH_STATE);
> > > +      return -1;
> > > +    }
> > > +  return dwfl->process->pid;
> > > +}
> > > +INTDEF(dwfl_pid)
> > 
> > Setting dwfl_errno seems a little redundant since there is no other
> > reason this call can fail. Thought it is harmless I guess.
> 
> I thought the general elfutils public API rule is that if a function fails it
> sets the library's errno.

Yes, the above is fine.

> > > +  if (prev_thread != NULL && prev_thread->next != NULL)
> > > +    return prev_thread->next;
> > > +  Dwfl_Process *process = dwfl->process;
> > > +  if (process == NULL)
> > > +    {
> > > +      __libdwfl_seterrno (DWFL_E_NO_ATTACH_STATE);
> > > +      return NULL;
> > > +    }
> > 
> > hohum, so someone is using a Dwfl_Thread * after calling dwfl_end?
> > Or is there any other way this could happen? Here maybe an assert might
> > be in order instead of returning an error?
> 
> You can get this error when you never call dwfl_attach_state() for DWfl:
> 
> 	Dwfl *dwfl = dwfl_begin (&callbacks);
> 	dwfl_next_thread (dwfl, NULL);

Ah, right, missed that. OK.

> > > +  Dwfl_Thread *nthread = thread_alloc (process, prev_thread);
> > 
> > Wait, don't we have to check prev_thread == NULL && process->thread !=
> > NULL first?
> 
> thread_alloc() always works, both for allocating first thread and the last
> thread.  It asserts you do not pass prev_thread as a non-last one.

hmmm, I don't like that design, it seems to mean we cannot rewalk the
thread list after the first time. But that is the part we would discuss
separately.

> > > +  Dwfl_Error err = dwfl_errno ();
> > > +  assert (thread->thread_detach_needed);
> > > +  if (process->callbacks->thread_detach)
> > > +    process->callbacks->thread_detach (thread, thread->callbacks_arg);
> > > +  thread->thread_detach_needed = false;
> > > +  if (state == NULL || state->pc_state == DWFL_FRAME_STATE_ERROR)
> > > +    {
> > > +      __libdwfl_seterrno (err);
> > > +      return -1;
> > > +    }
> > > +  assert (state->pc_state == DWFL_FRAME_STATE_PC_UNDEFINED);
> > 
> > Why this assert?
> 
> The code above checks if DWFL_FRAME_STATE_PC_SET and later if
> DWFL_FRAME_STATE_ERROR so this point of code should be reached only for the
> case of DWFL_FRAME_STATE_PC_UNDEFINED.  It is both documenting the code and
> also providing a protection against forgotten code path if a new
> DWFL_FRAME_STATE_* value gets defined in the future.

OK.

> > > diff --git a/libdwfl/dwfl_frame_pc.c b/libdwfl/dwfl_frame_pc.c
> > > +bool
> > > +dwfl_frame_pc (Dwfl_Frame *state, Dwarf_Addr *pc, bool *isactivation)
> > > +{
> > > +  assert (state->pc_state == DWFL_FRAME_STATE_PC_SET);
> > > +  *pc = state->pc;
> > > +  if (isactivation)
> > > +    {
> > > +      /* Bottom frame?  */
> > > +      if (state == state->thread->unwound)
> > > +	*isactivation = true;
> > > +      /* *ISACTIVATION is logical or of current and previous frame state.  */
> > > +      else if (state->signal_frame)
> > > +	*isactivation = true;
> > > +      else
> > > +	{
> > > +	  /* Not affected by unsuccessfully unwound frame.  */
> > > +	  __libdwfl_frame_unwind (state);
> > > +	  if (state->unwound == NULL
> > > +	      || state->unwound->pc_state != DWFL_FRAME_STATE_PC_SET)
> > > +	    *isactivation = false;
> > > +	  else
> > > +	    *isactivation = state->unwound->signal_frame;
> > > +	}
> > 
> > I don't understand this last else block part where isactivation depends
> > on the unwound state. Could you add a comment explaining or a reference
> > to some document that defines this part?
> 
> I considered it was documented by these two lines:
> 
> > > +      /* *ISACTIVATION is logical or of current and previous frame state.  */
> > > +     /* Not affected by unsuccessfully unwound frame.  */
> 
> Is this expansion of comments good enough now?
> 
>       /* Bottom frame?  */
>       if (state == state->thread->unwound)
>         *isactivation = true;
>       /* *ISACTIVATION is logical union of whether current or previous frame
>          state is SIGNAL_FRAME.  */
>       else if (state->signal_frame)
>         *isactivation = true;
>       else
>         {
>           /* If the previous frame has unwound unsuccessfully just silently do
>              not consider it could be a SIGNAL_FRAME.  */
>           __libdwfl_frame_unwind (state);
>           if (state->unwound == NULL
>               || state->unwound->pc_state != DWFL_FRAME_STATE_PC_SET)
>             *isactivation = false;
>           else
>             *isactivation = state->unwound->signal_frame;
>         }

I think I just don't fully understand the semantics of the 'S'
augmentation. That is an eh_frame CFI extension, not described in the
DWARF spec. I understand that if this is a signal_frame then the PC
doesn't need to be adjusted. But why don't we have adjust the PC for
this frame if we were called from a signal_frame?

> > > +static int
> > > +bra_compar (const void *key_voidp, const void *elem_voidp)
> > > +{
> > > +  Dwarf_Word offset = (uintptr_t) key_voidp;
> > > +  const Dwarf_Op *op = elem_voidp;
> > > +  return (offset > op->offset) - (offset < op->offset);
> > > +}
> > 
> > Nitpick: Rename bra_compare?
> 
> It is a parameter to bsearch() which calls that parameter 'compar'.
> I have no idea why is that parameter 'compar' but I try to be compliant with it.
> As qsort() also calls it 'compar' it does not seem to be a POSIX typo.

Indeed. How weird. But better to be consistent then. OK.

> > > +/* FIXME: Handle bytecode deadlocks and overflows.  */
> > 
> > In the systemtap runtime unwinder we "cheated" and don't support
> > backward branches/skips (I doubt those are really ever used in CFI).
> > Would that help here?
> 
> I do not find some limit on maximum executed instructions complicated, it just
> was not implemented in this first critical-path only part.
> 
> Skipping backward branches is an interesting idea if you have found it works
> in practice.

It seems to work. It is used in combination with only accepting a limited
number of OPs to limit the amount of instructions executed during runtime.
This way you have a limit up-front (which was convenient to have in the
case of the systemtap runtime unwinder, so we don't even try if we know
it would fail anyway). But a simpler approach of just checking the
number of instructions executed and compare it against some limit would
have worked too (and then we could have allowed backward branches).
> 
> > > +	    /* PPC32 vDSO has such invalid operations.  */
> > 
> > How horrible. Old kernels only?
> 
> IIRC I was looking even at some recent PPC but not too recent PPC as I do not
> have access to too recent PPCs.  It is still definitely needed for RHEL-5+,
> just not sure if it applies for the 'ports' branch, IMO not.
> (That is 'ports' branch is about old hosts, not about old targets.)

Yes, agreed. Lets just keep it.

> > > +      default:
> > > +	__libdwfl_seterrno (DWFL_E_UNSUPPORTED_DWARF);
> > > +	return false;
> > 
> > Which OPs are missing (DW_OP_div and DW_OP_mod it seems, be careful with
> > divide by zero)? Lets make sure we distinquish between OPs that are
> > valid in CFI, but we don't handle and OPs that aren't valid in CFI
> > (INVALID_DWARF).
> 
> Some CFI-valid OPs are really missing, they just were not in the critical-path
> initial post.

IMHO it would be really good if we listed the known missing ones in one
(fall-through) case statement that is UNSUPPORTED_DWARF and make the
default INVALID_DWARF. That way we also have a concrete TODO list.

Though I agree that what there is now seems effectively what is needed.

> > > +static void
> > > +handle_cfi (Dwfl_Frame *state, Dwarf_Addr pc, Dwarf_CFI *cfi)
> > > +{
> > > +  Dwarf_Frame *frame;
> > > +  if (INTUSE(dwarf_cfi_addrframe) (cfi, pc, &frame) != 0)
> > > +    {
> > > +      __libdwfl_seterrno (DWFL_E_LIBDW);
> > > +      return;
> > > +    }
> > > +  Dwfl_Thread *thread = state->thread;
> > > +  Dwfl_Process *process = thread->process;
> > > +  Ebl *ebl = process->ebl;
> > > +  size_t nregs = ebl_frame_nregs (ebl);
> > 
> > This could return zero. Is that a problem here (return error)? Or did we
> > already check earlier (assert > 0)?
> 
> dwfl_thread_getframes() already checks if (ebl_frame_nregs (ebl) == 0)
> and dwfl_thread_getframes() is the only way how to get Dwfl_Frame *.
> Therefore put an assert here.

OK.

> > > +  Dwfl_Frame *unwound;
> > > +  unwound = malloc (sizeof (*unwound) + sizeof (*unwound->regs) * nregs);
> > > +  state->unwound = unwound;
> [...]
> > > +  if (unwound->pc_state == DWFL_FRAME_STATE_ERROR
> > > +      && dwfl_frame_reg_get (unwound, frame->fde->cie->return_address_register,
> > > +			     &unwound->pc))
> > > +    {
> > > +      /* PPC32 __libc_start_main properly CFI-unwinds PC as zero.  Currently
> > > +	 none of the archs supported for unwinding have zero as a valid PC.  */
> > > +      if (unwound->pc == 0)
> > > +	unwound->pc_state = DWFL_FRAME_STATE_PC_UNDEFINED;
> > > +      else
> > > +	unwound->pc_state = DWFL_FRAME_STATE_PC_SET;
> > > +    }
> > > +}
> > 
> > If at the end unwound->pc_state == DWFL_FRAME_STATE_ERROR should we
> > signal an error by setting state->unwound?
> 
> I do not understand you here.  handle_cfi() at its top already does:
> 
>   unwound = malloc (sizeof (*unwound) + sizeof (*unwound->regs) * nregs);
>   state->unwound = unwound;
> 
> So state->unwound->pc_state == DWFL_FRAME_STATE_ERROR which means an error.

OK. I guess I got confused by the states. ERROR == UNINITIALIZED.

> > And the comment here is "check whether this is the initial frame or a
> > signal frame. Then we need to unwind from the adjusted pc."
> 
> Done; just the comment should be exactly the opposite one:
> 
>   /* Check whether this is the initial frame or a signal frame.  
>      Then we need to unwind from the original, unadjusted PC.  */

Yes, indeed. Sorry for the confusion.

> > > +  if (state != state->thread->unwound && ! state->signal_frame)
> > > +    pc--;
> > > +  Dwfl_Module *mod = INTUSE(dwfl_addrmodule) (state->thread->process->dwfl, pc);
> > > +  if (mod != NULL)
> > > +    {
> > > +      Dwarf_Addr bias;
> > > +      Dwarf_CFI *cfi_eh = INTUSE(dwfl_module_eh_cfi) (mod, &bias);
> > > +      if (cfi_eh)
> > > +	{
> > > +	  handle_cfi (state, pc - bias, cfi_eh);
> > > +	  if (state->unwound)
> > > +	    return;
> > > +	}
> > > +      Dwarf_CFI *cfi_dwarf = INTUSE(dwfl_module_dwarf_cfi) (mod, &bias);
> > > +      if (cfi_dwarf)
> > > +	{
> > > +	  handle_cfi (state, pc - bias, cfi_dwarf);
> > > +	  if (state->unwound)
> > > +	    return;
> > > +	}
> > > +    }
> > > +  __libdwfl_seterrno (DWFL_E_NO_DWARF);
> > > +}
> > 
> > handle_cfi might set dwflerrno when state->unwound == NULL, in that case
> > we probably shouldn't override it with the generic DWFL_E_NO_DWARF,
> > might want to put that in the else branch of (mod != NULL)?
> 
> The reason is that while the idea of DWFL_FRAME_STATE_PC_UNDEFINED terminating
> the backtrace is nice in real world it does not happen.  It was (I have) only
> recently fixed it for x86* and non-x86* archs may never be fixed.  So the
> decision is left up to the caller whether DWFL_E_NO_DWARF is considered as
> a proper backtrace terminator.  tests/run-backtrace.sh has for it:
>   # In some cases we cannot reliably find out we got behind _start.
>   if cmp -s <(echo "${abs_builddir}/backtrace: dwfl_thread_getframes: No DWARF information found") <(uniq <$1); then
> 
> There may be multiple different error codes if CFI is not found, at least
> besides DWFL_E_NO_DWARF also DWFL_E_NO_MATCH.  So that DWFL_E_NO_DWARF is
> rather a part of API.

hmmm, that doesn't really work since the DWFL_E_ error codes are private
(libdwflP.h). You can get an error string describing the error, but not
really compare them easily. An error is an error as far as the user is
concerned.

Since we cannot guarantee unwinding finishes cleanly, the user just has
to accept that unwinding often just ends in an error. Ugly, but
unavoidable I am afraid. We will just have to work towards making the
whole toolchain DWARF unwinding "clean" eventually (some far future
dream). At least DWARF4 6.4.4 seems very clear on how to mark the end of
an unwind.

> > Also a nitpick, but might this be slightly nicer to read if this and
> > handle_cfi returned stat->unwound to make the caller error checking more
> > "natural"?
> 
> This is only an internal function and I find more complicated in the caller to
> solve what if the function returned NULL but state->unwound is not NULL etc.
> I would prefer to keep it as is.

OK. It is just a simple style difference.

> > thread could use a comment explaining it is the (currently) last thread.
> > Or maybe rename it to last_thread or cur_thread to make that clear?
> 
> It is the first thread.  This is why callers have to pass that "prev_thread"
> as the last thread is not stored anywhere.
> 
> I have renamed it to "first_thread".

Thanks. And sorry about my confusion again. Clearly it explains why I
was having trouble grokking some of the code :)
> 
> +	if (process->callbacks->memory_read == NULL)
> +	  {
> +	    __libdwfl_seterrno (DWFL_E_LIBDW);
> +	    free (stack);
> +	    return false;
> +	  }
> [...]
> +    {
> +      if (process->callbacks->memory_read == NULL)
> +	{
> +	  __libdwfl_seterrno (DWFL_E_LIBDW);
> +	  return false;
> +	}

DWFL_E_LIBDW isn't the right error code here since that will lookup
dwarf_errno (see dwfl_error.c). DWFL_E_INVALID_ARGUMENT maybe?

Thanks,

Mark
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]