This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [patch v7 3/5] x86* unwinder: libdwfl/


On Tue, 2013-10-22 at 20:58 +0200, Jan Kratochvil wrote:
> > > +      assert (data->d_size == bytes);
> > 
> > Maybe assert is too strong here? Could there be corrupt core files?
> 
> No.  In such case elf_getdata_rawchunk would return NULL.
> 
> I do not think this assert can fail.

Yes, that makes sense. The assert is fine.

> > > +static bool
> > > +core_set_initial_registers (Dwfl_Thread *thread, void *thread_arg_voidp)
> > > +{
> > > +  struct thread_arg *thread_arg = thread_arg_voidp;
> > > +  struct core_arg *core_arg = thread_arg->core_arg;
> > > +  Elf *core = core_arg->core;
> > > +  size_t offset = thread_arg->note_offset;
> > > +  GElf_Nhdr nhdr;
> > > +  size_t name_offset;
> > > +  size_t desc_offset;
> > > +  Elf_Data *note_data = core_arg->note_data;
> > > +  size_t nregs = ebl_frame_nregs (core_arg->ebl);
> > > +  assert (offset < note_data->d_size);
> > > +  size_t getnote_err = gelf_getnote (note_data, offset, &nhdr, &name_offset,
> > > +				     &desc_offset);
> > > +  assert (getnote_err != 0);
> > 
> > Too strong IMHO. Just __libdwfl_seterrno (DWFL_E_LIBELF) return false?
> > core files can be corrupted.
> 
> In such case __libdwfl_attach_state_for_core would already fail the and
> execution would not get here at all.  As the data is already read in memory it
> is IMO really rather an assert().  Or do I miss something?

No, I missed that __libdwfl_attach_state_for_core already does the above
checks and we shouldn't get here if they failed. The asserts are
correct. But maybe add a comment about __libdwfl_attach_state_for_core
has already checked this?

> > > +  /* Do not check NAME for now, help broken Linux kernels.  */
> > > +  const char *name = note_data->d_buf + name_offset;
> > > +  const char *desc = note_data->d_buf + desc_offset;
> > > +  GElf_Word regs_offset;
> > > +  size_t nregloc;
> > > +  const Ebl_Register_Location *reglocs;
> > > +  size_t nitems;
> > > +  const Ebl_Core_Item *items;
> > > +  int core_note_err = ebl_core_note (core_arg->ebl, &nhdr, name, &regs_offset,
> > > +				     &nregloc, &reglocs, &nitems, &items);
> > > +  assert (core_note_err != 0);
> > > +  assert (nhdr.n_type == NT_PRSTATUS);
> > 
> > As above too strong IMHO.
> 
> The same reason also here.

Agreed. OK.

> > > +  const Ebl_Core_Item *item;
> > > +  for (item = items; item < items + nitems; item++)
> > > +    if (strcmp (item->name, "pid") == 0)
> > > +      break;
> > > +  assert (item < items + nitems);
> > > +  pid_t tid;
> > > +  {
> > > +    uint32_t val32 = *(const uint32_t *) (desc + item->offset);
> > > +    val32 = (elf_getident (core, NULL)[EI_DATA] == ELFDATA2MSB
> > > +	     ? be32toh (val32) : le32toh (val32));
> > > +    tid = (int32_t) val32;
> > > +    eu_static_assert (sizeof val32 <= sizeof tid);
> > > +  }
> > > +  assert (tid == INTUSE(dwfl_thread_tid) (thread));
> > 
> > Can this happen with a corrupt core file? Then it is too strong IMHO.
> 
> No.  core_next_thread has already read the same number to figure out the TID.
> Here the same core number is just read a second time.

Right, I see it now. OK. Maybe another comment?

Sorry about being a little paranoid about the asserts.
It really is a pain if they could accidentally happen on bad input. So I
am really just trying to make sure they can/should really never fail.

> > I dunno how (or if it is important) to fix this issue.
> > The dwfl_frame_reg_get check does look irrelevant for other arches.
> > But should be harmless.
> 
> This is an ugly hack which is only relevant PPC.  It could be moved at least
> to the second branch for non-x86* archs.  But I somehow found it too small to
> virtualize it to backends/libebl when it does not hurt other archs.

Yeah, it seems OK.

> > > +static bool
> > > +ptrace_attach (pid_t tid)
> > > +{
> > > +  if (ptrace (PTRACE_ATTACH, tid, NULL, NULL) != 0)
> > > +    return false;
> > > +  /* FIXME: Handle missing SIGSTOP on old Linux kernels.  */
> > 
> > How old are these kernels?
> > If this is before say 2.6.18 kernels (7 years ago now) then I
> > think it isn't too urgent.
> 
> It happens in RHEL-6.  It is only very recent change where PTRACE_ATTACH can
> no longer hang.

What a pain :{

So the problem is that sometimes the kernel forgets to send a SIGSTOP to
the child after the PTRACE_ATTACH? Does the waitpid then wait for ever
or does it just get something else then a SIGSTOP instead? Is there a
workaround?

> > If so, then maybe our memory_read callback really should have a
> > tid or Dwfl_Thread argument to indicate for which thread the memory
> > read was intended.
> 
> It should not, it does not depend on the TID.  It would be wrong API.

I don't think it would be that wrong. The user does know in which thread
context it wants the memory. So why wouldn't it provide that information
if it helps some implementations? But...

> I think the logic is now easy with single 'tid_attached' variable when at most
> one thread can be PTRACE_ATTACH-ed.

Yes, this should work fine.

> > > +  while (strcmp (dirent->d_name, ".") == 0
> > > +	 || strcmp (dirent->d_name, "..") == 0);
> > 
> > I assume readdir makes sure there two are always first?
> 
> I do not see such guarantee in Fedora man 3 readdir and neither in:
> 	http://pubs.opengroup.org/onlinepubs/007908799/xsh/readdir.html

But my question was irrelevant. I misunderstood the code to only check
this the first time. But it is checked each time. OK.

> > > +bool
> > > +internal_function
> > > +__libdwfl_attach_state_for_pid (Dwfl *dwfl, pid_t pid)
> > > +{
> > > +  char dirname[64];
> > > +  int i = snprintf (dirname, sizeof (dirname), "/proc/%ld/task", (long) pid);
> > > +  assert (i > 0 && i < (ssize_t) sizeof (dirname) - 1);
> > 
> > 30 chars is enough if pids are actually long, otherwise 21 should be enough
> > for everybody. (MAX_INT is 2147483647, MAX_LONG is 9223372036854775807.)
> 
> I think at least dirname[32] is required for:
> /proc/-9223372036854775808/taskZ
> 
> Also stack frame is 16-bytes aligned.  It could probably be [32] but I really
> do not think it matters to discuss this.

I like small stack frames, so if 32 could be used that would be nice.
But it was just a nitpick indeed. Keep it as is if that makes sure there
is always enough room.

> @@ -147,18 +171,11 @@ static bool
>  pid_set_initial_registers (Dwfl_Thread *thread, void *thread_arg)
>  {
>    struct pid_arg *pid_arg = thread_arg;
> +  assert (! pid_arg->tid_attached);

Nitpick. I like x != 0 in case a variable isn't a bool to make clear
what we are really checking.

Looks good. Only slightly concerned about the missing SIGSTOP issue.

Thanks,

Mark


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]