This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: FYI unwinder: tests/ update

From: Mark Wielaard <mjw at redhat dot com>
To: elfutils-devel at lists dot fedorahosted dot org
Date: Tue, 12 Nov 2013 21:11:43 +0100
Subject: Re: FYI unwinder: tests/ update
On Fri, 2013-11-01 at 20:47 +0100, Jan Kratochvil wrote:
> as you requested I have commented more the tests/ part.
> 
> It is in: jankratochvil/unwindx86

I took the version from that branch.

As a side note, I see it depends on new conditional BIARCH.
It would probably be a good idea if configure warned when detecting a
64bit host, that didn't support biarch. So the user knows some tests
won't be run.

Although if I am reading the patch correct the backtrace-child-biarch
test program is still being build in that case. But then it is just the
host arch? Yes, it is, there is a comment in run-backtrace.sh that says
so.

> tests/
> 2013-09-02  Jan Kratochvil  <jan.kratochvil@redhat.com>
> 
> 	* Makefile.am (check_PROGRAMS): Add backtrace, backtrace-child and
> 	backtrace-data.
> 	(BUILT_SOURCES, clean-local, backtrace-child-biarch): New.
> 	(TESTS): Add run-backtrace.sh.
> 	(backtrace_LDADD, backtrace_child_CFLAGS, backtrace_child_LDFLAGS)
> 	(backtrace_data_LDADD): New.
> 	* backtrace-child.c: New file.
> 	* backtrace-data.c: New file.
> 	* backtrace.c: New file.
> 	* run-backtrace.sh: New file.

A little explanation of the general design of the test(s) would be
appreciated. Just something in run-backtrace.sh that explains what
program is called to do what, which signals are send to trigger which
actions and which parts are generic and which are x86 specific.

Also in general I think this is a somewhat giant testcase. Although good
to test the whole system, it is somewhat hard to figure out why/where
something fails. For example on my current system run-backtrace.sh
sometimes FAILs. But I cannot easily figure out why, whether it is just
some corner case or something fundamental. And it is hard to see which
parts are actually used. I don't really understand the need for
backtrace-data.

I wrote down some notes below while reviewing. Maybe you can add them
somewhere, if they are correct.

> diff --git a/tests/Makefile.am b/tests/Makefile.am
> [...]
> +BUILT_SOURCES = backtrace-child-biarch
> +
> +clean-local:
> +	$(RM) backtrace-child-biarch
> +
> +# Substitute $(COMPILE).
> +backtrace-child-biarch: backtrace-child.c
> +	$(CC_BIARCH) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) \
> +		     $(AM_CPPFLAGS) $(CPPFLAGS) \
> +		     $(AM_CFLAGS) $(CFLAGS) $(backtrace_child_CFLAGS) \
> +		     $(AM_LDFLAGS) $(LDFLAGS) $(backtrace_child_LDFLAGS) \
> +		     -o $@ $<

Does CC_BIARCH also take care of any include or library flags for 32bit?
Or is it assumed that all it takes is adding -m32?

OK, testing on at least fedora and debian seems to confirm you don't
need anything else as long as you have to right multilib/32bit
compiler/libs installed (gcc-multilib and libc6-dev-i386 on debian).

> +backtrace_LDADD = $(libdw) $(libelf) $(libmudflap)
> +backtrace_child_CFLAGS = -fPIE
> +backtrace_child_LDFLAGS = -pie -pthread
> +backtrace_data_LDADD = $(libdw) $(libelf) $(libmudflap)

Is there a reason for building backtrace_child pie? Just another test factor?
Why not build backtrace-child-biarch pie?

> +++ b/tests/backtrace-child.c

The code looks OK, but it is somewhat hard to decipher what it really
does and why. Here are my notes.

backtrace-child.c will be generated once or twice. A native-host version
(likely 64bit) and a biarch version (likely 32bit, but maybe also
64bit). It can be called with three options: --ptraceme, --gencore or
--run.

It will always first call three dummy functions that are empty no clone,
no inline asm volatile ("") functions. (why?) Then it will create a new
thread that calls a function "start" which will call a function
"backtracegen" which will call a vararg no-return function "stdarg" that
will install a SIGUSR2 signal handler ("sigusr2").

If --ptraceme is given both the main and start thread will call ptrace
traceme.

If --gencore is given the main thread call pthread_join on the start
thread. The start thread will call the sigusr2 handler (directly, not
through raising a signal), which will then call abort().

If --run is given the main thread will raise (SIGUSR2) and the start
thread will, depending on whether it is build on x86_64 either raise
SIGUSR1 or call the sigusr2 handler directly (without raising the
signal). The sigusr2 handler will then raise SIGUSR1 itself.
(This is really confusing, why the difference between x86_64 and the
rest?)

There is also an assumption that if the process is build on x86_64 and
ptraced and a SIGUSR1 signal is caught by the tracee that the PC will be
adjusted to call the sigusr2 handler (this is also makes this test case
somewhat hard to follow IMHO).

There are a couple of aborts in this code which I am unsure of whether
they are meant to be ever called/reached or not. A comment would be nice
to tell apart the abort () calls that are meant to actually generate the
core file and those that are there to make sure the test case fails in
an erroneous state.

> +/* Test child for parent backtrace test.
> [...]
> +/* Execution will arrive here from jmp by an artificial ptrace-spawn signal.  */
> +
> +static void
> +sigusr2 (int signo)
> +{
> +  assert (signo == SIGUSR2);
> +  if (! gencore)
> +    raise (SIGUSR1);
> +
> +  /* Catch the .plt jump, it will come from this abort call.  */
> +  abort ();
> +}

Could you explain the .plt part a bit more. This seems to be the most
tricky part since it seems to rely on a fairly (2 years) recent
binutils. Is it specific to how you call abort here? Or is it in general
always an issue with how core files are generated that they will end up
with the thread that caused the core to be generated to go through
the .plt?

It is fine for a test to depend on a good toolchain and FAIL if the
toolchain produces broken/missing frame data. But if at all possible
lets split that out in its own test.

> +++ b/tests/backtrace-data.c

I skipped this for now since I don't understand how it is run/tested.

> +++ b/tests/backtrace.c
> [...]
> +/* Older systems (such as RHEL-5; fixed as binutils PR ld/12570 on 2011-06-20)
> +   will fail this testcase as they do not provide CFI for the .plt section.
> +   elfutils 'portable' patch/branch disables the .plt unwind test.  */

It would be nice to add how they fail. So the user can see if it is
this .plt case or something else. And as mentioned above can we split
out this .plt corner case in its own test?

> +static void
> +report_pid (Dwfl *dwfl, pid_t pid)
> +{
> +  int result = dwfl_linux_proc_report (dwfl, pid);
> +  if (result < 0)
> +    error (2, 0, "dwfl_linux_proc_report: %s", dwfl_errmsg (-1));
> +  else if (result > 0)
> +    error (2, result, "dwfl_linux_proc_report");
> +
> +  if (dwfl_report_end (dwfl, NULL, NULL) != 0)
> +    error (2, 0, "dwfl_report_end: %s", dwfl_errmsg (-1));
> +}

OK.

> +static Dwfl *
> +pid_to_dwfl (pid_t pid)
> +{
> +  static char *debuginfo_path;
> +  static const Dwfl_Callbacks proc_callbacks =
> +    {
> +      .find_debuginfo = dwfl_standard_find_debuginfo,
> +      .debuginfo_path = &debuginfo_path,
> +
> +      .find_elf = dwfl_linux_proc_find_elf,
> +    };
> +  Dwfl *dwfl = dwfl_begin (&proc_callbacks);
> +  if (dwfl == NULL)
> +    error (2, 0, "dwfl_begin: %s", dwfl_errmsg (-1));
> +  report_pid (dwfl, pid);
> +  return dwfl;
> +}

OK. nitpick call this dwfl_from_pid.

> +static const char *executable;
> +
> +static int
> +find_elf (Dwfl_Module *mod, void **userdata, const char *modname,
> +	  Dwarf_Addr base, char **file_name, Elf **elfp)
> +{
> +  if (executable && modname != NULL
> +      && (strcmp (modname, "[exe]") == 0 || strcmp (modname, "[pie]") == 0))
> +    {
> +      char *executable_dup = strdup (executable);
> +      if (executable_dup)
> +	{
> +	  free (*file_name);
> +	  *file_name = executable_dup;
> +	  return -1;
> +	}
> +    }
> +  return dwfl_build_id_find_elf (mod, userdata, modname, base, file_name, elfp);
> +}

Is this still needed now that dwfl_core_file_report takes an executable?

> +static Dwfl *
> +dwfl_offline (void)
> +{
> +  static char *debuginfo_path;
> +  static const Dwfl_Callbacks offline_callbacks =
> +    {
> +      .find_debuginfo = dwfl_standard_find_debuginfo,
> +      .debuginfo_path = &debuginfo_path,
> +
> +      .section_address = dwfl_offline_section_address,
> +
> +      /* We use this table for core files too.  */
> +      .find_elf = find_elf,
> +    };
> +  Dwfl *dwfl = dwfl_begin (&offline_callbacks);
> +  if (dwfl == NULL)
> +    error (2, 0, "dwfl_begin: %s", dwfl_errmsg (-1));
> +  return dwfl;
> +}

OK.

> +static Dwfl *
> +report_corefile (Dwfl *dwfl, const char *corefile)
> +{
> +  int fd = open64 (corefile, O_RDONLY);
> +  if (fd == -1)
> +    error (2, 0, "open64: %m");
> +  Elf *elf = elf_begin (fd, ELF_C_READ_MMAP_PRIVATE, NULL);
> +  if (elf == NULL)
> +    error (2, 0, "elf_begin: %s", elf_errmsg (-1));
> +  if (dwfl_core_file_report (dwfl, elf, executable) < 0)
> +    error (2, 0, "dwfl_core_file_report: %s", dwfl_errmsg (-1));
> +  if (dwfl_report_end (dwfl, NULL, NULL) != 0)
> +    error (2, 0, "dwfl_report_end: %s", dwfl_errmsg (-1));
> +  /* FD and ELF are leaked.  */
> +  return dwfl;
> +}

OK.

> +static Dwfl *
> +corefile_to_dwfl (const char *corefile)
> +{
> +  return report_corefile (dwfl_offline (), corefile);
> +}

OK. nitpick call it dwfl_from_corefile?

> +static int
> +dump_modules (Dwfl_Module *mod, void **userdata __attribute__ ((unused)),
> +	      const char *name, Dwarf_Addr start,
> +	      void *arg __attribute__ ((unused)))
> +{
> +  Dwarf_Addr end;
> +  dwfl_module_info (mod, NULL, NULL, &end, NULL, NULL, NULL, NULL);
> +  printf ("%#" PRIx64 "\t%#" PRIx64 "\t%s\n", (uint64_t) start, (uint64_t) end,
> +	  name);
> +  return DWARF_CB_OK;
> +}

OK.

> +typedef void (callback_t) (pid_t tid, unsigned frameno, Dwarf_Addr pc,
> +			   const char *symname, Dwfl *dwfl, void *data);
> +
> +struct frame_callback
> +{
> +  unsigned frameno;
> +  callback_t *callback;
> +  void *callback_data;
> +};
> +
> +static int
> +frame_callback (Dwfl_Frame *state, void *frame_arg)
> +{
> +  struct frame_callback *data = frame_arg;
> +  Dwarf_Addr pc;
> +  bool isactivation;
> +  if (! dwfl_frame_pc (state, &pc, &isactivation))
> +    {
> +      error (0, 0, "%s", dwfl_errmsg (-1));
> +      return DWARF_CB_ABORT;
> +    }
> +  Dwarf_Addr pc_adjusted = pc - (isactivation ? 0 : 1);
> +
> +  /* Get PC->SYMNAME.  */
> +  Dwfl_Thread *thread = dwfl_frame_thread (state);
> +  Dwfl *dwfl = dwfl_thread_dwfl (thread);
> +  Dwfl_Module *mod = dwfl_addrmodule (dwfl, pc_adjusted);
> +  const char *symname = NULL;
> +  if (mod)
> +    symname = dwfl_module_addrname (mod, pc_adjusted);
> +
> +  printf ("#%2u %#" PRIx64 "%4s\t%s\n", data->frameno, (uint64_t) pc,
> +	  ! isactivation ? "- 1" : "", symname);
> +  pid_t tid = dwfl_thread_tid (thread);
> +  if (data->callback)
> +    data->callback (tid, data->frameno, pc, symname, dwfl, data->callback_data);
> +  data->frameno++;
> +
> +  return DWARF_CB_OK;
> +}
> +
> +struct thread_callback
> +{
> +  callback_t *callback;
> +  void *callback_data;
> +};
> +
> +static int
> +thread_callback (Dwfl_Thread *thread, void *thread_arg)
> +{
> +  struct thread_callback *data = thread_arg;
> +  printf ("TID %ld:\n", (long) dwfl_thread_tid (thread));
> +  struct frame_callback frame_callback_data;
> +  frame_callback_data.frameno = 0;
> +  frame_callback_data.callback = data->callback;
> +  frame_callback_data.callback_data = data->callback_data;
> +  switch (dwfl_thread_getframes (thread, frame_callback,
> +				 &frame_callback_data))
> +    {
> +    case 0:
> +      break;
> +    case DWARF_CB_ABORT:
> +      return DWARF_CB_ABORT;
> +    case -1:
> +      error (0, 0, "dwfl_thread_getframes: %s", dwfl_errmsg (-1));
> +      /* All platforms do not have yet proper unwind termination.  */
> +      break;
> +    default:
> +      abort ();
> +    }
> +  return DWARF_CB_OK;
> +}
> +
> +static void
> +dump (pid_t pid, const char *corefile, callback_t *callback,
> +      void *callback_data)
> +{
> +  Dwfl *dwfl;
> +  if (pid && !corefile)
> +    dwfl = pid_to_dwfl (pid);
> +  else if (corefile && !pid)
> +    dwfl = corefile_to_dwfl (corefile);
> +  else
> +    abort ();
> +  ptrdiff_t ptrdiff = dwfl_getmodules (dwfl, dump_modules, NULL, 0);
> +  assert (ptrdiff == 0);
> +  struct thread_callback thread_callback_data;
> +  thread_callback_data.callback = callback;
> +  thread_callback_data.callback_data = callback_data;
> +  bool err = false;
> +  switch (dwfl_getthreads (dwfl, thread_callback, &thread_callback_data))
> +    {
> +    case 0:
> +      break;
> +    case DWARF_CB_ABORT:
> +      err = true;
> +      break;
> +    case -1:
> +      error (0, 0, "dwfl_getthreads: %s", dwfl_errmsg (-1));
> +      err = true;
> +      break;
> +    default:
> +      abort ();
> +    }
> +  if (callback)
> +    callback (0, 0, 0, NULL, dwfl, callback_data);
> +  dwfl_end (dwfl);
> +  if (err)
> +    exit (EXIT_FAILURE);
> +}
> +
> +struct see_exec_module
> +{
> +  Dwfl_Module *mod;
> +  char selfpath[PATH_MAX + 1];
> +};
> +
> +static int
> +see_exec_module (Dwfl_Module *mod, void **userdata __attribute__ ((unused)),
> +		 const char *name __attribute__ ((unused)),
> +		 Dwarf_Addr start __attribute__ ((unused)), void *arg)
> +{
> +  struct see_exec_module *data = arg;
> +  if (strcmp (name, data->selfpath) != 0)
> +    return DWARF_CB_OK;
> +  assert (data->mod == NULL);
> +  data->mod = mod;
> +  return DWARF_CB_OK;
> +}

OK. I wonder if we could/should provide a helper function directly in
libdwfl for this. I guess more people will try something like this and
we can probably do this more efficiently/cache the result.

> +static void
> +selfdump_callback (pid_t tid, unsigned frameno, Dwarf_Addr pc,
> +		   const char *symname, Dwfl *dwfl, void *data)
> +{
> +  pid_t check_tid = (intptr_t) data;
> +  bool disable = check_tid < 0;
> +  if (disable)
> +    check_tid = -check_tid;
> +  static bool seen_main = false;
> +  if (symname && strcmp (symname, "main") == 0)
> +    seen_main = true;
> +  if (pc == 0)
> +    {
> +      assert (seen_main);
> +      return;
> +    }
> +  if (disable || tid != check_tid)
> +    return;

Maybe add a comment?

// For the main thread we are only interested if we can unwind till
// we see the "main" symbol.

> +  Dwfl_Module *mod;
> +  const char *symname2 = NULL;
> +  switch (frameno)
> +  {
> +    case 0:
> +      /* .plt has no symbols.  */
> +      assert (symname == NULL);
> +      break;
> +    case 1:
> +      assert (symname != NULL && strcmp (symname, "sigusr2") == 0);
> +      break;
> +    case 2:
> +      /* __restore_rt - glibc maybe does not have to have this symbol.  */
> +      break;
> +    case 3:
> +      /* Verify we trapped on the very first instruction of jmp.  */
> +      assert (symname != NULL && strcmp (symname, "jmp") == 0);
> +      mod = dwfl_addrmodule (dwfl, pc - 1);
> +      if (mod)
> +	symname2 = dwfl_module_addrname (mod, pc - 1);
> +      assert (symname2 == NULL || strcmp (symname2, "jmp") != 0);
> +      break;
> +    case 4:
> +      assert (symname != NULL && strcmp (symname, "stdarg") == 0);
> +      break;
> +    case 5:
> +      /* Verify we trapped on the very last instruction of child.  */
> +      assert (symname != NULL && strcmp (symname, "backtracegen") == 0);
> +      mod = dwfl_addrmodule (dwfl, pc);
> +      if (mod)
> +	symname2 = dwfl_module_addrname (mod, pc);
> +      assert (symname2 == NULL || strcmp (symname2, "backtracegen") != 0);
> +      break;
> +  }
> +}

This really could use a comment at the top showing the expected
backtrace:

/*
  0 (null)       // .plt
  1 sigusr2      // in exe
  2 __restore_rt // or null
  3 jmp          // offset+0
  4 stdarg       // in exe
  5 backtracegen // last offset
  ...
    main         // in exe
  ...
*/

> +#ifdef __x86_64__
> +static void
> +prepare_thread (pid_t pid2, Dwarf_Addr plt_start, Dwarf_Addr plt_end,
> +		void (*jmp) (void))
> +{
> +  long l;
> +  errno = 0;
> +  l = ptrace (PTRACE_POKEUSER, pid2,
> +	      (void *) (intptr_t) offsetof (struct user_regs_struct, rip), jmp);
> +  assert_perror (errno);
> +  assert (l == 0);
> +  l = ptrace (PTRACE_CONT, pid2, NULL, (void *) (intptr_t) SIGUSR2);
> +  int status;
> +  pid_t got = waitpid (pid2, &status, __WALL);
> +  assert_perror (errno);
> +  assert (got == pid2);
> +  assert (WIFSTOPPED (status));
> +  assert (WSTOPSIG (status) == SIGUSR1);
> +  for (;;)
> +    {
> +      errno = 0;
> +      l = ptrace (PTRACE_PEEKUSER, pid2,
> +		  (void *) (intptr_t) offsetof (struct user_regs_struct, rip),
> +		  NULL);
> +      assert_perror (errno);
> +      if ((unsigned long) l >= plt_start && (unsigned long) l < plt_end)
> +	break;
> +      l = ptrace (PTRACE_SINGLESTEP, pid2, NULL, NULL);
> +      assert_perror (errno);
> +      assert (l == 0);
> +      got = waitpid (pid2, &status, __WALL);
> +      assert_perror (errno);
> +      assert (got == pid2);
> +      assert (WIFSTOPPED (status));
> +      assert (WSTOPSIG (status) == SIGTRAP);
> +    }
> +}
> +#endif /* __x86_64__ */

Urgh. Somewhat unportable :{
I see what it is doing.
Is this to get us into the .plt entry?
It could use a comment to explain to porters what the intention is.

> +#include <asm/unistd.h>
> +#include <unistd.h>
> +#define tgkill(pid, tid, sig) syscall (__NR_tgkill, (pid), (tid), (sig))
> +
> +static void
> +ptrace_detach_stopped (pid_t pid)
> +{
> +  errno = 0;
> +  long l = ptrace (PTRACE_DETACH, pid, NULL, (void *) (intptr_t) SIGSTOP);
> +  assert_perror (errno);
> +  assert (l == 0);
> +}

OK. To get the threads detached, but in a known stopped state.

> +static void
> +selfdump (const char *exec)
> +{
> +  pid_t pid = fork ();
> +  switch (pid)
> +  {
> +    case -1:
> +      abort ();
> +    case 0:
> +      execl (exec, exec, "--ptraceme", "--run", NULL);
> +      abort ();
> +    default:
> +      break;
> +  }

Currently when we run tests under valgrind we use --trace-children=yes
which will cause this to launch valgrind as main executable instead. But
don't worry. I'll fix that when I get to fixing the other valgrind issue
we found.

> +  /* Catch the main thread.  Catch it first otherwise the /proc evaluation of
> +     PID may have caught still ourselves before executing execl above.  */
> +  errno = 0;
> +  int status;
> +  pid_t got = waitpid (pid, &status, 0);
> +  assert_perror (errno);
> +  assert (got == pid);
> +  assert (WIFSTOPPED (status));
> +  assert (WSTOPSIG (status) == SIGUSR2);
> +
> +  /* Catch the spawned thread.  Do not use __WCLONE as we could get racy
> +     __WCLONE, probably despite pthread_create already had to be called the new
> +     task is not yet alive enough for waitpid.  */
> +  pid_t pid2 = waitpid (-1, &status, __WALL);
> +  assert_perror (errno);
> +  assert (pid2 > 0);
> +  assert (pid2 != pid);
> +  assert (WIFSTOPPED (status));
> +  assert (WSTOPSIG (status) == SIGUSR1);

Aha, so SIGUSR1 is the pthread, SIGUSR2 is the main thread.
That isn't immediately clear from the bracktrace_child source code.
Might want to add a comment there.

> +  Dwfl *dwfl = pid_to_dwfl (pid);
> +  char *selfpathname;
> +  int i = asprintf (&selfpathname, "/proc/%ld/exe", (long) pid);
> +  assert (i > 0);
> +  struct see_exec_module data;
> +  ssize_t ssize = readlink (selfpathname, data.selfpath,
> +			    sizeof (data.selfpath));
> +  free (selfpathname);
> +  assert (ssize > 0 && ssize < (ssize_t) sizeof (data.selfpath));
> +  data.selfpath[ssize] = '\0';
> +  data.mod = NULL;
> +  ptrdiff_t ptrdiff = dwfl_getmodules (dwfl, see_exec_module, &data, 0);
> +  assert (ptrdiff == 0);
> +  assert (data.mod != NULL);
> +  GElf_Addr loadbase;
> +  Elf *elf = dwfl_module_getelf (data.mod, &loadbase);
> +  GElf_Ehdr ehdr_mem, *ehdr = gelf_getehdr (elf, &ehdr_mem);
> +  assert (ehdr != NULL);
> +  Elf_Scn *scn = NULL, *plt = NULL;
> +  while ((scn = elf_nextscn (elf, scn)) != NULL)
> +    {
> +      GElf_Shdr scn_shdr_mem, *scn_shdr = gelf_getshdr (scn, &scn_shdr_mem);
> +      assert (scn_shdr != NULL);
> +      if (strcmp (elf_strptr (elf, ehdr->e_shstrndx, scn_shdr->sh_name),
> +		  ".plt") != 0)
> +	continue;
> +      assert (plt == NULL);
> +      plt = scn;
> +    }
> +  assert (plt != NULL);
> +  GElf_Shdr scn_shdr_mem, *scn_shdr = gelf_getshdr (plt, &scn_shdr_mem);
> +  assert (scn_shdr != NULL);
> +  /* Make it true on x86_64 with i386 inferior.  */
> +  int disable = ehdr->e_ident[EI_CLASS] == ELFCLASS32;

This could use an explanation of "it". This doesn't give a clue of what
is being disabled.

> +#ifdef __x86_64__
> +  Dwarf_Addr plt_start = scn_shdr->sh_addr + loadbase;
> +  Dwarf_Addr plt_end = plt_start + scn_shdr->sh_size;
> +  void (*jmp) (void);
> +  if (! disable)
> +    {
> +      int nsym = dwfl_module_getsymtab (data.mod);
> +      int symi;
> +      for (symi = 1; symi < nsym; ++symi)
> +	{
> +	  GElf_Sym symbol;
> +	  const char *symbol_name = dwfl_module_getsym (data.mod, symi, &symbol, NULL);
> +	  if (symbol_name == NULL)
> +	    continue;
> +	  switch (GELF_ST_TYPE (symbol.st_info))
> +	    {
> +	    case STT_SECTION:
> +	    case STT_FILE:
> +	    case STT_TLS:
> +	      continue;
> +	    default:
> +	      if (strcmp (symbol_name, "jmp") != 0)
> +		continue;
> +	      break;
> +	    }
> +	  /* LOADBASE is already applied here.  */
> +	  jmp = (void (*) (void)) (uintptr_t) symbol.st_value;
> +	  break;
> +	}
> +      assert (symi < nsym);
> +    prepare_thread (pid2, plt_start, plt_end, jmp);
> +    }
> +#endif

Like above, this could use a comment for porters to explain what address
is looked up and why. I wonder if this could be done a little more
portable by having the child just print the needed address and the
parent reading that. Maybe even use a pipe between the two for
synchronization.

> +  dwfl_end (dwfl);
> +  ptrace_detach_stopped (pid);
> +  ptrace_detach_stopped (pid2);
> +  dump (pid, NULL, selfdump_callback,
> +	(void *) (intptr_t) (disable ? -pid2 : pid2));
> +}



> +static bool
> +is_core (const char *corefile)
> +{
> +  Dwfl *dwfl = dwfl_offline ();
> +  Dwfl_Module *mod = dwfl_report_elf (dwfl, "core", corefile, -1, 0 /* base */,
> +				      false /* add_p_vaddr */);
> +  assert (mod != NULL);
> +  GElf_Addr loadbase_ignore;
> +  Elf *core = dwfl_module_getelf (mod, &loadbase_ignore);
> +  assert (core != NULL);
> +  GElf_Ehdr ehdr_mem, *ehdr = gelf_getehdr (core, &ehdr_mem);
> +  assert (ehdr != NULL);
> +  assert (ehdr->e_type == ET_CORE || ehdr->e_type == ET_EXEC
> +	  || ehdr->e_type == ET_DYN);
> +  bool retval = ehdr->e_type == ET_CORE;
> +  dwfl_end (dwfl);
> +  return retval;
> +}

This works and tests dwfl_report_elf too, so it is fine. But you can
also just use elf_begin () to open the core file and check the ehdr.

> +int
> +main (int argc __attribute__ ((unused)), char **argv)
> +{
> +  /* We use no threads here which can interfere with handling a stream.  */
> +  __fsetlocking (stdin, FSETLOCKING_BYCALLER);
> +  __fsetlocking (stdout, FSETLOCKING_BYCALLER);
> +  __fsetlocking (stderr, FSETLOCKING_BYCALLER);
> +
> +  /* Set locale.  */
> +  (void) setlocale (LC_ALL, "");
> +
> +  if (argc == 1)
> +    {
> +      selfdump ("./backtrace-child");
> +      return 0;
> +    }
> +  argv++;
> +  if (argc == 2)
> +    {
> +      if (strcmp (*argv, "--help") == 0)
> +	error (2, 0, "backtrace {{no args for ./backtrace-child}|<pid>|<core>|"
> +		     "<executable>|<executable> <core>}");
> +      char *end;
> +      long l = strtol (*argv, &end, 10);
> +      if (**argv && !*end)
> +	dump (l, NULL, NULL, NULL);
> +      else if (is_core (*argv))
> +	dump (0, *argv, NULL, NULL);
> +      else
> +	selfdump (*argv);
> +      return 0;
> +    }
> +  if (argc == 3)
> +    {
> +      assert (! is_core (argv[0]));
> +      assert (is_core (argv[1]));
> +      executable = argv[0];
> +      dump (0, argv[1], NULL, NULL);
> +      return 0;
> +    }
> +  assert (0);
> +
> +  return 0;
> +}

This really could use some documentation or better argument parsing IMHO.
Why not have simpler arguments:

--backtrace-child
--backtrace-pid <pid>
--backtrace-core <corefile>
--backtrace-exe <exefile>
--backtrace-exe-core <exefile> <corefile>

But if I read run-backtrace.sh correctly you only need the last two. It
might make the testcase a little more readable/easier to follow to just
drop the other cases.


> +++ b/tests/run-backtrace.sh
> @@ -0,0 +1,94 @@
> +#! /bin/bash
> +# Copyright (C) 2013 Red Hat, Inc.
> +# This file is part of elfutils.
> +#
> +# This file is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# elfutils is distributed in the hope that it will be useful, but
> +# WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +. $srcdir/test-subr.sh
> +
> +# Verify one of the backtraced threads contains function 'main'.
> +check_main()
> +{
> +  if grep -w main $1; then
> +    return
> +  fi
> +  echo >&2 $2: no main
> +  false
> +}
> +
> +# Without proper ELF symbols resolution we could get inappropriate weak
> +# symbol "gsignal" with the same address as the correct symbol "raise".
> +# It was fixed by GIT commit 78dec228b3cfb2f9300cd0b682ebf416c9674c91 .
> +# [patch] Improve ELF symbols preference (global > weak)
> +# https://lists.fedorahosted.org/pipermail/elfutils-devel/2012-October/002624.html
> +check_gsignal()
> +{
> +  if ! grep -w gsignal $1; then
> +    return
> +  fi
> +  cat >&2 $1
> +  echo >&2 $2: found gsignal
> +  false
> +}
> +
> +# Verify the STDERR output does not contain unexpected errors.
> +# In some cases we cannot reliably find out we got behind _start as some
> +# operating system do not properly terminate CFI by undefined PC.
> +# Ignore it here as it is a bug of OS, not a bug of elfutils.
> +check_err()
> +{
> +  if test ! -s $1; then
> +    return
> +  fi
> +  if cmp -s <(echo "${abs_builddir}/backtrace: dwfl_thread_getframes: no matching address range") <(uniq <$1); then
> +    return
> +  fi
> +  cat >&2 $1
> +  echo >&2 $2: neither empty nor just out of DWARF
> +  false
> +}
> +
> +check_all()
> +{
> +  bt=$1
> +  err=$2
> +  testname=$3
> +  cat $bt $err
> +  check_main $bt $testname
> +  check_gsignal $bt $testname
> +  check_err $err $testname
> +}
> +
> +# Test both 64-bit and 32-bit tracees.  Both files will be the same if the
> +# current platform does not support different target archs.
> +for child in backtrace-child{,-biarch}; do
> +
> +  # Backtrace live process.
> +  # Do not abort on non-zero exit code due to some warnings of ./backtrace
> +  # - see function check_err.
> +  tempfiles $child.{bt,err}
> +  (set +ex; testrun ${abs_builddir}/backtrace ${abs_builddir}/$child 1>$child.bt 2>$child.err; true)
> +  check_all $child.{bt,err} $child
> +
> +  # Backtrace core file.
> +  core="core.`ulimit -c unlimited; set +ex; testrun ${abs_builddir}/$child --gencore --run; true`"

Why the true? Don't you want to fail early if the core file cannot be generated?

> +  # Do not abort on non-zero exit code due to some warnings of ./backtrace
> +  # - see function check_err.
> +  tempfiles $core{,.{bt,err}}
> +  (set +ex; testrun ${abs_builddir}/backtrace ${abs_builddir}/$child $core 1>$core.bt 2>$core.err; true)
> +  check_all $core.{bt,err} $child-$core

Generating real core files is good to test the whole system. But it
would be good to also add a pregenerated exe and core file to the
testsuite and run a test on that. Then you aren't relying on whether the
system toolchain/kernel is perfect and it can more easily be tested
cross architecture.

It would be nice to split this in three separate tests (that use a
common backtrace-subr.sh):

- run-backtrace-core Which would run on pregenerated executables and
  core files from a known good system for which the backend support is
  there to run a backtrace on a core. This test would be cross arch.

- run-backtrace-native Which would do the live process backtracing.

- run-backtrace-native-core Which would do the backtrace on a just
  natively generated core file as sanity check that the local
  toolchain/kernel is sane.

That would also help with porting new backends since it would show
different parts of the backend (or local toolchain) that need to have
support for success.

Thanks,

Mark
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]