This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] PR threads/11692 PR gdb/12203, handle_inferior_event assertion when pthread_create fails


These PRs are about an assertion in handle_inferior_event triggering:

  if (ecs->ws.kind != TARGET_WAITKIND_EXITED
      && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED
      && ecs->ws.kind != TARGET_WAITKIND_NO_RESUMED)
    {
      struct inferior *inf = find_inferior_pid (ptid_get_pid (ecs->ptid));

>>>   gdb_assert (inf);   <<<
      stop_soon = inf->control.stop_soon;
    }
  else
    stop_soon = NO_STOP_QUIETLY;


The problem is that linux-thread-db.c is returning a minus_one_ptid to
infrun when here:

WL: waitpid Thread 0xb7fe16c0 (LWP 22258) received Stopped (signal) (stopped)
LLW: trap ptid is LWP 22262.
infrun: target_wait (-1, status) =
infrun:   -1 [process -1],
infrun:   status->kind = spurious

  if (have_threads (ptid))
    {
      /* Change ptids back into the higher level PID + TID format.  If
	 the thread is dead and no longer on the thread list, we will
	 get back a dead ptid.  This can occur if the thread death
	 event gets postponed by other simultaneous events.  In such a
	 case, we want to just ignore the event and continue on.  */

      ptid = thread_from_lwp (ptid);
      if (GET_PID (ptid) == -1)
	ourstatus->kind = TARGET_WAITKIND_SPURIOUS;
    }
  ...
  return ptid;

And inferior '-1' will obviously not be found in the inferior table,
leading to the assertion.

To be fair, this -1/spurious return was already in place before the
assertion was added.  But the reaction to TARGET_WAITKIND_SPURIOUS
must be to re-resume the thread/inferior doing nothing else (such is
the way the event is defined).  But if the event is reported to
minus_one_ptid, then _which_ thread/inferior should be resumed?  So
this is a problem for non-stop/multi-process and the fix can't just be
to remove the assertion.

That comment in linux-thread-db.c was originally added in:

http://sourceware.org/ml/gdb-patches/2004-06/msg00073.html

along with this other comment:

  /* Fetch the thread info.  If we get back TD_THR_ZOMBIE, then the
     event thread has already died.  If another gdb interface has called
     thread_alive() prior, the thread won't be found on the thread list
     anymore.  In that case, we don't want to process this ptid anymore
     to avoid the possibility of later treating it as a newly
     discovered thread id that we should add to the list.  Thus,
     we return a -1 ptid which is also how the thread list marks a
     dead thread.  */
  if (thread_get_info_callback (&th, &thread_info) == TD_THR_ZOMBIE
      && thread_info == NULL)
    return pid_to_ptid (-1);


It took me a while to make sense of these comments.  The trick is to
realize that at the time of that patch (see 3915e6ee in git), we had a
thread_db_thread_alive function in the thread_db target, which would
return false to TD_THR_ZOMBIE threads.  Nowadays, the
target_thread_alive callback always ends up in
linux-nat.c:linux_nat_thread_alive even if the thread_db target is
pushed, and that function only considers the liveness of the LWP, so
the "thread_alive() prior" case described no longer happens.

Over the years, Dan make linux-thread-db.c more independent of
user-level thread ids, and removed the thread_db_thread_alive function
here:

http://sourceware.org/ml/gdb-patches/2006-07/msg00248.html

So we no longer have the need for that TARGET_WAITKIND_SPURIOUS
special case.

In the case of the PRs (and the new test added by this patch), we find
a TD_THR_ZOMBIE thread for a completely different reason.  Those call
pthread_create and that call fails.  What happens is that the
pthread_create implementation (and friends) can't check if the
requested affinity is valid before actually spawning the new thread.
So it spawns the new clone in suspended state, and tries then to set
its affinity.  That fails, and so pthread_create should not result in
a new thread being created, and so needs to kill the new clone, by
sending it a SIGCANCEL, and returning error.  This signal's handler
triggers the __pthread_cancel path in the new clone.  Now, in this
path, the new clone dlopens libgcc_s.so, and if that happens to be the
first time that SO is being dlopen'ed in the process, the solib event
breakpoint is hit to report the load to the debugger.  That breakpoint
hit ends up being reported to thread_db_wait.  Since this new clone
thread is the carcass of an aborted thread that was never really born,
when thread_db_wait goes ask nptl_db about the event thread's info, it
gets back TD_THR_ZOMBIE.  And so we reach that code that returns
TARGET_WAITKIND_SPURIOUS/minus_one_ptid to infrun.  Note also that we
also lose TARGET_WAITKIND_TRAP, and therefore the solib event when
this happens.

The fix is to remove all this now unnecessary TD_THR_ZOMBIE special
casing.  This makes the TARGET_WAITKIND_SPURIOUS+minus_one_ptid
support in handle_inferior_event (it was added at the same time)
unnecessary too, so it is removed as well.

The condition that triggers the assertion doesn't always trigger, so
the test loops a few times.  10 iterations are enough to reliably
catch it at least a couple times per run on my machine, but usually
catches more.

Note the test avoids another issue:

	  /* Wait for all threads to exit.  pthread_create spawns a
	     clone thread even in the failing case, as it can only try
	     to set the affinity after creating the thread.  That new
	     thread is immediately canceled (because setting the
	     affinity fails), by killing it with a SIGCANCEL signal,
	     which may ends up in pthread_cancel/unwind paths, which
	     may trigger a libgcc_s.so load, making the thread hit the
	     solib-event breakpoint.  Now, if we would let the program
	     exit without waiting, sometimes it would happen that the
	     inferior exists just while we're handling the
	     solib-event, resulting in errors being thrown due to
	     failing ptrace call fails (with ESCHR), breaking the
	     test.  */

(It was this avoidance that would fail/hang sometimes due to the
signals respawning issue recently fixed.)

Note that GDBserver doesn't have this issue.

Tested on x86_64 Fedora 17.  No regressions.

Any comments?

2012-07-05  Pedro Alves  <palves@redhat.com>

	PR threads/11692 PR gdb/12203

	gdb/
	* infrun.c (handle_inferior_event) <new thread>: Don't special
	case minus_one_ptid.
	<TARGET_WAITKIND_SPURIOUS>: Ditto.
	* linux-thread-db.c (thread_get_info_callback): Don't return early
	if the thread is zombie.
	(thread_from_lwp): Change return type to void.  Rewrite stale
	comment.
	(attach_thread): Don't return early if the thread is zombie,
	instead set its "dying" flag.
	(thread_db_wait): Don't return TARGET_WAITKIND_SPURIOUS anymore.
	(find_new_threads_callback): Don't return early if the thread is
	zombie.

	gdb/testsuite/
	* gdb.threads/create-fail.c: New file.
	* gdb.threads/create-fail.exp: New file.
---
 gdb/testsuite/gdb.threads/create-fail.c   |  121 +++++++++++++++++++++++++++++
 gdb/testsuite/gdb.threads/create-fail.exp |   53 +++++++++++++
 2 files changed, 174 insertions(+)
 create mode 100644 gdb/testsuite/gdb.threads/create-fail.c
 create mode 100644 gdb/testsuite/gdb.threads/create-fail.exp

diff --git a/gdb/infrun.c b/gdb/infrun.c
index 11f981f..622bb34 100644
--- a/gdb/infrun.c
+++ b/gdb/infrun.c
@@ -3194,8 +3194,7 @@ handle_inferior_event (struct execution_control_state *ecs)
     }
 
   if (ecs->ws.kind != TARGET_WAITKIND_EXITED
-      && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED
-      && !ptid_equal (ecs->ptid, minus_one_ptid))
+      && ecs->ws.kind != TARGET_WAITKIND_SIGNALLED)
     {
       ecs->event_thread = find_thread_ptid (ecs->ptid);
       /* If it's a new thread, add it to the thread database.  */
@@ -3363,8 +3362,7 @@ handle_inferior_event (struct execution_control_state *ecs)
     case TARGET_WAITKIND_SPURIOUS:
       if (debug_infrun)
         fprintf_unfiltered (gdb_stdlog, "infrun: TARGET_WAITKIND_SPURIOUS\n");
-      if (!ptid_equal (ecs->ptid, inferior_ptid)
-	  && !ptid_equal (ecs->ptid, minus_one_ptid))
+      if (!ptid_equal (ecs->ptid, inferior_ptid))
 	context_switch (ecs->ptid);
       resume (0, GDB_SIGNAL_0);
       prepare_to_wait (ecs);
diff --git a/gdb/linux-thread-db.c b/gdb/linux-thread-db.c
index 7f8f83e..07bd52b 100644
--- a/gdb/linux-thread-db.c
+++ b/gdb/linux-thread-db.c
@@ -422,11 +422,6 @@ thread_get_info_callback (const td_thrhandle_t *thp, void *argp)
   thread_ptid = ptid_build (info->pid, ti.ti_lid, 0);
   inout->thread_info = find_thread_ptid (thread_ptid);
 
-  /* In the case of a zombie thread, don't continue.  We don't want to
-     attach to it thinking it is a new thread.  */
-  if (ti.ti_state == TD_THR_UNKNOWN || ti.ti_state == TD_THR_ZOMBIE)
-    return TD_THR_ZOMBIE;
-
   if (inout->thread_info == NULL)
     {
       /* New thread.  Attach to it now (why wait?).  */
@@ -441,9 +436,9 @@ thread_get_info_callback (const td_thrhandle_t *thp, void *argp)
   return 0;
 }
 
-/* Convert between user-level thread ids and LWP ids.  */
+/* Fetch the user-level thread id of PTID.  */
 
-static ptid_t
+static void
 thread_from_lwp (ptid_t ptid)
 {
   td_thrhandle_t th;
@@ -467,22 +462,10 @@ thread_from_lwp (ptid_t ptid)
     error (_("Cannot find user-level thread for LWP %ld: %s"),
 	   GET_LWP (ptid), thread_db_err_str (err));
 
-  /* Fetch the thread info.  If we get back TD_THR_ZOMBIE, then the
-     event thread has already died.  If another gdb interface has called
-     thread_alive() previously, the thread won't be found on the thread list
-     anymore.  In that case, we don't want to process this ptid anymore
-     to avoid the possibility of later treating it as a newly
-     discovered thread id that we should add to the list.  Thus,
-     we return a -1 ptid which is also how the thread list marks a
-     dead thread.  */
+  /* Long-winded way of fetching the thread info.  */
   io.thread_db_info = info;
   io.thread_info = NULL;
-  if (thread_get_info_callback (&th, &io) == TD_THR_ZOMBIE
-      && io.thread_info == NULL)
-    return minus_one_ptid;
-
-  gdb_assert (ptid_get_tid (ptid) == 0);
-  return ptid;
+  thread_get_info_callback (&th, &io);
 }
 
 
@@ -1259,9 +1242,6 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
   if (target_has_execution)
     check_thread_signals ();
 
-  if (ti_p->ti_state == TD_THR_UNKNOWN || ti_p->ti_state == TD_THR_ZOMBIE)
-    return 0;			/* A zombie thread -- do not attach.  */
-
   /* Under GNU/Linux, we have to attach to each and every thread.  */
   if (target_has_execution
       && tp == NULL)
@@ -1296,6 +1276,8 @@ attach_thread (ptid_t ptid, const td_thrhandle_t *th_p,
   gdb_assert (ti_p->ti_tid != 0);
   private->th = *th_p;
   private->tid = ti_p->ti_tid;
+  if (ti_p->ti_state == TD_THR_UNKNOWN || ti_p->ti_state == TD_THR_ZOMBIE)
+    private->dying = 1;
 
   /* Add the thread to GDB's thread list.  */
   if (tp == NULL)
@@ -1513,15 +1495,8 @@ thread_db_wait (struct target_ops *ops,
 
   if (have_threads (ptid))
     {
-      /* Change ptids back into the higher level PID + TID format.  If
-	 the thread is dead and no longer on the thread list, we will
-	 get back a dead ptid.  This can occur if the thread death
-	 event gets postponed by other simultaneous events.  In such a
-	 case, we want to just ignore the event and continue on.  */
-
-      ptid = thread_from_lwp (ptid);
-      if (GET_PID (ptid) == -1)
-	ourstatus->kind = TARGET_WAITKIND_SPURIOUS;
+      /* Fill in the thread's user-level thread id.  */
+      thread_from_lwp (ptid);
     }
 
   return ptid;
@@ -1566,9 +1541,6 @@ find_new_threads_callback (const td_thrhandle_t *th_p, void *data)
     error (_("find_new_threads_callback: cannot get thread info: %s"),
 	   thread_db_err_str (err));
 
-  if (ti.ti_state == TD_THR_UNKNOWN || ti.ti_state == TD_THR_ZOMBIE)
-    return 0;			/* A zombie -- ignore.  */
-
   if (ti.ti_tid == 0)
     {
       /* A thread ID of zero means that this is the main thread, but
diff --git a/gdb/testsuite/gdb.threads/create-fail.c b/gdb/testsuite/gdb.threads/create-fail.c
new file mode 100644
index 0000000..738ba7f
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/create-fail.c
@@ -0,0 +1,121 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2012 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#define _GNU_SOURCE
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sched.h>
+#include <errno.h>
+#include <string.h>
+#include <assert.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <dirent.h>
+#include <assert.h>
+
+/* Count the number of tasks/threads in the PID thread group.  */
+
+static int
+count_tasks (pid_t pid)
+{
+  char path[100];
+  int count;
+  DIR *d;
+
+  snprintf (path, sizeof (path), "/proc/%d/task/", (int) pid);
+  d = opendir (path);
+  if (d == NULL)
+    return -1;
+
+  for (count = 0; readdir (d) != NULL; count++)
+    ;
+  closedir (d);
+
+  /* Account for '.' and '..'.  */
+  assert (count > 2);
+  return count - 2;
+}
+
+pthread_attr_t attr[CPU_SETSIZE];
+pthread_t thr[CPU_SETSIZE];
+
+static void *
+mythread (void *_arg)
+{
+  return NULL;
+}
+
+int
+main ()
+{
+  int i;
+
+  for (i = 0; i < CPU_SETSIZE; i++)
+    {
+      cpu_set_t set;
+      int ret;
+
+      pthread_attr_init (&attr[i]);
+      CPU_ZERO_S (sizeof (set), &set);
+      CPU_SET_S (i, sizeof (set), &set);
+
+      ret = pthread_attr_setaffinity_np (&attr[i], sizeof (set), &set);
+      if (ret != 0)
+	{
+	  fprintf (stderr, "set_affinity: %d: %s\n", ret, strerror (ret));
+	  exit (3);
+	}
+      ret = pthread_create (&thr[i], &attr[i], mythread, NULL);
+      /* Should fail with EINVAL at some point.  */
+      if (ret != 0)
+	{
+	  unsigned long t;
+
+	  fprintf (stderr, "pthread_create: %d: %s\n", ret, strerror (ret));
+
+	  /* Wait for all threads to exit.  pthread_create spawns a
+	     clone thread even in the failing case, as it can only try
+	     to set the affinity after creating the thread.  That new
+	     thread is immediately canceled (because setting the
+	     affinity fails), by killing it with a SIGCANCEL signal,
+	     which may ends up in pthread_cancel/unwind paths, which
+	     may trigger a libgcc_s.so load, making the thread hit the
+	     solib-event breakpoint.  Now, if we would let the program
+	     exit without waiting, sometimes it would happen that the
+	     inferior exists just while we're handling the
+	     solib-event, resulting in errors being thrown due to
+	     failing ptrace call fails (with ESCHR), breaking the
+	     test.  */
+	  t = 16;
+	  while (count_tasks (getpid ()) > 1)
+	    {
+	      usleep (t);
+
+	      if (t < 256)
+		t *= 2;
+	    }
+
+	  /* Normal exit, because this is what we are expecting.  */
+	  exit (0);
+	}
+    }
+
+  /* Should not normally be reached.  */
+
+  return 1;
+}
diff --git a/gdb/testsuite/gdb.threads/create-fail.exp b/gdb/testsuite/gdb.threads/create-fail.exp
new file mode 100644
index 0000000..0cdacf6
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/create-fail.exp
@@ -0,0 +1,53 @@
+# Copyright (C) 2012 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# On GNU/Linux, a creating a thread bound to an unexisting cpu spawns
+# the clone child thread for a bit, which is then immediately
+# cancelled.  The spawned child may trigger a dlopen (for libgcc_s)
+# while being cancelled, which results in a trap being reported to
+# GDB, for a thread that libthread_db considers to be TD_THR_ZOMBIE.
+# Make sure we handle that scenario properly.
+
+standard_testfile
+set executable ${testfile}
+
+if {[gdb_compile_pthreads "${srcdir}/${subdir}/${srcfile}" "${binfile}" executable {debug}] != "" } {
+    return -1
+}
+
+set iterations 10
+for {set i 1} {$i <= $iterations} {incr i} {
+    with_test_prefix "iteration $i" {
+
+	clean_restart ${executable}
+
+	if ![runto_main] {
+	    return -1
+	}
+
+	set test "run till end"
+	gdb_test_multiple "continue" "$test" {
+	    -re "exited with code 01.*$gdb_prompt $" {
+		pass "$test"
+	    }
+	    -re "exited with code 02.*$gdb_prompt $" {
+		unsupported "$test (too many CPUs for test?)"
+	    }
+	    -re "exited normally.*$gdb_prompt $" {
+		pass "$test"
+	    }
+	}
+    }
+}


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]