This is the mail archive of the gdb-patches@sourceware.cygnus.com mailing list for the GDB project.



RFA: linux-thread.c change (fixes hang on startup)


Recently, on Linux, gdb has hung on me a number of times when
attempting to run the inferior.  The hang was not always
reproducible, but a short while ago I was fortunate enough to run
into a situation where it occurred consistently.  (Kind of maddening,
since I was hunting for a bug in another program at the time.)  It
was consistent enough, though, for me to debug and analyze the
problem.

When I would attach to the hung gdb process, I'd see the following
backtrace:

#0  0x401311bb in __sigsuspend (set=0x82dd500)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1  0x80b1923 in linux_child_wait (pid=-1, rpid=0xbfffed34, status=0xbfffed38)
    at /ocotillo1/kev/devo/gdb/linux-thread.c:1339
#2  0x80b1aa7 in linuxthreads_wait (pid=-1, ourstatus=0xbfffed54)
    at /ocotillo1/kev/devo/gdb/linux-thread.c:1416
#3  0x80a0435 in wait_for_inferior () at /ocotillo1/kev/devo/gdb/infrun.c:1260
#4  0x80ead5f in startup_inferior (ntraps=2)
    at /ocotillo1/kev/devo/gdb/fork-child.c:544
#5  0x80ea498 in ptrace_him (pid=9393) at /ocotillo1/kev/devo/gdb/inftarg.c:492
#6  0x80eabca in fork_inferior (
    exec_file=0x8318e88 "/home/kev/netstuff/xvile-bld.ocotillo/./xvile2", 
    allargs=0x830e068 "", env=0x830e088, traceme_fun=0x80ea464 <ptrace_me>, 
    init_trace_fun=0x80ea478 <ptrace_him>, pre_trace_fun=0, 
    shell_file=0xbffffed3 "/bin/bash")
    at /ocotillo1/kev/devo/gdb/fork-child.c:365
#7  0x80ea4c7 in child_create_inferior (
    exec_file=0x8318e88 "/home/kev/netstuff/xvile-bld.ocotillo/./xvile2", 
    allargs=0x830e068 "", env=0x830e088)
    at /ocotillo1/kev/devo/gdb/inftarg.c:514
#8  0x80b1f86 in linuxthreads_create_inferior (
    exec_file=0x8318e88 "/home/kev/netstuff/xvile-bld.ocotillo/./xvile2", 
    allargs=0x830e068 "", env=0x830e088)
    at /ocotillo1/kev/devo/gdb/linux-thread.c:1610
#9  0x80c0e69 in find_default_create_inferior (
    exec_file=0x8318e88 "/home/kev/netstuff/xvile-bld.ocotillo/./xvile2", 
    allargs=0x830e068 "", env=0x830e088)
    at /ocotillo1/kev/devo/gdb/target.c:1250
#10 0x809ded3 in run_command (args=0x0, from_tty=1)
    at /ocotillo1/kev/devo/gdb/infcmd.c:349
[...]

In linux_child_wait(), we have the following loop:

  errno = save_errno = 0;
  for (;;)
    {
      errno = 0;
      *rpid = waitpid (pid, status, __WCLONE | WNOHANG);
      save_errno = errno;

      if (*rpid > 0)
	{
	  /* Got an event -- break out */
	  break;
	}
      if (errno == EINTR)	/* interrupted by signal, try again */
	{
	  continue;
	}

      errno = 0;
      *rpid = waitpid (pid, status, WNOHANG);
      if (*rpid > 0)
	{
	  /* Got an event -- break out */
	  break;
	}
      if (errno == EINTR)
	{
	  continue;
	}
      if (errno != 0 && save_errno != 0)
	{
	  break;
	}
      sigsuspend(&linuxthreads_block_mask);
    }

Basically, this loop is attempting to detect some change in the
child's status and it'll exit when that change occurs.  I was able to
examine errno, save_errno, and *rpid and learned that errno was 0 (no
error), save_errno was 10 (ECHILD - no child processes), and that
*rpid was 0.

The fact that *rpid was 0 means that the second waitpid() returned
immediately (due to the WNOHANG flag) because no child had a status
change to report.

I also examined linuxthreads_block_mask.  Here's what it looked like:

(gdb) print linuxthreads_block_mask
$8 = {__val = {65536, 0 <repeats 31 times>}}

After refreshing my memory on how signal masks are constructed
and what they mean, I learned that the above mask causes
SIGCHLD to be blocked: 65536 is 1 << 16, the bit for signal 17,
which is SIGCHLD on Linux/x86.  This is definitely not what we want!
We need to wake up upon receipt of a SIGCHLD, so we can attempt
the waitpid() calls again.

I went looking for the code which initializes
linuxthreads_block_mask and noted that it looked like it had been
taken almost verbatim from one of W. Richard Stevens' books.  (This
gave me pause, because I have a very high regard for Stevens' books.)

The code in question looks like this:

  /* initialize SIGCHLD mask */
  sigemptyset (&linuxthreads_wait_mask);
  sigaddset (&linuxthreads_wait_mask, SIGCHLD);

  /* Use SIG_BLOCK to block receipt of SIGCHLD.
     The block_mask will allow us to wait for this signal explicitly.  */
  sigprocmask(SIG_BLOCK, 
	      &linuxthreads_wait_mask, 
	      &linuxthreads_block_mask);

What this code fails to take into account is what happens if SIGCHLD is
already blocked.  (Perhaps set that way by some other part of
gdb.)  In that case, sigprocmask() stores the old signal mask (the one
in effect prior to blocking SIGCHLD with SIG_BLOCK) in
linuxthreads_block_mask, and that mask already has SIGCHLD blocked.

This is clearly a problem because the loop noted above will get stuck
in sigsuspend() if it doesn't manage to find a change in a child status
on the first iteration.

This leads me to the patch below and I hereby request approval for
committing it.  I have run the test suite on Linux and have observed
no regressions as a result of this change.

	* linux-thread.c (_initialize_linuxthreads): Make sure that
	linuxthreads_block_mask does not block SIGCHLD.

Index: linux-thread.c
===================================================================
RCS file: /cvs/cvsfiles/devo/gdb/linux-thread.c,v
retrieving revision 1.4
diff -u -r1.4 linux-thread.c
--- linux-thread.c	1999/12/16 22:57:47	1.4
+++ linux-thread.c	2000/01/22 09:24:32
@@ -1800,4 +1800,6 @@
   sigprocmask(SIG_BLOCK, 
 	      &linuxthreads_wait_mask, 
 	      &linuxthreads_block_mask);
+  /* Make sure that linuxthreads_block_mask is not blocking SIGCHLD */
+  sigdelset (&linuxthreads_block_mask, SIGCHLD);
 }

