This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
Re: [rfa/linux] Make SIGINT handling more robust

From: Daniel Jacobowitz <drow at false dot org>
To: gdb-patches at sourceware dot org
Date: Sun, 27 Jul 2008 17:13:18 -0400
Subject: Re: [rfa/linux] Make SIGINT handling more robust
References: <20050404042131.GA8509@nevyn.them.org> <20050404041520.GA7722@nevyn.them.org>
Two more patches from April 2005.  Updated, retested on mips-linux,
and checked in (together, since one is a test for the other).

This simplifies Control-C handling for Linux native.  It should not
have much effect on NPTL targets, but my home mips-linux test system
still uses LinuxThreads.

On Mon, Apr 04, 2005 at 12:15:20AM -0400, Daniel Jacobowitz wrote:
> This patch, which I've had kicking around for a year or so, makes SIGINT
> handling much more reliable.
> 
> The problem being solved by this code (both the current version, and the
> replacement below) is this: in a LinuxThreads program, every thread is a
> process.  When a signal is sent to the process group, every thread will
> receive it.  If the user hits C-c at the console, the signal will get sent
> to every thread; the user will be presented with an indeterminate number of
> SIGINT reports instead of one.  This could apply to other signals, too, but
> SIGINT was the only one handled by the old code.
> 
> Previously, we had flush_callback.  This function resumed the inferior when
> it saw a pending SIGINT, in order to force the SIGINT to be delivered.  This
> method is unreliable; it had the potential to lose other signals, and often
> generated a failed assertion for "lp->status == 0".
> 
> This patch relies completely on recording the pending signal, and discarding
> it once it is delivered.  An unexpected SIGINT will cause a flag to be set
> on every thread.  Then, as long as /proc indicates that the signal is
> pending, we will leave the flag set.  If a SIGINT is received while the flag
> is set, it is discarded.  At various points we check to see if the signal
> is still pending; if it isn't, then we assume it was delivered to some other
> thread (the POSIX and NPTL semantics) and clear the flag.
> 
> Tested on i686-pc-linux-gnu, with both LinuxThreads and NPTL.  With
> LinuxThreads, this removes an intermittent failure in watchthreads.exp
> (improved testcase coming soon).  Also tested with my previous patch on
> mips-unknown-linux-gnu, where results for schedlock.exp go from abyssmal
> to fairly good.

On Mon, Apr 04, 2005 at 12:21:31AM -0400, Daniel Jacobowitz wrote:
> The intermittent failure of this test using LinuxThreads puzzled me for a
> while.  It looked like this:
> 
> (gdb) continue
> Continuing.
> [expect sends ^C]
> (gdb) Quit
> (gdb) FAIL:
> 
> Now where'd that prompt come from?  Turns out using after for delays isn't
> such a good idea.  We lose inferior output while sleeping.  Same thing
> for "exec sleep 1".  But if we use gdb_expect, we can see what's going on:
> a second copy of the previous SIGINT arrived after the continue command.
> So GDB was already stopped when the ^C was sent, thus the Quit message.
> 
> This patch updates the testcase so that the log shows what's going on, and
> adds an explicit test that we don't get duplicate signals.  With my previous
> patch from today, the test passes.

-- 
Daniel Jacobowitz
CodeSourcery

2008-07-27  Daniel Jacobowitz  <dan@codesourcery.com>

	* linux-nat.c (resume_callback): Add more debugging output.
	(linux_nat_has_pending_sigint): New function, based on
	linux_nat_has_pending.
	(set_ignore_sigint, maybe_clear_ignore_sigint): New functions.
	(stop_wait_callback): Remove flush_mask handling.  Honor
	ignore_sigint.  Call maybe_clear_ignore_sigint.  Pass NULL
	to recursive calls.
	(linux_nat_has_pending, flush_callback): Remove.
	(linux_nat_filter_event): Check for ignore_sigint.
	(linux_nat_wait): Remove flush_mask support and call to
	flush_callback.  Use set_ignore_sigint and maybe_clear_ignore_sigint.
	* linux-nat.h (struct lwp_info): Add ignore_sigint field.

2008-07-27  Daniel Jacobowitz  <dan@codesourcery.com>

	* gdb.threads/manythreads.exp: Use remote_expect instead of after.
	Add a test for duplicated SIGINTs.

Index: gdb-only/gdb/linux-nat.c
===================================================================
--- gdb-only.orig/gdb/linux-nat.c	2008-07-26 22:16:18.000000000 -0400
+++ gdb-only/gdb/linux-nat.c	2008-07-26 22:38:54.000000000 -0400
@@ -1603,6 +1603,12 @@ resume_callback (struct lwp_info *lp, vo
       lp->step = 0;
       memset (&lp->siginfo, 0, sizeof (lp->siginfo));
     }
+  else if (lp->stopped && debug_linux_nat)
+    fprintf_unfiltered (gdb_stdlog, "RC: Not resuming sibling %s (has pending)\n",
+			target_pid_to_str (lp->ptid));
+  else if (debug_linux_nat)
+    fprintf_unfiltered (gdb_stdlog, "RC: Not resuming sibling %s (not stopped)\n",
+			target_pid_to_str (lp->ptid));
 
   return 0;
 }
@@ -2036,14 +2042,66 @@ stop_callback (struct lwp_info *lp, void
   return 0;
 }
 
-/* Wait until LP is stopped.  If DATA is non-null it is interpreted as
-   a pointer to a set of signals to be flushed immediately.  */
+/* Return non-zero if LWP PID has a pending SIGINT.  */
 
 static int
-stop_wait_callback (struct lwp_info *lp, void *data)
+linux_nat_has_pending_sigint (int pid)
 {
-  sigset_t *flush_mask = data;
+  sigset_t pending, blocked, ignored;
+  int i;
 
+  linux_proc_pending_signals (pid, &pending, &blocked, &ignored);
+
+  if (sigismember (&pending, SIGINT)
+      && !sigismember (&ignored, SIGINT))
+    return 1;
+
+  return 0;
+}
+
+/* Set a flag in LP indicating that we should ignore its next SIGINT.  */
+
+static int
+set_ignore_sigint (struct lwp_info *lp, void *data)
+{
+  /* If a thread has a pending SIGINT, consume it; otherwise, set a
+     flag to consume the next one.  */
+  if (lp->stopped && lp->status != 0 && WIFSTOPPED (lp->status)
+      && WSTOPSIG (lp->status) == SIGINT)
+    lp->status = 0;
+  else
+    lp->ignore_sigint = 1;
+
+  return 0;
+}
+
+/* If LP does not have a SIGINT pending, then clear the ignore_sigint flag.
+   This function is called after we know the LWP has stopped; if the LWP
+   stopped before the expected SIGINT was delivered, then it will never have
+   arrived.  Also, if the signal was delivered to a shared queue and consumed
+   by a different thread, it will never be delivered to this LWP.  */
+
+static void
+maybe_clear_ignore_sigint (struct lwp_info *lp)
+{
+  if (!lp->ignore_sigint)
+    return;
+
+  if (!linux_nat_has_pending_sigint (GET_LWP (lp->ptid)))
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "MCIS: Clearing bogus flag for %s\n",
+			    target_pid_to_str (lp->ptid));
+      lp->ignore_sigint = 0;
+    }
+}
+
+/* Wait until LP is stopped.  */
+
+static int
+stop_wait_callback (struct lwp_info *lp, void *data)
+{
   if (!lp->stopped)
     {
       int status;
@@ -2052,26 +2110,24 @@ stop_wait_callback (struct lwp_info *lp,
       if (status == 0)
 	return 0;
 
-      /* Ignore any signals in FLUSH_MASK.  */
-      if (flush_mask && sigismember (flush_mask, WSTOPSIG (status)))
+      if (lp->ignore_sigint && WIFSTOPPED (status)
+	  && WSTOPSIG (status) == SIGINT)
 	{
-	  if (!lp->signalled)
-	    {
-	      lp->stopped = 1;
-	      return 0;
-	    }
+	  lp->ignore_sigint = 0;
 
 	  errno = 0;
 	  ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
 	  if (debug_linux_nat)
 	    fprintf_unfiltered (gdb_stdlog,
-				"PTRACE_CONT %s, 0, 0 (%s)\n",
+				"PTRACE_CONT %s, 0, 0 (%s) (discarding SIGINT)\n",
 				target_pid_to_str (lp->ptid),
 				errno ? safe_strerror (errno) : "OK");
 
-	  return stop_wait_callback (lp, flush_mask);
+	  return stop_wait_callback (lp, NULL);
 	}
 
+      maybe_clear_ignore_sigint (lp);
+
       if (WSTOPSIG (status) != SIGSTOP)
 	{
 	  if (WSTOPSIG (status) == SIGTRAP)
@@ -2108,7 +2164,7 @@ stop_wait_callback (struct lwp_info *lp,
 		}
 	      /* Hold this event/waitstatus while we check to see if
 		 there are any more (we still want to get that SIGSTOP). */
-	      stop_wait_callback (lp, data);
+	      stop_wait_callback (lp, NULL);
 
 	      if (target_can_async_p ())
 		{
@@ -2169,7 +2225,7 @@ stop_wait_callback (struct lwp_info *lp,
 
 	      /* Hold this event/waitstatus while we check to see if
 	         there are any more (we still want to get that SIGSTOP). */
-	      stop_wait_callback (lp, data);
+	      stop_wait_callback (lp, NULL);
 
 	      /* If the lp->status field is still empty, use it to
 		 hold this event.  If not, then this event must be
@@ -2202,96 +2258,6 @@ stop_wait_callback (struct lwp_info *lp,
   return 0;
 }
 
-/* Check whether PID has any pending signals in FLUSH_MASK.  If so set
-   the appropriate bits in PENDING, and return 1 - otherwise return 0.  */
-
-static int
-linux_nat_has_pending (int pid, sigset_t *pending, sigset_t *flush_mask)
-{
-  sigset_t blocked, ignored;
-  int i;
-
-  linux_proc_pending_signals (pid, pending, &blocked, &ignored);
-
-  if (!flush_mask)
-    return 0;
-
-  for (i = 1; i < NSIG; i++)
-    if (sigismember (pending, i))
-      if (!sigismember (flush_mask, i)
-	  || sigismember (&blocked, i)
-	  || sigismember (&ignored, i))
-	sigdelset (pending, i);
-
-  if (sigisemptyset (pending))
-    return 0;
-
-  return 1;
-}
-
-/* DATA is interpreted as a mask of signals to flush.  If LP has
-   signals pending, and they are all in the flush mask, then arrange
-   to flush them.  LP should be stopped, as should all other threads
-   it might share a signal queue with.  */
-
-static int
-flush_callback (struct lwp_info *lp, void *data)
-{
-  sigset_t *flush_mask = data;
-  sigset_t pending, intersection, blocked, ignored;
-  int pid, status;
-
-  /* Normally, when an LWP exits, it is removed from the LWP list.  The
-     last LWP isn't removed till later, however.  So if there is only
-     one LWP on the list, make sure it's alive.  */
-  if (lwp_list == lp && lp->next == NULL)
-    if (!linux_nat_thread_alive (lp->ptid))
-      return 0;
-
-  /* Just because the LWP is stopped doesn't mean that new signals
-     can't arrive from outside, so this function must be careful of
-     race conditions.  However, because all threads are stopped, we
-     can assume that the pending mask will not shrink unless we resume
-     the LWP, and that it will then get another signal.  We can't
-     control which one, however.  */
-
-  if (lp->status)
-    {
-      if (debug_linux_nat)
-	printf_unfiltered (_("FC: LP has pending status %06x\n"), lp->status);
-      if (WIFSTOPPED (lp->status) && sigismember (flush_mask, WSTOPSIG (lp->status)))
-	lp->status = 0;
-    }
-
-  /* While there is a pending signal we would like to flush, continue
-     the inferior and collect another signal.  But if there's already
-     a saved status that we don't want to flush, we can't resume the
-     inferior - if it stopped for some other reason we wouldn't have
-     anywhere to save the new status.  In that case, we must leave the
-     signal unflushed (and possibly generate an extra SIGINT stop).
-     That's much less bad than losing a signal.  */
-  while (lp->status == 0
-	 && linux_nat_has_pending (GET_LWP (lp->ptid), &pending, flush_mask))
-    {
-      int ret;
-      
-      errno = 0;
-      ret = ptrace (PTRACE_CONT, GET_LWP (lp->ptid), 0, 0);
-      if (debug_linux_nat)
-	fprintf_unfiltered (gdb_stderr,
-			    "FC: Sent PTRACE_CONT, ret %d %d\n", ret, errno);
-
-      lp->stopped = 0;
-      stop_wait_callback (lp, flush_mask);
-      if (debug_linux_nat)
-	fprintf_unfiltered (gdb_stderr,
-			    "FC: Wait finished; saved status is %d\n",
-			    lp->status);
-    }
-
-  return 0;
-}
-
 /* Return non-zero if LP has a wait status pending.  */
 
 static int
@@ -2657,6 +2623,36 @@ linux_nat_filter_event (int lwpid, int s
       return NULL;
     }
 
+  /* Make sure we don't report a SIGINT that we have already displayed
+     for another thread.  */
+  if (lp->ignore_sigint
+      && WIFSTOPPED (status) && WSTOPSIG (status) == SIGINT)
+    {
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "LLW: Delayed SIGINT caught for %s.\n",
+			    target_pid_to_str (lp->ptid));
+
+      /* This is a delayed SIGINT.  */
+      lp->ignore_sigint = 0;
+
+      registers_changed ();
+      linux_ops->to_resume (pid_to_ptid (GET_LWP (lp->ptid)),
+			    lp->step, TARGET_SIGNAL_0);
+      if (debug_linux_nat)
+	fprintf_unfiltered (gdb_stdlog,
+			    "LLW: %s %s, 0, 0 (discard SIGINT)\n",
+			    lp->step ?
+			    "PTRACE_SINGLESTEP" : "PTRACE_CONT",
+			    target_pid_to_str (lp->ptid));
+
+      lp->stopped = 0;
+      gdb_assert (lp->resumed);
+
+      /* Discard the event.  */
+      return NULL;
+    }
+
   /* An interesting event.  */
   gdb_assert (lp);
   return lp;
@@ -2715,7 +2711,6 @@ linux_nat_wait (ptid_t ptid, struct targ
   int options = 0;
   int status = 0;
   pid_t pid = PIDGET (ptid);
-  sigset_t flush_mask;
 
   if (debug_linux_nat_async)
     fprintf_unfiltered (gdb_stdlog, "LLW: enter\n");
@@ -2737,8 +2732,6 @@ linux_nat_wait (ptid_t ptid, struct targ
       set_executing (lp->ptid, 1);
     }
 
-  sigemptyset (&flush_mask);
-
   /* Block events while we're here.  */
   linux_nat_async_events (sigchld_sync);
 
@@ -2948,11 +2941,15 @@ retry:
       if (signo == TARGET_SIGNAL_INT && signal_pass_state (signo) == 0)
 	{
 	  /* If ^C/BREAK is typed at the tty/console, SIGINT gets
-	     forwarded to the entire process group, that is, all LWP's
-	     will receive it.  Since we only want to report it once,
-	     we try to flush it from all LWPs except this one.  */
-	  sigaddset (&flush_mask, SIGINT);
+	     forwarded to the entire process group, that is, all LWPs
+	     will receive it - unless they're using CLONE_THREAD to
+	     share signals.  Since we only want to report it once, we
+	     mark it as ignored for all LWPs except this one.  */
+	  iterate_over_lwps (set_ignore_sigint, NULL);
+	  lp->ignore_sigint = 0;
 	}
+      else
+	maybe_clear_ignore_sigint (lp);
     }
 
   /* This LWP is stopped now.  */
@@ -2969,8 +2966,7 @@ retry:
 
       /* ... and wait until all of them have reported back that
 	 they're no longer running.  */
-      iterate_over_lwps (stop_wait_callback, &flush_mask);
-      iterate_over_lwps (flush_callback, &flush_mask);
+      iterate_over_lwps (stop_wait_callback, NULL);
 
       /* If we're not waiting for a specific LWP, choose an event LWP
 	 from among those that have had events.  Giving equal priority
Index: gdb-only/gdb/linux-nat.h
===================================================================
--- gdb-only.orig/gdb/linux-nat.h	2008-07-26 13:48:56.000000000 -0400
+++ gdb-only/gdb/linux-nat.h	2008-07-26 22:26:30.000000000 -0400
@@ -62,6 +62,9 @@ struct lwp_info
      be the address of a hardware watchpoint.  */
   struct siginfo siginfo;
 
+  /* Non-zero if we expect a duplicated SIGINT.  */
+  int ignore_sigint;
+
   /* If WAITSTATUS->KIND != TARGET_WAITKIND_SPURIOUS, the waitstatus
      for this LWP's last event.  This may correspond to STATUS above,
      or to a local variable in lin_lwp_wait.  */
Index: gdb-only/gdb/testsuite/gdb.threads/manythreads.exp
===================================================================
--- gdb-only.orig/gdb/testsuite/gdb.threads/manythreads.exp	2008-06-28 11:41:11.000000000 -0400
+++ gdb-only/gdb/testsuite/gdb.threads/manythreads.exp	2008-07-26 23:02:13.000000000 -0400
@@ -58,8 +58,11 @@ gdb_test_multiple "continue" "first cont
   }
 }
 
+# Wait one second.  This is better than the TCL "after" command, because
+# we don't lose GDB's output while we do it.
+remote_expect host 1 { timeout { } }
+
 # Send a Ctrl-C and verify that we can do info threads and continue
-after 1000
 send_gdb "\003"
 set message "stop threads 1"
 gdb_test_multiple "" "stop threads 1" {
@@ -110,8 +113,35 @@ gdb_test_multiple "continue" "second con
   }
 }
 
+# Wait another second.  If the program stops on its own, GDB has failed
+# to handle duplicate SIGINTs sent to multiple threads.
+set failed 0
+remote_expect host 1 {
+  -re "\\\[New \[^\]\]*\\\]\r\n" {
+    exp_continue -continue_timer
+  }
+  -re "\\\[\[^\]\]* exited\\\]\r\n" {
+    exp_continue -continue_timer
+  }
+  -re "Thread \[^\n\]* executing\r\n" {
+    exp_continue -continue_timer
+  }
+  -re "Program received signal SIGINT.*$gdb_prompt $" {
+    if { $failed == 0 } {
+      fail "check for duplicate SIGINT"
+    }
+    send_gdb "continue\n"
+    set failed 1
+    exp_continue
+  }
+  timeout {
+    if { $failed == 0 } {
+      pass "check for duplicate SIGINT"
+    }
+  }
+}
+
 # Send another Ctrl-C and verify that we can do info threads and quit
-after 1000
 send_gdb "\003"
 set message "stop threads 2"
 gdb_test_multiple "" "stop threads 2" {
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]