This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 3/N] remote follow fork and spurious child stops in non-stop mode


On 07/29/2015 02:21 PM, Pedro Alves wrote:
>> > I have a question about this.  In the definition of struct lwp_info in
>> > linux-low.h, it has this comment:
>> > 
>> >   /* When this is true, we shall not try to resume this thread, even
>> >      if last_resume_kind isn't resume_stop.  */
>> >   int suspended;
>> > 
>> > Since we are setting last_resume_kind to resume_stop here, is this
>> > unnecessary?
> We still need it, because otherwise we'd decrement the suspend count
> below 0:

I think that today I fixed most of the issues this uncovered that I showed
yesterday, but there's still more to do, so I removed that bit from the
patch, and pushed it, to both master and 7.10, in order to close PR18600,
and remove it from the 7.10 blockers list.  Not sure yet whether the
other new fixes will make it into 7.10.

Below's what I pushed.

-----------
>From 998d452ac81bc240996c967dd27f7b747240cd66 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Thu, 30 Jul 2015 18:41:44 +0100
Subject: [PATCH 3/3] remote follow fork and spurious child stops in non-stop
 mode

Running gdb.threads/fork-plus-threads.exp against gdbserver in
extended-remote mode, even though the test passes, we still see broken
behavior:

 (gdb) PASS: gdb.threads/fork-plus-threads.exp: set detach-on-fork off
 continue &
 Continuing.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: continue &
 [New Thread 28092.28092]

 [Thread 28092.28092] #2 stopped.
 [New Thread 28094.28094]
 [Inferior 2 (process 28092) exited normally]
 [New Thread 28094.28105]
 [New Thread 28094.28109]

...

[Thread 28174.28174] #18 stopped.
 [New Thread 28185.28185]
 [Inferior 10 (process 28174) exited normally]
 [New Thread 28185.28196]

 [Thread 28185.28185] #20 stopped.
 Cannot remove breakpoints because program is no longer writable.
 Further execution is probably impossible.
 [Inferior 11 (process 28185) exited normally]
 [Inferior 1 (process 28091) exited normally]
 PASS: gdb.threads/fork-plus-threads.exp: reached breakpoint
 info threads
 No threads.
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: no threads left
 info inferiors
   Num  Description       Executable
 * 1    <null>            /home/pedro/gdb/mygit/build/gdb/testsuite/gdb.threads/fork-plus-threads
 (gdb) PASS: gdb.threads/fork-plus-threads.exp: only inferior 1 left

All the "[Thread FOO] #NN stopped." above are bogus, as well as the
"Cannot remove breakpoints because program is no longer writable.",
which is a consequence.

The problem is that when we intercept a fork event, we should report
the event for the parent, only, and leave the child stopped, but not
report its stop event.  GDB later decides whether to follow the parent
or the child.  But because handle_extended_wait does not set the
child's last_status.kind to TARGET_WAITKIND_STOPPED, a
stop_all_threads/unstop_all_lwps sequence (e.g., from trying to access
memory) by mistake ends up queueing a SIGSTOP on the child, resuming
it, and then when that SIGSTOP is intercepted, because the LWP has
last_resume_kind set to resume_stop, gdbserver reports the stop to
GDB, as GDB_SIGNAL_0:

...
 >>>> entering unstop_all_lwps
 unstopping all lwps
 proceed_one_lwp: lwp 1600
    client wants LWP to remain 1600 stopped
 proceed_one_lwp: lwp 1828
 Client wants LWP 1828 to stop. Making sure it has a SIGSTOP pending
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 Sending sigstop to lwp 1828
 pc is 0x3615ebc7cc
 Resuming lwp 1828 (continue, signal 0, stop expected)
   continue from pc 0x3615ebc7cc
 unstop_all_lwps done
 sigchld_handler
 <<<< exiting unstop_all_lwps
 handling possible target event
 >>>> entering linux_wait_1
 linux_wait_1: [<all threads>]
 my_waitpid (-1, 0x40000001)
 my_waitpid (-1, 0x1): status(137f), 1828
 LWFE: waitpid(-1, ...) returned 1828, ERRNO-OK
 LLW: waitpid 1828 received Stopped (signal) (stopped)
 pc is 0x3615ebc7cc
 Expected stop.
 LLW: resume_stop SIGSTOP caught for LWP 1828.1828.
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
 linux_wait_1 ret = LWP 1828.1828, 1, 0
 <<<< exiting linux_wait_1
 Writing resume reply for LWP 1828.1828:1
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Tested on x86_64 Fedora 20, extended-remote.

gdb/gdbserver/ChangeLog:
2015-07-30  Pedro Alves  <palves@redhat.com>

	* linux-low.c (handle_extended_wait): Set the child's last
	reported status to TARGET_WAITKIND_STOPPED.
---
 gdb/testsuite/ChangeLog                         |  5 ++++
 gdb/gdbserver/linux-low.c                       |  2 ++
 gdb/testsuite/gdb.threads/fork-plus-threads.exp | 32 +++++++++++++++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index e3126ed..eda6625 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,5 +1,10 @@
 2015-07-30  Pedro Alves  <palves@redhat.com>
 
+	* linux-low.c (handle_extended_wait): Set the child's last
+	reported status to TARGET_WAITKIND_STOPPED.
+
+2015-07-30  Pedro Alves  <palves@redhat.com>
+
 	PR threads/18600
 	* gdb.threads/fork-plus-threads.exp: Test that "info inferiors"
 	only shows inferior 1.
diff --git a/gdb/gdbserver/linux-low.c b/gdb/gdbserver/linux-low.c
index 9bc9fa3..82fb7f9 100644
--- a/gdb/gdbserver/linux-low.c
+++ b/gdb/gdbserver/linux-low.c
@@ -483,6 +483,8 @@ handle_extended_wait (struct lwp_info *event_lwp, int wstat)
 	  child_lwp->status_pending_p = 0;
 	  child_thr = get_lwp_thread (child_lwp);
 	  child_thr->last_resume_kind = resume_stop;
+	  child_thr->last_status.kind = TARGET_WAITKIND_STOPPED;
+
 	  parent_proc = get_thread_process (event_thr);
 	  child_proc->attached = parent_proc->attached;
 	  clone_all_breakpoints (&child_proc->breakpoints,
diff --git a/gdb/testsuite/gdb.threads/fork-plus-threads.exp b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
index 8a503ec..2b34b6c 100644
--- a/gdb/testsuite/gdb.threads/fork-plus-threads.exp
+++ b/gdb/testsuite/gdb.threads/fork-plus-threads.exp
@@ -51,13 +51,45 @@ proc do_test { detach_on_fork } {
 	}
     }
 
+    # gdbserver had a bug that resulted in reporting the fork child's
+    # initial stop to gdb, which gdb does not expect, in turn
+    # resulting in a broken session, like:
+    #
+    #  [Thread 31536.31536] #16 stopped.                                <== BAD
+    #  [New Thread 31547.31547]
+    #  [Inferior 10 (process 31536) exited normally]
+    #  [New Thread 31547.31560]
+    #
+    #  [Thread 31547.31547] #18 stopped.                                <== BAD
+    #  Cannot remove breakpoints because program is no longer writable. <== BAD
+    #  Further execution is probably impossible.                        <== BAD
+    #  [Inferior 11 (process 31547) exited normally]
+    #  [Inferior 1 (process 31454) exited normally]
+    #
+    # These variables track whether we see such broken behavior.
+    set saw_cannot_remove_breakpoints 0
+    set saw_thread_stopped 0
+
     set test "inferior 1 exited"
     gdb_test_multiple "" $test {
+	-re "Cannot remove breakpoints" {
+	    set saw_cannot_remove_breakpoints 1
+	    exp_continue
+	}
+	-re "Thread \[^\r\n\]+ stopped\\." {
+	    set saw_thread_stopped 1
+	    exp_continue
+	}
 	-re "Inferior 1 \(\[^\r\n\]+\) exited normally" {
 	    pass $test
 	}
     }
 
+    gdb_assert !$saw_cannot_remove_breakpoints \
+	"no failure to remove breakpoints"
+    gdb_assert !$saw_thread_stopped \
+	"no spurious thread stop"
+
     gdb_test "info threads" "No threads\." \
 	"no threads left"
 
-- 
1.9.3



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]