This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH v2] GDBserver crashes when killing a multi-thread process
- From: Pedro Alves <palves at redhat dot com>
- To: Yao Qi <qiyaoltc at gmail dot com>, gdb-patches ml <gdb-patches at sourceware dot org>
- Date: Mon, 13 Jul 2015 18:32:22 +0100
- References: <510f2362-8d33-4c3c-9a13-5d187f26abdf at SVR-ORW-FEM-04 dot mgc dot mentorg dot com> <53AF87EB dot 60703 at mentor dot com> <53B3CBDB dot 5030207 at redhat dot com> <53BEAE5E dot 7030209 at redhat dot com> <55A3E23C dot 8020101 at gmail dot com>
On 07/13/2015 05:07 PM, Yao Qi wrote:
> Hi Pedro,
> do you still remember why you added this assert? It wasn't
> mentioned in the mail
> https://sourceware.org/ml/gdb-patches/2014-07/msg00206.html
>
Simply because getting here was supposed to indicate
something went wrong elsewhere, but at the time I didn't consider
that the child could die while ptrace-stopped.
> I am looking at a GDBserver internal error on x86_64 when I run
> gdb.threads/thread-unwindonsignal.exp with GDBserver,
>
> continue^M
> Continuing.^M
> warning: Remote failure reply: E.No unwaited-for children left.^M
> PC register is not available^M
> (gdb) FAIL: gdb.threads/thread-unwindonsignal.exp: continue until exit
> Remote debugging from host 127.0.0.1^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> ptrace(regsets_fetch_inferior_registers) PID=30700: No such process^M
> monitor exit^M
> Killing process(es): 30694^M
> (gdb) /home/yao/SourceCode/gnu/gdb/git/gdb/gdbserver/linux-low.c:1106: A
> problem internal to GDBserver has been detected.^M
> kill_wait_lwp: Assertion `res > 0' failed.
>
> After your patch https://sourceware.org/ml/gdb-patches/2015-03/msg00597.html
> GDBserver started to swallow errors if the LWP is gone. Then, when
> GDBserver kills a nonexistent LWP, the assert is triggered.
>
Looks like I forgot to push the rest of that series:
https://sourceware.org/ml/gdb-patches/2015-03/msg00182.html
What do you think of that one?
> Why don't we implement kill_wait_lwp like its counterpart in GDB,
> linux-nat.c:kill_wait_callback? We can loop and assert like this
> patch below (note that this patch fixes the internal error, but
> the FAIL is still there).
>
Seems to me it's not 100% correct to waitpid the pid one more time
after we've already reaped it, because there's a minuscule chance
another process that we're debugging could clone a new lwp that reuses
the PID of the one we've just killed/reaped, and then another iteration
could collect the initial SIGSTOP of the wrong LWP and we'd kill it:
-> kill (pid1, SIGKILL);
<- waitpid (pid1) returns pid1/WIFSIGNALED
-> on iteration 1: a new clone lwp that reuses pid1 is spawned
-> ret==pid1, continue iterating
-> kill (pid1, SIGKILL); // killing wrong process
<- waitpid (pid1) returns either SIGSTOP or WIFSIGNALED
...
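To make the concern concrete, here is a minimal sketch (not the GDBserver code) of a kill/wait loop that stops as soon as waitpid reports an exit status, so the PID is never signalled again after it has been reaped. The name kill_and_reap_once is hypothetical, and the sketch assumes a plain fork child waited for with flags 0; real LWPs under ptrace would need __WALL and more care:

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper for illustration only; this is not GDBserver's
   kill_wait_lwp.  Kill PID and reap it at most once.  Returns 0 once
   an exit status has been collected, or -1 with errno set (ECHILD
   means the child was already gone).  */
static int
kill_and_reap_once (pid_t pid)
{
  int status;
  pid_t res;

  if (pid <= 0)
    {
      /* Never pass 0 or a negative value on to kill.  */
      errno = EINVAL;
      return -1;
    }

  for (;;)
    {
      /* Errors are ignored on purpose: the child may already be dead.  */
      kill (pid, SIGKILL);

      do
        res = waitpid (pid, &status, 0);
      while (res == -1 && errno == EINTR);

      if (res == -1)
        return -1;              /* Typically ECHILD: nothing left to reap.  */

      assert (res == pid);

      if (WIFEXITED (status) || WIFSIGNALED (status))
        return 0;               /* Reaped.  Stop here: from now on the
                                   kernel may hand this PID to a new task.  */

      /* Anything else is a stop event (only seen for traced children);
         the task still exists, so killing it again is safe.  */
    }
}
```

The point is the early return: once waitpid has handed back an exit status, the loop neither kills nor waits on that PID again, and a later ECHILD is treated as "already gone" rather than as an assertion failure.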
Thanks,
Pedro Alves