This is the mail archive of the
archer@sourceware.org
mailing list for the Archer project.
Re: safe PTRACE_ATTACH
- From: Oleg Nesterov <oleg at redhat dot com>
- To: Jan Kratochvil <jan dot kratochvil at redhat dot com>
- Cc: Roland McGrath <roland at redhat dot com>, archer at sourceware dot org
- Date: Wed, 23 Feb 2011 18:16:10 +0100
- Subject: Re: safe PTRACE_ATTACH
- References: <20101115190537.GA15725@redhat.com> <20110215204148.GA17258@host1.dyn.jankratochvil.net> <20110215215438.CBD0E1806E0@magilla.sf.frob.com> <20110216214423.GA22228@redhat.com> <20110216220541.55E701802A2@magilla.sf.frob.com> <20110217211225.GA17768@redhat.com> <20110221193927.122901814AE@magilla.sf.frob.com> <20110222203834.GA6977@redhat.com> <20110223155135.GB30477@host1.dyn.jankratochvil.net>
On 02/23, Jan Kratochvil wrote:
>
> notice: Moved thread to the Archer list.
>
> I can confirm this problem exists.
>
> AFAIK on recent kernels this whole "trick" (if-stopped then tkill(SIGSTOP) and
> PTRACE_CONT(0)) is not needed as it now works even for `eaten-out SIGSTOP
> notifications'.
It is still needed, but for a quite different reason. See the test case
in http://marc.info/?l=linux-kernel&m=129676623323195
The original cause of this bug was fixed long ago. IOW, the trick is still
needed, but only in an unlikely case.
But this is easy to fix (although the simple fix is not clean), and then
the trick will not be needed.
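For reference, the trick discussed here (if-stopped then tkill(SIGSTOP) and PTRACE_CONT(0)) can be sketched roughly as follows. This is a minimal illustration, not gdb's actual code: proc_state() and attach_with_trick() are hypothetical names, reading the state letter from /proc/<pid>/stat is an assumption about how to detect "already stopped", and error handling is mostly omitted.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the state letter from /proc/<pid>/stat
 * ('R', 'S', 'T', ...), or 0 if it cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    size_t n;
    char *p;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* The format is "pid (comm) state ..."; comm may itself contain
     * parentheses, so look for the last ')'. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];
}

/* Hypothetical sketch of the attach trick. Returns the stop signal
 * reported by the tracee, or -1 on failure. */
static int attach_with_trick(pid_t pid)
{
    int status;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;

    if (proc_state(pid) == 'T') {
        /* The task was already stopped, so its stop notification may
         * have been consumed by someone else. Queue a fresh SIGSTOP
         * and resume with signal 0 so that a new stop is reported. */
        syscall(SYS_tkill, pid, SIGSTOP);
        ptrace(PTRACE_CONT, pid, NULL, NULL);
    }

    if (waitpid(pid, &status, __WALL) == -1)
        return -1;
    return WIFSTOPPED(status) ? WSTOPSIG(status) : -1;
}
```

A caller would normally expect attach_with_trick(pid) to return SIGSTOP, and would PTRACE_DETACH when done.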
> But to be compatible with the older kernels (despite having this race there)
> what do you suggest? Checking /proc/version seems too fragile to me.
> GDB could do another ptrace test (like linux_test_for_tracesysgood etc.).
Oh, I do not know what would be the best check. But anyway this is
"easy", I mean we can do this somehow.
The problem is, I do not see how we can modify the kernel without
breaking the unmodified gdb.
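The style of runtime check Jan mentions can be sketched like this. It probes PTRACE_O_TRACESYSGOOD, in the spirit of linux_test_for_tracesysgood; it does not detect the attach race itself, and probe_tracesysgood() is a hypothetical name used only for illustration. The point is to ask the running kernel directly instead of parsing /proc/version.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if the kernel accepts
 * PTRACE_O_TRACESYSGOOD, 0 if it rejects it, and -1 if the probe
 * itself could not be run (fork or ptrace unavailable). */
static int probe_tracesysgood(void)
{
    int status, supported;
    pid_t child = fork();

    if (child == -1)
        return -1;

    if (child == 0) {
        /* Child: become a tracee, then stop so the parent can
         * operate on us. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Parent: wait for the child's initial stop. If the child exited
     * instead, waitpid() has already reaped it. */
    if (waitpid(child, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* The actual feature test: does the kernel accept the option? */
    supported = ptrace(PTRACE_SETOPTIONS, child, NULL,
                       (void *)(long)PTRACE_O_TRACESYSGOOD) == 0;

    /* Clean up the throwaway child. */
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
    return supported;
}
```

A check for the attach behaviour would need a different probe body; only the fork, trace and test-then-kill structure is meant to carry over.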
Oh. You know, gdb looks completely broken when it comes to job control (jctl) signals ;)
Just like the kernel. At least in all-stop mode.
This is because... I don't know how to explain it, so please see the example.
An absolutely trivial test case:
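#include <pthread.h>
#include <unistd.h>

/* Each thread just sleeps in pause() forever. */
void *tf(void *arg)
{
    for (;;)
        pause();
}

int main(void)
{
    pthread_t pt;

    pthread_create(&pt, NULL, tf, NULL);
    tf(NULL);    /* the main thread does the same */
    return 0;
}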
Now,
GNU gdb (GDB) 7.1
...
(gdb) attach 29412
Attaching to program: /tmp/0/mt, process 29412
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x41b54950 (LWP 29413)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb) c
Continuing.
Let's send SIGSTOP to 29412:
$ kill -STOP 29412
Program received signal SIGSTOP, Stopped (signal).
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb)
Very nice, but what does gdb do?
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7ffffab89b4c, WNOHANG|__WCLONE, NULL) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG, NULL) = 29412
tkill(29412, SIG_0) = 0
tkill(29413, SIGSTOP) = 0
wait4(29413, 0x7ffffab898b4, 0, NULL) = -1 ECHILD (No child processes)
wait4(29413, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WCLONE, NULL) = 29413
Note this tkill(SIGSTOP) to the sub-thread!
Now,
(gdb) c
Continuing.
Program received signal SIGSTOP, Stopped (signal).
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb) c
Continuing.
Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread 0x41b54950 (LWP 29413)]
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb) c
Continuing.
Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread 0x7f00007be6f0 (LWP 29412)]
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb) c
Continuing.
Program received signal SIGSTOP, Stopped (signal).
[Switching to Thread 0x41b54950 (LWP 29413)]
0x00000033af60e57d in pause () from /lib64/libpthread.so.0
(gdb)
And so on forever. Every time, it does
ptrace(PTRACE_CONT, 29413, 0x1, SIG_0) = 0
ptrace(PTRACE_CONT, 29412, 0x1, SIGSTOP) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], WNOHANG|__WCLONE, NULL) = 29413
tkill(29413, SIG_0) = 0
tkill(29412, SIGSTOP) = 0
wait4(29412, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 29412
with the obvious result.
"signal SIGSTOP" (instead of "c") does not work either, for the same reason.
Oleg.