When you single-step into a SIGTRAP handler with ptrace(), the handler gets reset on some kernels. This happens at least on 2.6.19-1.2895.fc6, but not on 2.6.17-1.2174_FC5. It also doesn't happen when doing a normal ptrace() CONT through the signal handler.
Pushed to Fedora: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227693
*** Bug 4019 has been marked as a duplicate of this bug. ***
Marking as suspended, test case added.
Note that "upstream" (the Fedora kernel maintainers in this case) said: "Happens on vanilla 2.6.18.6 from kernel.org, too" and "Does not happen on 2.6.16.35". So it really seems to be an (old!) upstream-of-upstream (kernel.org) bug.
Just as a bit of a blog, and as notes to myself, here's what's happening so far:

Presumably (I haven't checked yet, so it's "presumably") as a result of the ptrace (PTRACE_SINGLESTEP, pid, 0, SIGTRAP); in the testcase, kernel/utrace.c:utrace_signal_handler_singlestep() is called. Something in there (again, I haven't followed that path yet) results in a call to arch/i386/kernel/traps.c:do_debug(), which calls arch/i386/kernel/ptrace.c:send_sigtrap(SIGTRAP,...), which calls kernel/signal.c:force_sig_info(), which then sets action->sa.sa_handler = SIG_DFL; if the signal is currently blocked. Up to that point the handler was correctly pointing at the testcase handler.

A comment in kernel/signal.c reads:

    /*
     * Force a signal that the process can't ignore: if necessary
     * we unblock the signal and change any SIG_IGN to SIG_DFL.
     *
     * Note: If we unblock the signal, we always reset it to SIG_DFL,
     * since we do not want to have a signal handler that was blocked
     * be invoked when user space had explicitly blocked it.
     *
     * We don't want to have recursive SIGSEGV's etc, for example.
     */

so I guess the behaviour is deliberate. It will take me more poking to figure out what, if anything, should be done about this. I'm going to guess, though, that since PTRACE_SINGLESTEP results in the child looking like it's been stopped by a SIGTRAP, and in the testcase the child sets a non-SIG_DFL handler on SIGTRAP, there's a bit of confusion.
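To make the reset rule quoted above easier to reason about, here is a small user-space model of it. This is only a sketch of the logic as described in this comment, not the kernel code: struct model_sigstate, model_enter_handler() and model_force_signal() are hypothetical names invented for illustration, and the model assumes that entering a handler blocks its signal unless SA_NODEFER was requested.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-signal state the kernel tracks. */
struct model_sigstate {
    void (*handler)(int); /* current disposition: a handler, SIG_IGN or SIG_DFL */
    bool blocked;         /* is the signal currently blocked? */
};

/* Placeholder for the testcase's SIGTRAP handler. */
static void example_handler(int sig)
{
    (void)sig;
}

/* Model of handler entry: the signal is blocked while its own handler
 * runs, unless the handler was installed with SA_NODEFER. */
static void model_enter_handler(struct model_sigstate *st, bool nodefer)
{
    assert(st != NULL);
    if (!nodefer)
        st->blocked = true;
}

/* Model of a forced signal: a blocked or ignored signal is unblocked
 * and its disposition is reset to SIG_DFL before delivery. */
static void model_force_signal(struct model_sigstate *st)
{
    assert(st != NULL);
    bool ignored = (st->handler == SIG_IGN);

    if (st->blocked || ignored) {
        st->handler = SIG_DFL;
        st->blocked = false;
    }
}
```

In this model, entering example_handler without SA_NODEFER and then forcing the signal leaves the disposition at SIG_DFL, which matches what the testcase observes; with nodefer set, the handler stays in place.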
According to a comment by roland on https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=227693 this isn't a bug, but the expected behaviour of ptrace single-stepping into a SIGTRAP handler (so, I assume it was a bug that this worked on older kernels). Stepping into a SIGTRAP handler produces a SIGTRAP signal itself (because that is how ptrace reports the single-step action), and the kernel cannot rely on there being a debugger/parent that swallows that second SIGTRAP. Note that single-stepping into any other signal handler doesn't have this problem. So we will have to come up with a trick to (simulate?) single-stepping into a SIGTRAP handler. Leaving this open for now.
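For reference, the way ptrace reports a single step can be seen with a small sketch like the following. It assumes Linux on an architecture that supports PTRACE_SINGLESTEP and an environment where ptrace is permitted; stop_signal_after_single_step() is a hypothetical helper name, and this is not the frysk testcase.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a traced child, single-step it once, and return the signal the
 * child stopped with after the step (or -1 on error). */
static int stop_signal_after_single_step(void)
{
    int status;
    int stop_sig;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask to be traced, then stop so the parent can take over. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Parent: wait for the child's initial SIGSTOP. If the child is not
     * stopped here, it has already terminated and been reaped. */
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;

    /* Execute one instruction; data == 0 means the pending SIGSTOP is
     * not delivered to the child. */
    if (ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) != 0)
        goto fail;
    if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
        return -1;
    stop_sig = WSTOPSIG(status);

    /* The debugger swallows the step notification: resume with signal 0
     * so the child never sees it, then reap the child. */
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) != 0)
        goto fail;
    waitpid(pid, &status, 0);
    return stop_sig;

fail:
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}
```

The value returned is expected to be SIGTRAP, which is exactly the notification that collides with a child that has its own SIGTRAP handler.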
This is a misfeature of ptrace single step. It uses SIGTRAP to signal that a step was made. This used to work in older kernels, but newer kernels reset the SIGTRAP handler if the child isn't using a reentrant SIGTRAP handler, that is, if SIGTRAP is blocked while the handler runs (even though the ptracing debugger would of course swallow the signal and never deliver it to the child itself). Resetting the child's signal handler obviously breaks our testcases. For now, to have minimal testing of SIGTRAP handler stepping, we instrument the test programs to use SA_NODEFER. Also, funit-breakpoints uses a simple SIGUSER handler to test signal stepping and breakpointing. The real solution for this problem, so that we can also single-step unaltered user programs that use SIGTRAP, is to use a not-yet-existing interface on top of utrace that doesn't use SIGTRAP for reporting events to frysk.
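As an illustration of the SA_NODEFER instrumentation, a test program could install its handler roughly like this. This is a minimal sketch, not the actual frysk test source; trap_handler, install_nodefer_trap_handler() and trap_handler_still_installed() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Counts how many times the handler has run. */
static volatile sig_atomic_t trap_count = 0;

static void trap_handler(int sig)
{
    (void)sig;
    trap_count = trap_count + 1;
}

/* Install trap_handler for SIGTRAP with SA_NODEFER, so SIGTRAP is not
 * blocked while the handler itself is running. Returns 0 on success. */
static int install_nodefer_trap_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = trap_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NODEFER;
    return sigaction(SIGTRAP, &sa, NULL);
}

/* Returns 1 if SIGTRAP is still routed to trap_handler, 0 otherwise. */
static int trap_handler_still_installed(void)
{
    struct sigaction cur;

    if (sigaction(SIGTRAP, NULL, &cur) != 0)
        return 0;
    return cur.sa_handler == trap_handler;
}
```

A test can call install_nodefer_trap_handler(), trigger the handler with raise(SIGTRAP), and use trap_handler_still_installed() afterwards to check whether the disposition was reset behind its back.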