Sourceware Bugzilla – Bug 3639
Last modified: 2011-03-16 21:19:56 UTC
terminating value expected:<-9> but was:<128>
*** This bug has been marked as a duplicate of 3489 ***
The fixes that make the exit47 test case pass do not fix this bug. Assuming a
separate problem and re-splitting.
*** Bug 3640 has been marked as a duplicate of this bug. ***
What's happening is that in TestTaskTerminateObserver.java,
Terminate.updateTerminating(...) is never executed when funit-exit is passed a
-Sig.KILL_. Before I go to all the effort of possibly re-chasing an already
chased bug, I thought I'd ask about the possibility that this is related to the
x-state bug; i.e., is a "terminating" condition related to the transient X
state? If so, if you SIGKILL a process, is it possible to miss a "terminating"
and go straight to "terminated"?
(A little background for Roland, to whom I've cc-ed this: this is a bug where a
process being sent a kill(SIGKILL) should trigger a "terminating" observer, but
doesn't seem to be doing that.)
The waitpid call should return a reasonable value. For instance:
-> some sort of error indication, for instance ESRCH, EINTR, ...
-> PID killed with -9
But instead it's getting back that the process exited with 128.
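To make the two outcomes concrete, here is a minimal sketch (in Python, not frysk code) that decodes raw wait statuses with the standard wait macros and then actually kills a child with SIGKILL. The helper names describe() and kill_child_and_wait() are made up for illustration. Python's subprocess module happens to report death by signal N as -N, which resembles the -9 the test expects; that similarity is an assumption about the test's convention, not something taken from the frysk sources.

```python
import os
import signal
import subprocess
import sys


def describe(status):
    """Decode a raw wait status using the same macros waitpid() callers use."""
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSTOPPED(status):
        return ("stopped", os.WSTOPSIG(status))
    return ("unknown", status)


def kill_child_and_wait():
    """Start a sleeping child, send it SIGKILL, and return Popen's return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.send_signal(signal.SIGKILL)
    return child.wait()


if __name__ == "__main__":
    print(describe(int(signal.SIGKILL)))  # raw status of a child killed by SIGKILL
    print(describe(128 << 8))             # raw status of a child that exited with 128
    print(kill_child_and_wait())          # negative value means "killed by that signal"
```

A status of 9 decodes as "killed by signal 9", while an exit code of 128 lives in the high byte (0x8000), so the two are easy to tell apart at the waitpid level.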
Our tests show that exit47, the previous bug, is fixed.
Okay, I'll chase that, but it's not the only possibility.
TestTaskTerminateObserver.Terminate initialises int terminating = INVALID (where
INVALID = 128), but public Action updateTerminating (...) never runs, leaving
terminating = INVALID, which is the proximate cause of the failure. I'll wire
up the waitpid() and check if it's causing the problem.
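The sentinel problem can be modelled in a few lines. This is a hypothetical Python stand-in for the Java observer, not the real class: the method signatures, the "terminated" field, and the negative-signal encoding are assumptions made for illustration. Only the INVALID = 128 sentinel and the fact that updateTerminating never runs come from the comment above.

```python
INVALID = 128  # sentinel quoted in the comment above


class Terminate:
    """Hypothetical model of the test's observer; field names are assumed."""

    def __init__(self):
        # Both fields start at the sentinel until a callback overwrites them.
        self.terminating = INVALID
        self.terminated = INVALID

    def update_terminating(self, signal, value):
        # Assumed encoding: a death by signal is recorded as a negative number.
        self.terminating = -value if signal else value

    def update_terminated(self, signal, value):
        self.terminated = -value if signal else value


def observe_sigkill():
    """Replay what the thread reports for SIGKILL: only 'terminated' fires."""
    observer = Terminate()
    observer.update_terminated(True, 9)
    return observer


if __name__ == "__main__":
    seen = observe_sigkill()
    print(seen.terminating, seen.terminated)
```

Because the sentinel is itself a legal exit status, a callback that never fires is indistinguishable from a process that exited with 128, which would explain why the failure message reads that way.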
The exit47 test does indeed pass.
Ah, so it is seeing no notification at all that the task was terminated?
Yeah, for SIGKILL, updateTerminating never gets hit; updateTerminated does. The
pattern of waitpid returns is similar in all test cases (terminate(),
terminated(), and terminating()), nothing unexpected, but the first two fail for
a lack of a terminating event.
Can you check with Roland that this is an intended behavior change?
I don't know what's going on in the frysk world, but a couple things from the
kernel side might be relevant.
First is that for death by SIGKILL you may well not see any EXIT event.
You will see the death event (WIFSIGNALED) for sure except possibly in the case
of multi-threaded exec by a non-leader thread (when you won't see a report from
the old leader, but the exec'ing thread will change its PID to the leader's).
Second is there is a rare race bug in kernels before the recent test kernels,
that can produce a bogus wait status value. The bogus value will be a
WIFSTOPPED with WSTOPSIG 0 or some high bits set. This is a very unlikely race.
Also, it doesn't produce a WIFEXITED value in the bug case, so it doesn't seem
likely to be relevant to what you are seeing.
Nothing comes to mind with a bogus status of 0x8000. An _exit(128) produces
Created attachment 1653
side by side trace comparison
This attachment shows the diagnostic output of two failing tests and one
passing test, all of which do a kill(SIGKILL). None of the tests get a
"terminating" event--the one that passes does so only because it's not
/expecting/ a terminating event.
None of the waitpid()s look wrong to me except possibly lines 13 and 14: should
there really be two waitpid()s in a row returning WIFSIGNALED(9) on the same
Your trace doesn't indicate whether different threads are doing different wait
calls or did ptrace calls or forks. If thread A forks a child, and thread B
does PTRACE_ATTACH to that child, then on death there is one report "to B" (but
available to all threads in the same process calling wait*) and then there is a
second one "to A". The second one happens because you are the real parent of
the child that is no longer ptrace'd after the ptracer's wait returns
WIFSIGNALED/WIFEXITED. The first one happens because you are the ptracer but
not the real parent, but there are two of you so all things can be true and false.
Here's what's happening:
1. Wait.cxx:processStatus() decodes the waitpid status and if (WIFSTOPPED
(status) && (PTRACE_EVENT_EXIT == WSTOPEVENT (status))) it calls exitEvent()
2. LinuxPtraceHost.PollWaitOnSigChld.exitEvent() calls processTerminatingEvent()
3. Task.processTerminatingEvent() calls .handleTerminatingEvent()
4. LinuxPtraceTaskState.handleTerminatingEvent() calls notifyTerminating()
5. Task.notifyTerminating() calls updateTerminating()
6. TestTaskTerminateObserver.updateTerminating() sets the int terminating value.
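The six steps above boil down to a dispatch on the wait status. The sketch below is a simplified Python model of that dispatch, not the Wait.cxx code: stop_event() is an assumed equivalent of WSTOPEVENT (the ptrace event code in bits 16 to 23 of the status), the event names are placeholders, and the whole Host/Task/State call chain is collapsed into a single list append. PTRACE_EVENT_EXIT = 6 and SIGTRAP = 5 are the Linux values.

```python
import os

PTRACE_EVENT_EXIT = 6  # Linux value from <sys/ptrace.h>
SIGTRAP = 5            # Linux signal number


def stop_event(status):
    """Assumed equivalent of WSTOPEVENT: ptrace event code above the stop signal."""
    return (status >> 16) & 0xFF


def process_status(status, events):
    """Record which observer notification a given wait status would trigger."""
    if os.WIFSTOPPED(status) and stop_event(status) == PTRACE_EVENT_EXIT:
        events.append("terminating")   # steps 1 to 6 of the chain
    elif os.WIFSIGNALED(status) or os.WIFEXITED(status):
        events.append("terminated")    # death report, no exit stop involved


def events_for(statuses):
    events = []
    for status in statuses:
        process_status(status, events)
    return events


# A ptrace exit stop is reported as "stopped by SIGTRAP" with the event code on top.
EXIT_STOP = (((PTRACE_EVENT_EXIT << 8) | SIGTRAP) << 8) | 0x7F

if __name__ == "__main__":
    print(events_for([EXIT_STOP, 47 << 8]))  # normal exit(47): exit stop, then death
    print(events_for([9]))                   # SIGKILL: death report only
```

In this model a normal exit produces both notifications, while a lone status of 9 never reaches the "terminating" branch.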
If, in Wait.cxx:processStatus(), status == 9, (KILL), WIFSIGNALED (status) is
true rather than WIFSTOPPED (status), so none of the foregoing happens, causing
the test to fail. What I don't know is if the process described above is in
fact what the programmer who wrote it intended and the test exercises conditions
that weren't meant to be exercised, or if the process described is flawed or
(In reply to comment #14)
> If, in Wait.cxx:processStatus(), status == 9, (KILL), WIFSIGNALED (status) is
> true rather than WIFSTOPPED (status), so none of the foregoing happens, causing
> the test to fail. What I don't know is if the process described above is in
> fact what the programmer who wrote it intended and the test exercises conditions
> that weren't meant to be exercised, or if the process described is flawed or
The programmer, me, didn't know that the "terminating" event was not guaranteed
when the process was killed using -9. Just the testcase needs to be adjusted to
be more flexible.
2007-07-04 Andrew Cagney <firstname.lastname@example.org>
* TestTaskTerminateObserver.java (check): Remove brokenIfUtraceXXX
check for bug 3489.
(testTerminateKillKILL, testTerminatingKillKILL): Delete.