testTerminateKillKILL(frysk.proc.TestTaskTerminateObserver)
junit.framework.AssertionFailedError: terminating value expected:<-9> but was:<128>
    at frysk.proc.TestTaskTerminateObserver.check(TestRunner)
    at frysk.proc.TestTaskTerminateObserver.terminate(TestRunner)
    at frysk.proc.TestTaskTerminateObserver.testTerminateKillKILL(TestRunner)
    at frysk.junit.Runner.runCases(TestRunner)
    at frysk.junit.Runner.runArchCases(TestRunner)
    at frysk.junit.Runner.runTestCases(TestRunner)
    at TestRunner.main(TestRunner)
*** This bug has been marked as a duplicate of 3489 ***
The fixes that make the exit47 test case pass do not fix this bug. Assuming a separate problem and re-splitting.
*** Bug 3640 has been marked as a duplicate of this bug. ***
RHEL https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=226684
What's happening is that in TestTaskTerminateObserver.java, Terminate.updateTerminating(...) is never executed when funit-exit is passed a -Sig.KILL_. Before I go to all the effort of possibly re-chasing an already chased bug, I thought I'd ask about the possibility that this is related to the x-state bug; i.e., is a "terminating" condition related to the transient X state? If so, if you SIGKILL a process, is it possible to miss a "terminating" and go straight to "terminated"? (A little background for Roland, whom I've cc'ed on this: this is a bug where a process being sent a kill(SIGKILL) should trigger a "terminating" observer, but doesn't seem to be doing that.)
The waitpid call should return a reasonable value. For instance:
-> some sort of error indication, for instance ESRCH, EINTR, ...
-> PID killed with -9
But instead it's getting back that the process exited with 128. Our tests show that exit47, the previous bug, is fixed.
Okay, I'll chase that, but it's not the only possibility. TestTaskTerminateObserver.Terminate initialises int terminating = INVALID (where INVALID = 128), but public Action updateTerminating (...) never runs, leaving terminating = INVALID, which is the proximate cause of the failure. I'll wire up the waitpid() and check if it's causing the problem. The exit47 test does indeed pass.
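To make the proximate cause concrete, here is a minimal sketch of an observer that starts from the INVALID sentinel. It is not the frysk source: the class name, the two-argument callback signatures, and the convention of recording a signal as a negative value (suggested by the expected:<-9> in the failure message) are assumptions made for illustration.

```java
// Hypothetical stand-in for TestTaskTerminateObserver.Terminate.
// Only INVALID, terminating, updateTerminating and updateTerminated
// are names taken from the discussion; everything else is assumed.
class TerminateSketch {
    static final int INVALID = 128;   // sentinel value quoted in the comment

    int terminating = INVALID;        // changed only by updateTerminating
    int terminated = INVALID;         // changed only by updateTerminated

    // Assumed convention: a signal is recorded as a negative number.
    void updateTerminating(boolean signal, int value) {
        terminating = signal ? -value : value;
    }

    void updateTerminated(boolean signal, int value) {
        terminated = signal ? -value : value;
    }

    public static void main(String[] args) {
        TerminateSketch observer = new TerminateSketch();
        // SIGKILL case as reported: only the "terminated" callback fires.
        observer.updateTerminated(true, 9);
        System.out.println("terminating = " + observer.terminating);
        System.out.println("terminated  = " + observer.terminated);
    }
}
```

Because the "terminating" callback never runs, that field keeps the sentinel 128, which is the "but was:<128>" in the assertion message, while the "terminated" field holds -9.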
Ah, so it is seeing no notification at all that the task was terminated?
Yeah, for SIGKILL, updateTerminating never gets hit; updateTerminated does. The pattern of waitpid returns is similar in all test cases (terminate(), terminated(), and terminating()), nothing unexpected, but the first two fail for a lack of a terminating event.
Nice analysis. Can you check with Roland whether this is an intended behavior change?
I don't know what's going on in the frysk world, but a couple things from the kernel side might be relevant. First is that for death by SIGKILL you may well not see any EXIT event (WIFSTOPPED, SIGTRAP|PTRACE_EVENT_EXIT<<16). You will see the death event (WIFSIGNALED) for sure except possibly in the case of multi-threaded exec by a non-leader thread (when you won't see a report from the old leader, but the exec'ing thread will change its PID to the leader's). Second is there is a rare race bug in kernels before the recent test kernels, that can produce a bogus wait status value. The bogus value will be a WIFSTOPPED with WSTOPSIG 0 or some high bits set. This is a very unlikely race. Also, it doesn't produce a WIFEXITED value in the bug case, so it doesn't seem likely to be relevant to what you are seeing. Nothing comes to mind with a bogus status of 0x8000. An _exit(128) produces that status.
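The three status shapes mentioned above (exit event, death by signal, plain exit) can be told apart with a little bit arithmetic. The following is a rough sketch that mirrors the usual Linux wait status encoding in plain Java; the class and method names are made up for illustration, the numeric constants (SIGTRAP = 5, PTRACE_EVENT_EXIT = 6) are the Linux values as I understand them, and real code should use the macros from <sys/wait.h> rather than hand-decoding.

```java
// Illustrative decoder for Linux wait status values; not frysk code.
class WaitStatusSketch {
    static final int SIGTRAP = 5;            // assumed Linux value
    static final int PTRACE_EVENT_EXIT = 6;  // assumed Linux value

    // Equivalent of WIFSTOPPED / WSTOPSIG.
    static boolean isStopped(int s) { return (s & 0xff) == 0x7f; }
    static int stopSignal(int s)    { return (s >> 8) & 0xff; }

    // Equivalent of WIFSIGNALED / WTERMSIG.
    static boolean isSignaled(int s) {
        int low = s & 0x7f;
        return low != 0 && low != 0x7f;
    }
    static int termSignal(int s)    { return s & 0x7f; }

    // Equivalent of WIFEXITED / WEXITSTATUS.
    static boolean isExited(int s)  { return (s & 0x7f) == 0; }
    static int exitStatus(int s)    { return (s >> 8) & 0xff; }

    // ptrace event number carried in the bits above the stop signal.
    static int stopEvent(int s)     { return s >>> 16; }

    static String describe(int s) {
        if (isStopped(s)) {
            if (stopSignal(s) == SIGTRAP && stopEvent(s) == PTRACE_EVENT_EXIT) {
                return "exit event";
            }
            return "stopped by signal " + stopSignal(s);
        }
        if (isSignaled(s)) return "killed by signal " + termSignal(s);
        if (isExited(s))   return "exited with status " + exitStatus(s);
        return "unrecognised status";
    }

    public static void main(String[] args) {
        System.out.println(describe(0x8000));   // what _exit(128) produces
        System.out.println(describe(9));        // death by SIGKILL
        System.out.println(describe((PTRACE_EVENT_EXIT << 16) | (SIGTRAP << 8) | 0x7f));
    }
}
```

Under this encoding 0x8000 decodes as an ordinary exit with status 128, which is why that value looks like _exit(128) and not like either of the kernel-side cases described above.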
Created attachment 1653
side by side trace comparison

This attachment shows the diagnostic output of two failing tests and one passing test, all of which do a kill(SIGKILL). None of the tests get a "terminating" event--the one that passes does so only because it's not /expecting/ a terminating event. None of the waitpid()s look wrong to me except possibly lines 13 and 14: should there really be two waitpid()s in a row returning WIFSIGNALED(9) on the same task?
Your trace doesn't indicate whether different threads are doing different wait calls or did ptrace calls or forks. If thread A forks a child, and thread B does PTRACE_ATTACH to that child, then on death there is one report "to B" (but available to all threads in the same process calling wait*) and then there is a second one "to A". The second one happens because you are the real parent of the child that is no longer ptrace'd after the ptracer's wait returns WIFSIGNALED/WIFEXITED. The first one happens because you are the ptracer but not the real parent, but there are two of you so all things can be true and false.
Here's what's happening:
1. Wait.cxx:processStatus() decodes the waitpid status and if (WIFSTOPPED (status) && (PTRACE_EVENT_EXIT == WSTOPEVENT (status))) it calls exitEvent()
2. LinuxPtraceHost.PollWaitOnSigChld.exitEvent() calls processTerminatingEvent()
3. Task.processTerminatingEvent() calls .handleTerminatingEvent()
4. LinuxPtraceTaskState.handleTerminatingEvent() calls notifyTerminating()
5. Task.notifyTerminating() calls updateTerminating()
6. TestTaskTerminateObserver.updateTerminating sets the int terminating value.

If, in Wait.cxx:processStatus(), status == 9 (KILL), WIFSIGNALED (status) is true rather than WIFSTOPPED (status), so none of the foregoing happens, causing the test to fail. What I don't know is if the process described above is in fact what the programmer who wrote it intended and the test exercises conditions that weren't meant to be exercised, or if the process described is flawed or incomplete.
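The chain above can be collapsed into a small simulation showing which observer callbacks fire for a given sequence of waitpid statuses. This is only a sketch under stated assumptions: the class and method names are hypothetical, the intermediate frysk layers (exitEvent, processTerminatingEvent, and so on) are flattened into one branch, and the status encoding and constants are the usual Linux ones rather than anything read from Wait.cxx.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the dispatch described above; not frysk code.
class DispatchSketch {
    static final int SIGTRAP = 5;            // assumed Linux value
    static final int PTRACE_EVENT_EXIT = 6;  // assumed Linux value

    // A stop carrying the exit event, as the tracer would see it.
    static final int EXIT_EVENT_STATUS =
        (PTRACE_EVENT_EXIT << 16) | (SIGTRAP << 8) | 0x7f;

    // Returns the observer callbacks that would fire, in order.
    static List<String> dispatch(int... statuses) {
        List<String> fired = new ArrayList<>();
        for (int status : statuses) {
            int low = status & 0x7f;
            boolean stopped = (status & 0xff) == 0x7f;
            boolean signaled = low != 0 && low != 0x7f;
            boolean exited = low == 0;
            if (stopped && (status >>> 16) == PTRACE_EVENT_EXIT) {
                // Steps 1 to 6: the only path to updateTerminating.
                fired.add("updateTerminating");
            } else if (signaled || exited) {
                // Death report: the task is already gone.
                fired.add("updateTerminated");
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Exit event seen first, then the death report (signal 15).
        System.out.println(dispatch(EXIT_EVENT_STATUS, 15));
        // SIGKILL with no exit event: only the death report arrives.
        System.out.println(dispatch(9));
    }
}
```

In this model a lone status of 9 can only ever reach updateTerminated, which matches the observed failure: the test waits for a callback that nothing in the WIFSIGNALED path invokes.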
(In reply to comment #14)
> If, in Wait.cxx:processStatus(), status == 9, (KILL), WIFSIGNALED (status) is
> true rather than WIFSTOPPED (status), so none of the foregoing happens, causing
> the test to fail. What I don't know is if the process described above is in
> fact what the programmer who wrote it intended and the test exercises conditions
> that weren't meant to be exercised, or if the process described is flawed or
> incomplete.

The programmer, me, didn't know that the "terminating" event was not guaranteed when the process was killed using -9. Just the testcase needs to be adjusted to be more flexible.
Index: frysk-core/frysk/proc/ChangeLog
2007-07-04  Andrew Cagney  <cagney@redhat.com>

    * TestTaskTerminateObserver.java (check): Remove
    brokenIfUtraceXXX check for bug 3489.
    (testTerminateKillKILL, testTerminatingKillKILL): Delete.