$ uname -a
Linux nettle 2.6.18-1.2239.fc5 #1 Fri Nov 10 13:04:06 EST 2006 i686 i686 i386 GNU/Linux

When run in isolation, this test appears to hang during tear-down. fstack (which has the side effect of unwedging it) shows the main thread blocked in waitAll (i.e., waitpid(-1)):

#0 0xc92402 in __kernel_vsyscall ()
#1 0xdb9ce1 in __waitpid_nocancel ()
#2 0x80e8719 in _ZGAN5frysk3sys4Wait7waitAllEiPNS0_13Wait$ObserverE () from ../../frysk/frysk-sys/frysk/sys/cni/Wait.cxx
#3 0x809fa2e in _ZN5frysk4proc7TestLib8tearDownEv () from ../../frysk/frysk-core/frysk/proc/TestLib.java
#4 0x810c1b8 in _ZN5junit9framework8TestCase7runBareEv () from junit/framework/TestCase.java
#5 0x8108dce in _ZN5junit9framework12TestResult$17protectEv () from junit/framework/TestResult.java
#6 0x810bb9f in _ZN5junit9framework10TestResult12runProtectedEPNS0_4TestEPNS0_11ProtectableE () from junit/framework/TestResult.java
#7 0x810baec in _ZN5junit9framework10TestResult3runEPNS0_8TestCaseE () from junit/framework/TestResult.java
#8 0x810c166 in _ZN5junit9framework8TestCase3runEPNS0_10TestResultE () from junit/framework/TestCase.java
#9 0x810a0f2 in _ZN5junit9framework9TestSuite7runTestEPNS0_4TestEPNS0_10TestResultE () from junit/framework/TestSuite.java
#10 0x810a0a4 in _ZN5junit9framework9TestSuite3runEPNS0_10TestResultE () from junit/framework/TestSuite.java
#11 0x810a0f2 in _ZN5junit9framework9TestSuite7runTestEPNS0_4TestEPNS0_10TestResultE () from junit/framework/TestSuite.java
#12 0x810a0a4 in _ZN5junit9framework9TestSuite3runEPNS0_10TestResultE () from junit/framework/TestSuite.java
#13 0x810d20f in _ZN5junit6textui10TestRunner5doRunEPNS_9framework4TestEb () from junit/textui/TestRunner.java
#14 0x810d19e in _ZN5junit6textui10TestRunner5doRunEPNS_9framework4TestE () from junit/textui/TestRunner.java
#15 0x80ec9de in _ZN5frysk5junit6Runner8runCasesEPN4java4util10CollectionE () from ../../frysk/frysk-imports/frysk/junit/Runner.java
#16 0x80ece05 in _ZN5frysk5junit6Runner12runArchCasesEPN4java4util10CollectionE () from ../../frysk/frysk-imports/frysk/junit/Runner.java
#17 0x80ec4c8 in _ZN5frysk5junit6Runner12runTestCasesEPN4java4lang6StringEPNS2_4util10CollectionES5_S8_S5_ () from ../../frysk/frysk-imports/frysk/junit/Runner.java
#18 0x8085aea in _ZN10TestRunner4mainEP6JArrayIPN4java4lang6StringEE () from /home/scratch/frysk/native/frysk-core/TestRunner.java
#19 0x2a10934 in _ZN3gnu4java4lang10MainThread9call_mainEv ()
#20 0x2a52b16 in _ZN3gnu4java4lang10MainThread3runEv ()
#21 0x2a1fe0b in _Z13_Jv_ThreadRunPN4java4lang6ThreadE ()
#22 0x29e2868 in _Z11_Jv_RunMainP14_Jv_VMInitArgsPN4java4lang5ClassEPKciPS6_b ()
#23 0x29e29a4 in _Z11_Jv_RunMainPN4java4lang5ClassEPKciPS4_b ()
#24 0x29e29eb in JvRunMain ()
#25 0x8085a21 in main () from /tmp/ccvtZqXd.i
#26 0x1254e4 in __libc_start_main ()
#27 0x8085951 in _start ()

Looking at TestRunner in /proc:

cagney@nettle$ pgrep TestRunner
13410
cagney@nettle$ grep `pgrep TestRunner` /proc/*/status
/proc/13410/status:Tgid: 13410
/proc/13410/status:Pid: 13410
cagney@nettle$ ls /proc/`pgrep TestRunner`/task
13410 13411 13416

and then checking for any process that one of TestRunner's tasks is attached to:

cagney@nettle$ egrep '13410|13411|13416' /proc/*/task/status
egrep: /proc/*/task/status: No such file or directory
cagney@nettle$ egrep '13410|13411|13416' /proc/*/task/*/status
/proc/13410/task/13410/status:Tgid: 13410
/proc/13410/task/13410/status:Pid: 13410
/proc/13410/task/13411/status:Tgid: 13410
/proc/13410/task/13411/status:Pid: 13411
/proc/13410/task/13416/status:Tgid: 13410
/proc/13410/task/13416/status:Pid: 13416
/proc/13413/task/13413/status:TracerPid: 13416
egrep: /proc/self/task/3967/status: No such file or directory
cagney@nettle$ ls /proc/13413/task
13413

This turns up one process, 13413 (a single task), whose tracer is TestRunner task 13416.
Looking at that process:

cagney@nettle$ head /proc/13413/task/13413/status
Name: funit-child
State: Z (zombie)
SleepAVG: 98%
Tgid: 13413
Pid: 13413
PPid: 1
TracerPid: 13416
Uid: 500 500 500 500
Gid: 500 500 500 500
FDSize: 0

This shows that it is both a zombie and still attached to its tracer (TracerPid 13416). (This /proc probe was taken _minutes_ after the hang.)
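The two manual checks above (egrep for a TracerPid belonging to one of TestRunner's tasks, then head on the match to see State Z) can be folded into one small C sketch. It assumes the Linux /proc status layout of one "Key:<whitespace>value" pair per line; status_field, is_traced_zombie and find_tracees are hypothetical helper names, not part of frysk.

```c
#include <glob.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy the first whitespace-delimited token after "Key:" in the text of
   a Linux /proc status file into out.  Returns 0 on success, -1 if the
   key is not present. */
static int status_field(const char *text, const char *key,
                        char *out, size_t outlen)
{
    size_t klen = strlen(key);
    for (const char *line = text; line && *line; ) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *p = line + klen + 1;
            p += strspn(p, " \t");              /* skip the separator */
            size_t n = strcspn(p, " \t\n");     /* first token only */
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, p, n);
            out[n] = '\0';
            return 0;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* 1 if the status text describes a zombie (State Z) that still has a
   non-zero TracerPid, like funit-child 13413 above; 0 otherwise. */
static int is_traced_zombie(const char *text)
{
    char state[16], tracer[16];
    if (status_field(text, "State", state, sizeof state) != 0 ||
        status_field(text, "TracerPid", tracer, sizeof tracer) != 0)
        return 0;
    return strcmp(state, "Z") == 0 && strcmp(tracer, "0") != 0;
}

/* Scan every /proc/<pid>/task/<tid>/status file and print each task
   whose TracerPid is one of tids[0..ntids-1].  Returns the number of
   matches.  Tasks that exit between glob() and fopen() are skipped
   (egrep above reported the same kind of vanished file). */
static int find_tracees(const long *tids, int ntids)
{
    glob_t g;
    int found = 0;
    if (glob("/proc/*/task/*/status", GLOB_NOSORT, NULL, &g) != 0) {
        globfree(&g);
        return 0;
    }
    for (size_t i = 0; i < g.gl_pathc; i++) {
        FILE *f = fopen(g.gl_pathv[i], "r");
        if (f == NULL)
            continue;
        char buf[4096];
        size_t n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = '\0';
        char tracer[16], state[16] = "?";
        if (status_field(buf, "TracerPid", tracer, sizeof tracer) != 0)
            continue;
        status_field(buf, "State", state, sizeof state);
        long t = strtol(tracer, NULL, 10);
        for (int j = 0; j < ntids; j++) {
            if (t != 0 && t == tids[j]) {
                printf("%s: TracerPid %ld State %s\n",
                       g.gl_pathv[i], t, state);
                found++;
            }
        }
    }
    globfree(&g);
    return found;
}
```

In the situation described above, a call such as find_tracees((const long[]){13410, 13411, 13416}, 3) would be expected to report /proc/13413/task/13413/status with State Z, and is_traced_zombie on that file's contents would return 1.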
Here's a sample tearDown log (for a re-run). Notice that it is trying to detach/kill a process with three threads (13488, 13489, 13490); while the clones die, the main thread refuses to go:

26-Nov-06 5:11:22 PM frysk.proc.TestLib tearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) >>>>>>>>>>>>>>>> start tearDown
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13488
26-Nov-06 5:11:22 PM frysk.proc.TestLib killDuringTearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) killDuringTearDown 13488
26-Nov-06 5:11:22 PM frysk.proc.TestLib killDuringTearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) killDuringTearDown 13489
26-Nov-06 5:11:22 PM frysk.proc.TestLib killDuringTearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) killDuringTearDown 13490
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13490 (failed - ESRCH)
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13490
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13490
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13489
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13489 (failed - ESRCH)
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13489 (failed - ESRCH)
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13488 (failed - ESRCH)
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13488
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13488
26-Nov-06 5:11:22 PM frysk.proc.TestLib tearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) waitAll -1 ...
26-Nov-06 5:11:22 PM frysk.sys.Wait waitAll
FINE: frysk.sys.Wait pid 13490 status 0x6057f WIFSTOPPED/EXIT 5 (Trace/breakpoint trap)
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13490
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13490
26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13490
26-Nov-06 5:11:22 PM frysk.proc.TestLib tearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) waitAll -1 ...
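The only event the second waitAll sweep collects is status 0x6057f for 13490. Unpacked, that is (6 << 16) | (5 << 8) | 0x7f: a stop (low byte 0x7f) with SIGTRAP (5) carrying ptrace event 6, PTRACE_EVENT_EXIT, which matches the logged "WIFSTOPPED/EXIT 5 (Trace/breakpoint trap)". The C sketch below decodes a status that way using the standard <sys/wait.h> macros, and, as a swapped-in alternative to a blocking waitpid(-1) sweep (not frysk's actual waitAll), drains pending events with WNOHANG so a task that never reports cannot wedge the caller. It is Linux-specific (__WALL, PTRACE_EVENT_EXIT); describe_status and drain_events are hypothetical names, and the output labels only mimic the log.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Render a wait status with labels similar to the frysk.sys.Wait log.
   A ptrace event stop is laid out as (event << 16) | (sig << 8) | 0x7f,
   so 0x6057f is a SIGTRAP (5) stop for event 6, PTRACE_EVENT_EXIT. */
static void describe_status(int status, char *out, size_t outlen)
{
    if (WIFEXITED(status))
        snprintf(out, outlen, "WIFEXITED %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(out, outlen, "WIFSIGNALED %d", WTERMSIG(status));
    else if (WIFSTOPPED(status) && (status >> 16) == PTRACE_EVENT_EXIT)
        snprintf(out, outlen, "WIFSTOPPED/EXIT %d", WSTOPSIG(status));
    else if (WIFSTOPPED(status))
        snprintf(out, outlen, "WIFSTOPPED %d", WSTOPSIG(status));
    else
        snprintf(out, outlen, "unknown 0x%x", (unsigned) status);
}

/* Collect every wait event that is already pending, from any child or
   clone (__WALL), without ever blocking (WNOHANG).  Returns the number
   of events collected.  Unlike a blocking waitpid(-1) sweep, it returns
   as soon as no child has an event ready; retrying is up to the caller. */
static int drain_events(void)
{
    int n = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, __WALL | WNOHANG);
        if (pid == 0)              /* children exist, none has an event */
            break;
        if (pid < 0) {
            if (errno == EINTR)
                continue;
            break;                 /* ECHILD: no children left */
        }
        char desc[64];
        describe_status(status, desc, sizeof desc);
        printf("pid %d status 0x%x %s\n", (int) pid, (unsigned) status, desc);
        n++;
    }
    return n;
}
```

A tear-down built on this might call drain_events() in a bounded retry loop, with a short sleep between rounds, and report any task that never produced an event instead of blocking forever; that policy is a design choice, not something the log above prescribes.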
2006-11-29  Andrew Cagney  <cagney@redhat.com>

	* frysk3595/detach-multi-thread.c (main): Simplify, remove any
	failing system calls - only detach from non-main thread.
	* frysk3595/detach-multi-thread.c (main): Fix for-loop typo,
	attach to all NR_TASKS.
	* frysk3595/detach-multi-thread.c (main): Reduce number of tasks
	to 1.
	* frysk3595/detach-multi-thread.c: New file.
	* Makefile.am (TESTS, noinst_PROGRAMS): Add
	frysk3595/detach-multi-thread.
	(frysk3595_detach_multi_thread_SOURCES)
	(frysk3595_detach_multi_thread_LDFLAGS): Define.
Pushed downstream: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217809
*** Bug 3381 has been marked as a duplicate of this bug. ***