Bug 3595 - testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) intermittently hangs
Summary: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) intermittently ...
Status: RESOLVED FIXED
Alias: None
Product: frysk
Classification: Unclassified
Component: general
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Andrew Cagney
URL:
Keywords:
Duplicates: 3381
Depends on: 3381
Blocks: 1496 1582 2654
Reported: 2006-11-26 22:23 UTC by Andrew Cagney
Modified: 2006-12-15 21:27 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Andrew Cagney 2006-11-26 22:23:29 UTC
$ uname -a
Linux nettle 2.6.18-1.2239.fc5 #1 Fri Nov 10 13:04:06 EST 2006 i686 i686 i386 GNU/Linux

This test, when run in isolation, appears to hang during tear-down. fstack (which
has the side effect of unwedging it) shows the main thread blocked in waitAll
(i.e., waitpid(-1, ...)):

#0 0xc92402 in __kernel_vsyscall ()
#1 0xdb9ce1 in __waitpid_nocancel ()
#2 0x80e8719 in _ZGAN5frysk3sys4Wait7waitAllEiPNS0_13Wait$ObserverE () from ../../frysk/frysk-sys/frysk/sys/cni/Wait.cxx
#3 0x809fa2e in _ZN5frysk4proc7TestLib8tearDownEv () from ../../frysk/frysk-core/frysk/proc/TestLib.java
#4 0x810c1b8 in _ZN5junit9framework8TestCase7runBareEv () from junit/framework/TestCase.java
#5 0x8108dce in _ZN5junit9framework12TestResult$17protectEv () from junit/framework/TestResult.java
#6 0x810bb9f in _ZN5junit9framework10TestResult12runProtectedEPNS0_4TestEPNS0_11ProtectableE () from junit/framework/TestResult.java
#7 0x810baec in _ZN5junit9framework10TestResult3runEPNS0_8TestCaseE () from junit/framework/TestResult.java
#8 0x810c166 in _ZN5junit9framework8TestCase3runEPNS0_10TestResultE () from junit/framework/TestCase.java
#9 0x810a0f2 in _ZN5junit9framework9TestSuite7runTestEPNS0_4TestEPNS0_10TestResultE () from junit/framework/TestSuite.java
#10 0x810a0a4 in _ZN5junit9framework9TestSuite3runEPNS0_10TestResultE () from junit/framework/TestSuite.java
#11 0x810a0f2 in _ZN5junit9framework9TestSuite7runTestEPNS0_4TestEPNS0_10TestResultE () from junit/framework/TestSuite.java
#12 0x810a0a4 in _ZN5junit9framework9TestSuite3runEPNS0_10TestResultE () from junit/framework/TestSuite.java
#13 0x810d20f in _ZN5junit6textui10TestRunner5doRunEPNS_9framework4TestEb () from junit/textui/TestRunner.java
#14 0x810d19e in _ZN5junit6textui10TestRunner5doRunEPNS_9framework4TestE () from junit/textui/TestRunner.java
#15 0x80ec9de in _ZN5frysk5junit6Runner8runCasesEPN4java4util10CollectionE () from ../../frysk/frysk-imports/frysk/junit/Runner.java
#16 0x80ece05 in _ZN5frysk5junit6Runner12runArchCasesEPN4java4util10CollectionE () from ../../frysk/frysk-imports/frysk/junit/Runner.java
#17 0x80ec4c8 in _ZN5frysk5junit6Runner12runTestCasesEPN4java4lang6StringEPNS2_4util10CollectionES5_S8_S5_ () from ../../frysk/frysk-imports/frysk/junit/Runner.java
#18 0x8085aea in _ZN10TestRunner4mainEP6JArrayIPN4java4lang6StringEE () from /home/scratch/frysk/native/frysk-core/TestRunner.java
#19 0x2a10934 in _ZN3gnu4java4lang10MainThread9call_mainEv ()
#20 0x2a52b16 in _ZN3gnu4java4lang10MainThread3runEv ()
#21 0x2a1fe0b in _Z13_Jv_ThreadRunPN4java4lang6ThreadE ()
#22 0x29e2868 in _Z11_Jv_RunMainP14_Jv_VMInitArgsPN4java4lang5ClassEPKciPS6_b ()
#23 0x29e29a4 in _Z11_Jv_RunMainPN4java4lang5ClassEPKciPS4_b ()
#24 0x29e29eb in JvRunMain ()
#25 0x8085a21 in main () from /tmp/ccvtZqXd.i
#26 0x1254e4 in __libc_start_main ()
#27 0x8085951 in _start ()
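
For context, here is a minimal C sketch of what a waitAll-style drain loop does at the system-call level. It is an illustration only, not the code in frysk's Wait.cxx: the helper names spawn_exiting_child and drain_all_children are hypothetical, and it assumes Linux/glibc, where the __WALL flag makes waitpid(-1, ...) report clone children and tracees as well as ordinary children. The loop returns only when the kernel answers ECHILD, so it keeps blocking for as long as the caller still has an un-reaped child or tracee that has nothing to report.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __WALL
#define __WALL 0x40000000 /* Linux: wait for all children, clone or not */
#endif

/* Hypothetical helper: fork a child that exits at once with `code`.
   Returns the child's pid in the parent, or -1 if fork() failed. */
pid_t spawn_exiting_child(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code); /* child: leave immediately, skipping stdio flushes */
    return pid;
}

/* Hypothetical waitAll-style loop: call waitpid(-1, ...) repeatedly
   until the kernel reports ECHILD (nothing left to wait for).
   Returns the number of wait events consumed, or -1 on any other error.
   It blocks while some child or tracee remains un-reaped but silent. */
int drain_all_children(void)
{
    int events = 0;
    for (;;) {
        int status;
        pid_t pid = waitpid(-1, &status, __WALL);
        if (pid > 0) {
            events++;          /* an exit, a signal death, or a ptrace stop */
            continue;
        }
        if (errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        return errno == ECHILD ? events : -1;
    }
}
```

For example, after forking two children that exit immediately, drain_all_children() returns 2; a second call returns 0 at once because waitpid fails with ECHILD.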

Looking at TestRunner in /proc:

cagney@nettle$ pgrep TestRunner
13410
cagney@nettle$ grep `pgrep TestRunner` /proc/*/status
/proc/13410/status:Tgid:        13410
/proc/13410/status:Pid: 13410
cagney@nettle$ ls /proc/`pgrep TestRunner`/task
13410  13411  13416

and then checking for any process that TestRunner is attached to through one of its tasks:

cagney@nettle$ egrep '13410|13411|13416' /proc/*/task/status
egrep: /proc/*/task/status: No such file or directory
cagney@nettle$ egrep '13410|13411|13416' /proc/*/task/*/status
/proc/13410/task/13410/status:Tgid:     13410
/proc/13410/task/13410/status:Pid:      13410
/proc/13410/task/13411/status:Tgid:     13410
/proc/13410/task/13411/status:Pid:      13411
/proc/13410/task/13416/status:Tgid:     13410
/proc/13410/task/13416/status:Pid:      13416
/proc/13413/task/13413/status:TracerPid:        13416
egrep: /proc/self/task/3967/status: No such file or directory
cagney@nettle$ ls /proc/13413/task
13413

shows one traced process (13413, with a single task). Looking at that process:

cagney@nettle$ head /proc/13413/task/13413/status
Name:   funit-child
State:  Z (zombie)
SleepAVG:       98%
Tgid:   13413
Pid:    13413
PPid:   1
TracerPid:      13416
Uid:    500     500     500     500
Gid:    500     500     500     500
FDSize: 0

shows that it is both a zombie and still attached to its tracer (TracerPid 13416,
one of TestRunner's tasks).

(This /proc probe was taken _minutes_ after the hang.)
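
The same /proc check can be done programmatically. The C sketch below assumes the Linux /proc/<tgid>/task/<tid>/status text format (one "Key:<whitespace>value" pair per line, as in the output above); the function names status_field, tracer_pid and read_task_status are hypothetical. A task whose State is "Z (zombie)" but whose TracerPid is still non-zero is the combination reported here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the value of "key:" from /proc status text into out, skipping the
   whitespace after the colon. Returns out, or NULL if the key is absent. */
const char *status_field(const char *text, const char *key,
                         char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *line = text;
    if (outlen == 0)
        return NULL;
    while (line != NULL && *line != '\0') {
        /* Match only at the start of a line so "Pid" does not hit "PPid". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *v = line + klen + 1;
            size_t n;
            while (*v == ' ' || *v == '\t')
                v++;
            n = strcspn(v, "\n");
            if (n >= outlen)
                n = outlen - 1;     /* truncate to fit the buffer */
            memcpy(out, v, n);
            out[n] = '\0';
            return out;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;                 /* advance to the next line */
    }
    return NULL;
}

/* TracerPid from status text (0 = not traced), or -1 if the field is missing. */
long tracer_pid(const char *text)
{
    char buf[32];
    if (status_field(text, "TracerPid", buf, sizeof buf) == NULL)
        return -1;
    return strtol(buf, NULL, 10);
}

/* Read /proc/<tgid>/task/<tid>/status into buf (NUL-terminated).
   Returns the number of bytes read, or -1 if the file cannot be opened. */
long read_task_status(pid_t tgid, pid_t tid, char *buf, size_t len)
{
    char path[64];
    FILE *f;
    size_t n;
    if (len == 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%d/task/%d/status", (int)tgid, (int)tid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    n = fread(buf, 1, len - 1, f);
    fclose(f);
    buf[n] = '\0';
    return (long)n;
}
```

Fed the status text quoted above, tracer_pid() returns 13416 and status_field(text, "State", ...) yields "Z (zombie)".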
Comment 1 Andrew Cagney 2006-11-26 22:27:39 UTC
Here's a sample tearDown log (from a re-run). Notice that it is trying to
detach/kill a process with three threads (13488, 13489, 13490); while the
clones die, the main thread refuses to go:

26-Nov-06 5:11:22 PM frysk.proc.TestLib tearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) >>>>>>>>>>>>>>>> start tearDown

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13488

26-Nov-06 5:11:22 PM frysk.proc.TestLib killDuringTearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) killDuringTearDown 13488

26-Nov-06 5:11:22 PM frysk.proc.TestLib killDuringTearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) killDuringTearDown 13489

26-Nov-06 5:11:22 PM frysk.proc.TestLib killDuringTearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) killDuringTearDown 13490

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13490 (failed - ESRCH)

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13490

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13490

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13489

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13489 (failed - ESRCH)

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13489 (failed - ESRCH)

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13488 (failed - ESRCH)

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13488

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13488

26-Nov-06 5:11:22 PM frysk.proc.TestLib tearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) waitAll -1 ...

26-Nov-06 5:11:22 PM frysk.sys.Wait waitAll
FINE: frysk.sys.Wait pid 13490 status 0x6057f WIFSTOPPED/EXIT 5 (Trace/breakpoint trap)

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) detach 13490

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -CONT 13490

26-Nov-06 5:11:22 PM frysk.proc.TestLib log
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) tkill -KILL 13490

26-Nov-06 5:11:22 PM frysk.proc.TestLib tearDown
FINE: testMultiThreadedStoppedAckDaemon(frysk.proc.TestProcStopped) waitAll -1 ...
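
The one wait event that does arrive in this log, status 0x6057f for pid 13490, can be decoded by hand: the low byte 0x7f means the task is stopped, the next byte 0x05 is the stop signal (SIGTRAP, the "Trace/breakpoint trap" in the log), and bits 16 and up hold 6, which is PTRACE_EVENT_EXIT. So thread 13490 is sitting in its ptrace exit stop, which the kernel only reports when the tracer has enabled PTRACE_O_TRACEEXIT. The C sketch below does the same decoding with the standard <sys/wait.h> macros; the wait_info struct and both function names are hypothetical, and the event extraction assumes Linux's ptrace status layout.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

enum wait_kind { WAIT_EXITED, WAIT_SIGNALED, WAIT_STOPPED, WAIT_OTHER };

struct wait_info {
    enum wait_kind kind;
    int value;  /* exit code, terminating signal, or stop signal */
    int event;  /* ptrace event code (bits 16..23) for stops, else 0 */
};

/* Decode a raw status word as returned by waitpid(). */
struct wait_info decode_wait_status(int status)
{
    struct wait_info wi = { WAIT_OTHER, 0, 0 };
    if (WIFEXITED(status)) {
        wi.kind = WAIT_EXITED;
        wi.value = WEXITSTATUS(status);
    } else if (WIFSIGNALED(status)) {
        wi.kind = WAIT_SIGNALED;
        wi.value = WTERMSIG(status);
    } else if (WIFSTOPPED(status)) {
        wi.kind = WAIT_STOPPED;
        wi.value = WSTOPSIG(status);
        /* Linux places PTRACE_EVENT_* numbers above the stop signal. */
        wi.event = (status >> 16) & 0xff;
    }
    return wi;
}

/* True if the status is a SIGTRAP stop carrying PTRACE_EVENT_EXIT. */
int is_exit_event_stop(struct wait_info wi)
{
    return wi.kind == WAIT_STOPPED && wi.value == SIGTRAP
        && wi.event == PTRACE_EVENT_EXIT;
}
```

A plain SIGTRAP stop such as 0x057f decodes with event 0, so is_exit_event_stop() separates it from the exit-event stop seen for 13490.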

Comment 2 Andrew Cagney 2006-11-29 23:18:13 UTC
2006-11-29  Andrew Cagney  <cagney@redhat.com>

        * frysk3595/detach-multi-thread.c (main): Simplify, remove any
        failing system calls - only detach from non-main thread.

        * frysk3595/detach-multi-thread.c (main): Fix for-loop typo,
        attach to all NR_TASKS.

        * frysk3595/detach-multi-thread.c (main): Reduce number of tasks
        to 1.

        * frysk3595/detach-multi-thread.c: New file.
        * Makefile.am (TESTS, noinst_PROGRAMS): Add
        frysk3595/detach-multi-thread
        (frysk3595_detach_multi_thread_SOURCES)
        (frysk3595_detach_multi_thread_LDFLAGS): Define.

Comment 3 Andrew Cagney 2006-11-29 23:29:07 UTC
Pushed downstream:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217809
Comment 4 Andrew Cagney 2006-12-15 21:27:21 UTC
*** Bug 3381 has been marked as a duplicate of this bug. ***