This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



[Bug testsuite/20600] New: parallel testsuite hang in [nd_]syscall.exp


https://sourceware.org/bugzilla/show_bug.cgi?id=20600

            Bug ID: 20600
           Summary: parallel testsuite hang in [nd_]syscall.exp
           Product: systemtap
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: testsuite
          Assignee: systemtap at sourceware dot org
          Reporter: dsmith at redhat dot com
  Target Milestone: ---

When I run the testsuite in parallel mode with at least 3 concurrent jobs, I'm
getting a testsuite "hang". The testsuite will run to completion, except for
either the syscall.exp or nd_syscall.exp test case. That test case will hang in
one of the tests, typically in the execve or getrlimit subtest. The stapio
process for that test is in the defunct state:

====
# ps ax | fgrep stap
14534 pts/0    S+     0:00 grep -F --color=auto stap
24933 ?        Zl     0:10 [stapio] <defunct>

# tail testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.log 
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getpriority.c 
-lrt  -lm   -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptestgbSi0f/getpriority
   (timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getpriority.c
-lrt -lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptestgbSi0f/getpriority
PASS: 64-bit getpriority nd_syscall
Testing 64-bit getrandom nd_syscall
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getrandom.c  -lrt 
-lm   -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest9QHupy/getrandom
   (timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getrandom.c -lrt
-lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest9QHupy/getrandom
PASS: 64-bit getrandom nd_syscall
Testing 64-bit getrlimit nd_syscall
Executing on host: gcc /root/src/testsuite/systemtap.syscall/getrlimit.c  -lrt 
-lm   -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest4a2xe9/getrlimit
   (timeout = 300)
spawn -ignore SIGHUP gcc /root/src/testsuite/systemtap.syscall/getrlimit.c -lrt
-lm -o
/root/rhel7-ppc64le/testsuite/artifacts/systemtap.syscall/nd_syscall/staptest4a2xe9/getrlimit

# ll testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.log 
-rwxr-xr-x. 1 root root 21289 Sep 10 01:19
testsuite/artifacts/systemtap.syscall/nd_syscall/systemtap.lo
====

So, for over 9 hours that test has just sat there. If I do a 'kill -9' on that
defunct stapio process, the [nd_]syscall.exp test will finish (and the full
testsuite will also finish).
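
The check above (looking for a defunct stapio in the 'ps ax' output) can be
scripted. The following is only a sketch, not part of the systemtap testsuite:
the function name list_defunct_stapio is made up for illustration, and it
assumes the procps 'ps ax' column layout shown above (PID, TTY, STAT, TIME,
COMMAND). In the STAT field, 'Z' means defunct and 'l' means multi-threaded,
so a 'Zl' stapio may still have a live thread even though its main thread has
exited; that is a guess at what is going on here, not a confirmed diagnosis.

```shell
# Hypothetical helper: read `ps ax` output on stdin and print the PID of
# every process whose STAT field starts with Z (defunct) and whose line
# mentions stapio. Assumes STAT is the third whitespace-separated column.
list_defunct_stapio() {
    awk '$3 ~ /^Z/ && /stapio/ { print $1 }'
}

# Typical use (assumes a procps-style ps):
#   ps ax | list_defunct_stapio
#
# To look at the individual threads of a reported PID, something like
#   ps -L -o pid,tid,stat,wchan,comm -p <PID>
# should show which thread, if any, is still alive and where it sleeps.
```

Feeding it the two 'ps ax' lines quoted above prints only 24933; the grep
process is skipped because its state is S+.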

Note that on the same system the full testsuite (and the [nd_]syscall.exp test
cases) will run to completion when run in non-parallel mode.

This "hang" is fairly repeatable, happening at least 50% of the time.

I'd guess that one of the other tests is interfering with the [nd_]syscall.exp
test case somehow, but I can't quite think of how.

