This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: Possible regression on PPC64 testsuite with native-{extended-}gdbserver
- From: Pedro Alves <palves at redhat dot com>
- To: Sergio Durigan Junior <sergiodj at redhat dot com>
- Cc: gdb-patches at sourceware dot org, "Yichun Zhang (agentzh)" <agentzh at gmail dot com>, Edjunior Barbosa Machado <emachado at linux dot vnet dot ibm dot com>
- Date: Thu, 5 May 2016 12:22:30 +0100
- Subject: Re: Possible regression on PPC64 testsuite with native-{extended-}gdbserver
- References: <1460479589-21126-1-git-send-email-palves at redhat dot com> <87r3dhjokq dot fsf at redhat dot com>
On 05/05/2016 02:11 AM, Sergio Durigan Junior wrote:
> As I said, this problem doesn't happen when we're not testing gdbserver
> configurations.
>
> I haven't investigated the problem further, and it may very well be
> something unrelated to this patch (notice that, although the failure
> happens several times, it's not deterministic), but I decided it was
> a good thing to raise awareness.
So far, this looks unrelated to the patch in question to me.
Testing on gcc112 on the compile farm (POWER8/PPC64, Fedora 21), I
do see the testsuite hanging too. The whole testsuite runs, but then
a couple of tests hang forever, it seems. ps shows:
palves 72739 72012 0 10:31 pts/49 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.threads/process-dies-while-handling-bp gdb.threads/process-dies-while-handling-bp.exp --target_board=native-gdbserver
palves 157333 156965 0 10:30 pts/49 00:00:00 expect -- /usr/share/dejagnu/runtest.exp --status GDB_PARALLEL=yes --outdir=outputs/gdb.base/cond-expr gdb.base/cond-expr.exp --target_board=native-gdbserver
So indeed one of them is gdb.threads/process-dies-while-handling-bp.exp.
Attaching to the gdbserver process, we see it stuck here:
(gdb) bt
#0 0x00003fff96b5ddf4 in sigsuspend () from /lib64/libc.so.6
#1 0x0000000010049fa0 in linux_wait_for_event_filtered (wait_ptid=..., filter_ptid=..., wstatp=0x3fffc5d76608, options=1073741824)
at ../../../src/gdb/gdbserver/linux-low.c:2709
#2 0x000000001004d208 in wait_for_sigstop () at ../../../src/gdb/gdbserver/linux-low.c:3904
#3 0x000000001004d97c in stop_all_lwps (suspend=0, except=0x0) at ../../../src/gdb/gdbserver/linux-low.c:4041
#4 0x00000000100466d8 in linux_kill (pid=82121) at ../../../src/gdb/gdbserver/linux-low.c:1345
#5 0x0000000010021024 in kill_inferior (pid=82121) at ../../../src/gdb/gdbserver/target.c:326
#6 0x000000001001baf4 in detach_or_kill_inferior_callback (entry=0x100335f45c0) at ../../../src/gdb/gdbserver/server.c:3388
#7 0x0000000010008f24 in for_each_inferior (list=0x100b79e8 <all_processes>, action=0x1001ba54 <detach_or_kill_inferior_callback(inferior_list_entry*)>)
at ../../../src/gdb/gdbserver/inferiors.c:55
#8 0x000000001001bdcc in detach_or_kill_for_exit () at ../../../src/gdb/gdbserver/server.c:3449
#9 0x000000001001be2c in detach_or_kill_for_exit_cleanup (ignore=0x0) at ../../../src/gdb/gdbserver/server.c:3463
#10 0x0000000010040814 in do_my_cleanups (pmy_chain=0x100b0490 <cleanup_chain>, old_chain=0x10075310 <sentinel_cleanup>)
at ../../../src/gdb/gdbserver/../common/cleanups.c:154
#11 0x0000000010040918 in do_cleanups (old_chain=0x10075310 <sentinel_cleanup>) at ../../../src/gdb/gdbserver/../common/cleanups.c:176
#12 0x000000001004144c in throw_exception_cxx (exception=...) at ../../../src/gdb/gdbserver/../common/common-exceptions.c:289
#13 0x0000000010041584 in throw_exception (exception=...) at ../../../src/gdb/gdbserver/../common/common-exceptions.c:317
#14 0x0000000010041778 in throw_it (reason=RETURN_QUIT, error=GDB_NO_ERROR, fmt=0x1006d820 "Quit", ap=0x3fffc5d76bf8 " D\327\305\377?")
at ../../../src/gdb/gdbserver/../common/common-exceptions.c:373
#15 0x0000000010041810 in throw_vquit (fmt=0x1006d820 "Quit", ap=0x3fffc5d76bf8 " D\327\305\377?") at ../../../src/gdb/gdbserver/../common/common-exceptions.c:385
#16 0x00000000100418dc in throw_quit (fmt=0x1006d820 "Quit") at ../../../src/gdb/gdbserver/../common/common-exceptions.c:404
#17 0x000000001001cf04 in captured_main (argc=4, argv=0x3fffc5d77178) at ../../../src/gdb/gdbserver/server.c:3790
#18 0x000000001001cf94 in main (argc=4, argv=0x3fffc5d77178) at ../../../src/gdb/gdbserver/server.c:3804
(gdb)
So gdbserver was quitting and trying to kill all its child
processes on the way out, and then it hung. Process 82121, the
one gdbserver was trying to kill, is actually dead already:
[palves@gcc2-power8 src]$ cat /proc/82121/status | grep State
State: Z (zombie)
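For anyone wanting to script that check, here is a minimal Python sketch. It assumes a Linux /proc layout; the helper names parse_proc_state and is_zombie are made up for illustration and are not part of gdb or gdbserver.

```python
def parse_proc_state(status_text):
    """Return the one-letter state code from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            # fields looks like ["State:", "Z", "(zombie)"]
            return fields[1] if len(fields) > 1 else None
    return None


def is_zombie(pid):
    """True if pid exists and is a zombie; False if it is gone or in another state."""
    try:
        with open(f"/proc/{pid}/status") as f:
            return parse_proc_state(f.read()) == "Z"
    except (FileNotFoundError, ProcessLookupError):
        # The process has already been reaped (or /proc is unavailable).
        return False
```

A zombie can no longer stop or report events, so a caller that finds is_zombie(pid) true has nothing left to wait for except the final exit status.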
That we mishandle the case of the process dying unexpectedly is
already known, and it manifests in several different ways, so
that's not very surprising. Exercising that case is exactly the
point of that test in the first place.
What _is_ surprising is that the testsuite framework doesn't
time out eventually...
I attached to the corresponding gdb process, and I see
that we're stuck in a loop sending "monitor exit" to gdbserver,
in rcmd:
  if (getpkt_sane (&rs->buf, &rs->buf_size, 0) == -1)
    {
      /* Timeout.  Continue to (try to) read responses.
         This is better than stopping with an error, assuming the stub
         is still executing the (long) monitor command.
         If needed, the user can interrupt gdb using C-c, obtaining
         an effect similar to stop on timeout.  */
      continue;
    }
That is, getpkt_sane keeps timing out, so the loop never exits. We can
step through the code and see each read return -2 (SERIAL_TIMEOUT):
(gdb) finish
Run till exit from #0 do_ser_base_readchar (scb=0x1002eb57cc0, timeout=1) at ../../src/gdb/ser-base.c:345
0x0000000010068940 in generic_readchar (scb=0x1002eb57cc0, timeout=2, do_readchar=0x10068640 <do_ser_base_readchar(serial*, int)>) at ../../src/gdb/ser-base.c:424
424 ch = do_readchar (scb, timeout);
Value returned is $1 = -2
(gdb) finish
Run till exit from #0 0x0000000010068940 in generic_readchar (scb=0x1002eb57cc0, timeout=2, do_readchar=0x10068640 <do_ser_base_readchar(serial*, int)>)
at ../../src/gdb/ser-base.c:424
0x0000000010068a2c in ser_base_readchar (scb=0x1002eb57cc0, timeout=2) at ../../src/gdb/ser-base.c:451
451 return generic_readchar (scb, timeout, do_ser_base_readchar);
Value returned is $2 = -2
(gdb) finish
...
While this code is debatable too, I think that expect/runtest/dejagnu
itself should be timing out, and then force-killing gdb anyway.
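To illustrate one way a retry-on-timeout loop like the one in rcmd could be bounded, here is a rough Python sketch (not GDB code). The function name, read_packet, overall_timeout and clock are all hypothetical; read_packet stands in for a getpkt-style call that returns None on a per-read timeout.

```python
import time


def read_reply_with_deadline(read_packet, overall_timeout, clock=time.monotonic):
    """Call read_packet() until it yields a reply or overall_timeout seconds pass.

    read_packet is a hypothetical callable that returns None when a single
    read times out, and the reply otherwise.
    """
    deadline = clock() + overall_timeout
    while True:
        reply = read_packet()
        if reply is not None:
            return reply
        # A single read timed out: keep retrying, but only until the deadline.
        if clock() >= deadline:
            raise TimeoutError("no reply before overall deadline")
```

Whether a hard cap is the right behaviour for genuinely long monitor commands is a separate question; the sketch only shows the shape of a loop that cannot spin forever.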
The still-running log shows:
$ tail testsuite/outputs/gdb.threads/process-dies-while-handling-bp/gdb.log
(gdb) PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: set breakpoint that evals false
continue &
Continuing.
(gdb) PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: continue &
KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
Remote debugging from host 127.0.0.1
gdbserver: reading register 0: No such process
Killing process(es): 82121
monitor exit
Ignoring packet error, continuing...
$
So it's all self-consistent.
I just don't understand why dejagnu doesn't time out.
I think the "monitor exit" is the one from
gdb/testsuite/lib/gdbserver-support.exp:gdb_exit.
The other hung test is "gdb.base/cond-expr.exp". This one's more
mysterious. Attaching to the gdb in question, I see nothing wrong: gdb
is not debugging anything, and is just waiting for input.
However, from:
$ tail testsuite/outputs/gdb.base/cond-expr/gdb.log
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
/home/palves/gdb/build/gdb/testsuite/../../gdb/gdb version 7.11.50.20160505-git -nw -nx -data-directory /home/palves/gdb/build/gdb/testsuite/../data-directory -ex "set auto-connect-native-target off"
runtest completed at Thu May 5 10:31:20 2016
$
we see that that test did complete. So I can't really explain that...
So, in sum, we may have gdb or gdbserver bugs, but the framework
should be timing out and coping anyway. Why isn't it? I have no
clue at the moment.
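The behaviour expected of the framework here can be sketched in Python (this is not dejagnu or expect code): run a command under a watchdog and force-kill it if it outlives its time budget. run_with_watchdog and its parameters are hypothetical names chosen for the example.

```python
import subprocess


def run_with_watchdog(argv, timeout):
    """Run argv; if it is still alive after timeout seconds, force-kill it.

    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(argv)
    try:
        proc.wait(timeout=timeout)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        proc.kill()   # SIGKILL on POSIX; cannot be blocked or ignored
        proc.wait()   # reap the child so it does not linger as a zombie
        return proc.returncode, True
```

Note that this only kills the direct child. A real harness would also have to deal with any processes that child spawned, such as gdbserver and the inferior.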
Thanks,
Pedro Alves