This is the mail archive of the
mailing list for the GDB project.
gdb, pthreads, and sleep
- From: Michael Elizabeth Chastain <mec at shout dot net>
- To: gdb at sources dot redhat dot com
- Date: Mon, 22 Sep 2003 16:47:06 -0400
- Subject: gdb, pthreads, and sleep
There's some code in the mi pthreads test that is bugging me. All the
tests pass all the time, but different tests pass on different test
runs, which causes noise in my test reports.
I'm running on native i686-pc-linux-gnu, red hat 8.0, glibc 2.2.93-5-rh.
Here is the code of the program under test:
  routine (void *arg)
  {
    sleep (9);
    printf ("hello thread\n");
  }
  ...
  /* Create a few threads */
  for (i = 0; i < 5; i++)
When gdb is not used, "sleep (9)" sleeps for 9 seconds and returns 0.
When gdb is used, "sleep (9)" sleeps for 0 seconds and returns 9.
This causes races and different output on different test runs.
The problem is an interaction between sleep, pthread_create, and gdb.
When gdb is running, pthread_create eventually calls
pthread_restart_new, which sends the pthread_sig_restart signal. gdb
notices this signal. But as a side effect, the "sleep (9)" is
interrupted and returns early.
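
The early return is easy to reproduce without gdb or pthreads. Here is
a minimal sketch, assuming a POSIX system. It is not the thread
library's actual mechanism: SIGUSR1 stands in for the restart signal,
and the names on_signal and interrupted_sleep are made up for this
illustration. A child process sends the signal about half a second
into the parent's sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Empty handler (hypothetical name).  Its only job is to make the
   signal "caught" rather than fatal, so that it interrupts 'sleep'.  */
static void
on_signal (int signo)
{
  (void) signo;
}

/* Call sleep (seconds) while a child process sends us SIGUSR1 about
   500 ms in.  SIGUSR1 is an assumption standing in for the thread
   library's restart signal.  Returns what 'sleep' returned (the
   unslept seconds), or -1 if the setup failed.  */
int
interrupted_sleep (unsigned int seconds)
{
  struct sigaction sa;
  pid_t child;
  unsigned int unslept;

  sa.sa_handler = on_signal;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;
  if (sigaction (SIGUSR1, &sa, NULL) != 0)
    return -1;

  child = fork ();
  if (child < 0)
    return -1;
  if (child == 0)
    {
      /* Child: wait a moment so the parent is inside 'sleep',
         then signal it and exit.  */
      struct timespec delay = { 0, 500 * 1000 * 1000 };
      nanosleep (&delay, NULL);
      kill (getppid (), SIGUSR1);
      _exit (0);
    }

  /* Parent: this returns early, with a nonzero count of unslept
     seconds, once the handler has run.  */
  unslept = sleep (seconds);
  waitpid (child, NULL, 0);
  return (int) unslept;
}
```

Calling interrupted_sleep (3) should come back after roughly half a
second with a nonzero result, which is the same shape of behavior as
"sleep (9)" returning 9 under gdb.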
When gdb is used, usually all the threads go all the way to exit, but
sometimes some threads do not (especially the newest thread created).
The test script gdb.mi/mi-pthreads.exp wants to test -thread-select
on the child threads. The test script lets the threads be created.
Then it asks for thread info, and then it tests -thread-select
on each thread.
The list of threads varies from run to run, so the PASS results vary
from run to run. On 95% of the runs, however, there are no children at
all, so the test script is not covering -thread-select very well
(it still sees the parent thread and the manager thread).
What can we do about this?
(1) Do nothing.
This bugs me because I would like to run the gdb test suite twice in
a row and have it come out the same way each time. This makes it
easier for automated testers and for new people, like gcc people, to
use the test suite. That's my reason for bringing this up.
Also, -thread-select is not being tested on child threads very well.
(2) Change the program under test to be more correct:
  int unslept = 9;
  while (unslept > 0)
    unslept = sleep (unslept);
This is the proper way to call 'sleep' in a program that may
receive signals. The return value of 'sleep' is documented in the
Single UNIX Specification, Version 2, so it is portable. If this code leads to a
problem, then it means that the test program has found a bug in the
operating system's implementation of "sleep". Tickling bugs is
a *good* thing for a test program.
The gotcha here is that gdb should work with buggy test programs.
Currently, pthreads.c is written poorly (ignores the return value of
'sleep'), but it's natural that people write code like this.
On the other hand, the point of the pthreads test is to call
thread-select on a lot of threads. With the existing code, there is
a child thread for thread-select less than 10% of the time. By
writing the code my way, thread-select is actually called on the
child threads 100% of the time. (I've tested this on my test
bed with 200 runs each way).
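
For anyone who wants to reuse the idea outside the test program, the
retry loop can be wrapped in a small helper. This is only a sketch:
the name sleep_fully and the restart counter are my own additions for
illustration and are not part of pthreads.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Sleep for the full duration, calling 'sleep' again whenever a
   signal cuts it short.  Returns how many times the sleep had to be
   restarted (hypothetical helper, not from the gdb test suite).  */
unsigned int
sleep_fully (unsigned int seconds)
{
  unsigned int restarts = 0;
  unsigned int unslept = seconds;

  while (unslept > 0)
    {
      /* 'sleep' returns 0 when the time has elapsed, or the number
         of unslept seconds if a signal handler interrupted it.  */
      unslept = sleep (unslept);
      if (unslept > 0)
        restarts++;
    }
  return restarts;
}
```

The thread routine would then call sleep_fully (9) in place of
sleep (9). Because 'sleep' reports whole seconds, each restart can be
off by a fraction of a second, so the total is approximate when there
are many interruptions.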
(3) Leave the test program alone, but change the test script so that it
generates one PASS result for the whole thread list instead of one PASS
per thread. It would still generate FAILs for each thread that FAILed
thread-select. This would make the results reproducible, and leave the
test program exactly as it is now.
My preferences: (2), (3), (1).
What do you think?
Also, I think we need some documentation in the gdb threads section.
gdb makes some threaded programs behave differently because the signals
for gdb are not perfectly transparent. Watch what happens when I run
gdb.mi/pthreads with and without gdb:
/* without gdb */
/* with gdb */
Starting program: /berman/home/mgnu/gdb/pthread-select/pthreads
[New Thread 8192 (LWP 12564)]
[New Thread 16385 (LWP 12568)]
[New Thread 8194 (LWP 12569)]
[New Thread 16387 (LWP 12570)]
[New Thread 24580 (LWP 12571)]
[New Thread 32773 (LWP 12572)]
[New Thread 40966 (LWP 12573)]
Program exited normally.
The 'hello thread' output happens only under the debugger.
I think people will be surprised when their 'sleep' calls actually
sleep without gdb, but return with a lot of unslept time with gdb.
I doubt we can fix this, although maybe the pthreads implementors
can fix it. But we can at least document it.