This is the mail archive of the
gdb-prs@sources.redhat.com
mailing list for the GDB project.
threads/1446: waitpid may return pid == -1 && errno == EINTR
- From: lonniev at predictableresponse dot com
- To: gdb-gnats at sources dot redhat dot com
- Cc: dgo at microsynetics dot net
- Date: 11 Nov 2003 21:32:48 -0000
- Subject: threads/1446: waitpid may return pid == -1 && errno == EINTR
- Reply-to: lonniev at predictableresponse dot com
>Number: 1446
>Category: threads
>Synopsis: waitpid may return pid == -1 && errno == EINTR
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: unassigned
>State: open
>Class: change-request
>Submitter-Id: net
>Arrival-Date: Tue Nov 11 21:38:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Lonnie VanZandt
>Release: insight 6.0
>Organization:
>Environment:
Mandrake 9.1 on 686
>Description:
Without the EINTR check in the code below, gdb sporadically fails the assertion pid == GET_LWP (lp->ptid), because waitpid can return -1 with errno set to EINTR.
I have not determined which unblocked signal is interrupting waitpid.
(The other guard I added is an assertion that the ptid value is valid, but that assertion does not appear to be failing.)
/* Wait until LP is stopped.  If DATA is non-null it is interpreted as
   a pointer to a set of signals to be flushed immediately.  */
static int
stop_wait_callback (struct lwp_info *lp, void *data)
{
  sigset_t *flush_mask = data;

  if (!lp->stopped && lp->signalled)
    {
      pid_t pid;
      int status;

      gdb_assert (lp->status == 0);

      // make sure we have a valid LWP id on which to wait
      gdb_assert (GET_LWP (lp->ptid) > 0);

      do
        {
          // wait for the particular child process to change state
          pid = waitpid (GET_LWP (lp->ptid), &status, 0);

          // if the child is not our child or, more likely, we are
          // SIG_IGN-oring SIGCHLD
          if (pid == -1 && errno == ECHILD)
            {
              // wait for the particular CLONED child process to change state
              pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);

              // if the wait still fails, then assume that the child no
              // longer exists
              if (pid == -1 && errno == ECHILD)
                {
                  /* The thread has previously exited.  We need to delete it now
                     because in the case of nptl threads, there won't be an
                     exit event unless it is the main thread.  */
                  if (debug_lin_lwp)
                    fprintf_unfiltered (gdb_stdlog,
                                        "SWC: %s exited.\n",
                                        target_pid_to_str (lp->ptid));
                  delete_lwp (lp->ptid);
                  return 0;
                }
            }

          // if the wait failed because we were interrupted by an unblocked
          // signal then retry...
        }
      while (pid == -1 && errno == EINTR);

      gdb_assert (pid == GET_LWP (lp->ptid));
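
One way to narrow down which signal is involved might be to check, just before the wait, which signals are currently unblocked in the caller. The following is only a sketch under stated assumptions: the helper name signal_is_unblocked is hypothetical and not part of gdb, and it assumes a single-threaded caller (a threaded program would use pthread_sigmask instead). Note that an unblocked signal only makes waitpid fail with EINTR if a handler is installed for it without SA_RESTART.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigprocmask under strict -std modes */

#include <signal.h>
#include <stddef.h>

/* Hypothetical helper, not part of gdb: return 1 if SIGNO is currently
   unblocked in the caller's signal mask, 0 if it is blocked, and -1 on
   error.  */
int
signal_is_unblocked (int signo)
{
  sigset_t current;
  int member;

  /* With a null SET argument, sigprocmask changes nothing and only
     reports the current mask in CURRENT.  */
  if (sigprocmask (SIG_BLOCK, NULL, &current) == -1)
    return -1;

  member = sigismember (&current, signo);
  if (member == -1)
    return -1;

  return !member;
}
```

Calling this for each candidate signal (for example SIGCHLD, SIGALRM, SIGINT) right before the waitpid call would show which ones could be delivered during the wait.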
>How-To-Repeat:
Not sure. It may take a large number of threads, or there may have to be some external activity.
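
The general failure mode, though not the gdb failure itself, can be shown outside gdb. The sketch below assumes a POSIX system and uses SIGALRM purely as a stand-in for whatever signal is arriving in gdb. It installs a handler without SA_RESTART, starts a child that stays alive, and lets the alarm fire while the parent is blocked in waitpid. The function name waitpid_interrupted_once is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for sigaction and kill under strict -std modes */

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Empty handler: its only job is to make SIGALRM interrupt a blocking call.  */
static void
on_alarm (int signo)
{
  (void) signo;
}

/* Hypothetical demo: return 1 if a single waitpid call failed with EINTR,
   0 if it did not, and -1 if the setup failed.  */
int
waitpid_interrupted_once (void)
{
  struct sigaction sa, old_sa;
  pid_t child, pid;
  int status, saved_errno;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;              /* deliberately no SA_RESTART */
  if (sigaction (SIGALRM, &sa, &old_sa) == -1)
    return -1;

  child = fork ();
  if (child == -1)
    {
      sigaction (SIGALRM, &old_sa, NULL);
      return -1;
    }
  if (child == 0)
    {
      sleep (30);               /* child: stay alive until the parent kills it */
      _exit (0);
    }

  alarm (1);                    /* SIGALRM arrives while waitpid is blocked */
  pid = waitpid (child, &status, 0);
  saved_errno = errno;

  /* Clean up: cancel the alarm, kill and reap the child, restore the handler.  */
  alarm (0);
  kill (child, SIGKILL);
  while (waitpid (child, &status, 0) == -1 && errno == EINTR)
    continue;
  sigaction (SIGALRM, &old_sa, NULL);

  return pid == -1 && saved_errno == EINTR;
}
```

If the handler were installed with SA_RESTART instead, the first waitpid would normally be restarted by the kernel and this function would be expected to return 0.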
>Fix:
See the workaround in the description: retry the wait if it fails with EINTR.
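
Reduced to its core, the retry can be written as a small wrapper. This is a minimal sketch, not the patch itself: the name waitpid_retry is hypothetical, and it leaves out the ECHILD and __WCLONE handling shown in the description.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical wrapper: call waitpid again for as long as it fails
   only because a signal handler interrupted it.  Any other result,
   including other errors, is returned to the caller unchanged.  */
pid_t
waitpid_retry (pid_t pid, int *status, int options)
{
  pid_t ret;

  do
    ret = waitpid (pid, status, options);
  while (ret == -1 && errno == EINTR);

  return ret;
}
```

A caller would use it exactly like waitpid, for example waitpid_retry (GET_LWP (lp->ptid), &status, 0), and would still need to check for -1 with errno set to something other than EINTR.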
>Release-Note:
>Audit-Trail:
>Unformatted: