This is the mail archive of the gdb-prs@sources.redhat.com mailing list for the GDB project.



threads/1446: waitpid may return pid == -1 && errno == EINTR


>Number:         1446
>Category:       threads
>Synopsis:       waitpid may return pid == -1 && errno == EINTR
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Tue Nov 11 21:38:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Lonnie VanZandt
>Release:        insight 6.0
>Organization:
>Environment:
Mandrake 9.1 on 686
>Description:
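Without the EINTR check in the code below, GDB sporadically fails the assertion pid == GET_LWP (lp->ptid). When a signal interrupts the wait, waitpid returns -1 with errno set to EINTR, and -1 never matches the LWP id.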

I have not determined which unblocked signal is interrupting the waitpid call.

(The other guard I added checks that the LWP id is valid, but that assertion does not appear to fail.)

/* Wait until LP is stopped.  If DATA is non-null it is interpreted as
   a pointer to a set of signals to be flushed immediately.  */

static int
stop_wait_callback (struct lwp_info *lp, void *data)
{
  sigset_t *flush_mask = data;

  if (!lp->stopped && lp->signalled)
    {
      pid_t pid;
      int status;

      gdb_assert (lp->status == 0);

      /* Make sure we have a valid LWP id on which to wait.  */
      gdb_assert (GET_LWP (lp->ptid) > 0);

      /* Retry the whole wait if it was interrupted by an unblocked
	 signal (pid == -1 && errno == EINTR).  */
      do
	{
	  /* Wait for this LWP to report a status change (normally its stop).  */
	  pid = waitpid (GET_LWP (lp->ptid), &status, 0);

	  /* ECHILD: a plain waitpid only reports non-clone children, so the
	     LWP may be a cloned thread.  Retry with __WCLONE.  */
	  if (pid == -1 && errno == ECHILD)
	    {
	      pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);

	      /* If this wait also fails with ECHILD, the LWP no longer
		 exists.  */
	      if (pid == -1 && errno == ECHILD)
		{
		  /* The thread has previously exited.  We need to delete it
		     now because in the case of nptl threads, there won't be
		     an exit event unless it is the main thread.  */
		  if (debug_lin_lwp)
		    fprintf_unfiltered (gdb_stdlog,
					"SWC: %s exited.\n",
					target_pid_to_str (lp->ptid));
		  delete_lwp (lp->ptid);
		  return 0;
		}
	    }
	}
      while (pid == -1 && errno == EINTR);

      gdb_assert (pid == GET_LWP (lp->ptid));
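Outside GDB, the same control flow can be sketched as a standalone Linux function. This is a minimal sketch, not the patch itself: the name wait_for_lwp is hypothetical, the GDB-specific calls (gdb_assert, delete_lwp, debug logging) are replaced by assert and a return value of 0, and __WCLONE is a Linux-specific waitpid flag.

```c
#define _GNU_SOURCE             /* for the Linux-specific __WCLONE flag */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for LWP to report a status.
   Returns LWP on success, 0 if the LWP no longer exists (both the plain
   and the __WCLONE wait fail with ECHILD), or -1 on any other error.
   A wait interrupted by a signal (EINTR) is retried from the start.  */
static pid_t
wait_for_lwp (pid_t lwp, int *status)
{
  pid_t pid;

  assert (lwp > 0);             /* mirrors the report's guard on the LWP id */

  do
    {
      pid = waitpid (lwp, status, 0);
      if (pid == -1 && errno == ECHILD)
        {
          /* A plain waitpid only sees non-clone children; try again
             for clone (thread) children.  */
          pid = waitpid (lwp, status, __WCLONE);
          if (pid == -1 && errno == ECHILD)
            return 0;           /* gone: the caller should forget this LWP */
        }
    }
  while (pid == -1 && errno == EINTR);

  return pid;
}
```

The EINTR test sits in the loop condition, so an interruption of either the plain or the __WCLONE wait restarts the whole sequence. errno is consulted only when pid == -1, because a successful waitpid does not reset errno and it may hold a stale value.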
>How-To-Repeat:
Unclear. Possibly it is triggered by a large number of threads, or by external activity on the system.
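The GDB failure itself has no known recipe, but the underlying behaviour of waitpid is easy to reproduce in isolation. The sketch below is standalone code, not GDB code, and demo_interrupted_waitpid and noop_handler are hypothetical names. It installs a SIGALRM handler without SA_RESTART and blocks in waitpid on a sleeping child; when the alarm fires, waitpid returns -1 with errno == EINTR. It takes about two seconds to run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Does nothing; installing it makes SIGALRM "caught", which is what
   allows the signal to interrupt a blocking system call.  */
static void
noop_handler (int sig)
{
  (void) sig;
}

/* Block in waitpid on a sleeping child while SIGALRM fires.  Returns the
   errno of the interrupted waitpid (EINTR is expected), or 0 if the wait
   unexpectedly succeeded.  The child is always reaped.  */
static int
demo_interrupted_waitpid (void)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = noop_handler;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;              /* no SA_RESTART: let the call fail */
  sigaction (SIGALRM, &sa, NULL);

  pid_t child = fork ();
  assert (child >= 0);
  if (child == 0)
    {
      sleep (2);                /* outlive the 1-second alarm */
      _exit (0);
    }

  alarm (1);                    /* SIGALRM arrives while we are blocked */
  int status;
  pid_t pid = waitpid (child, &status, 0);
  int result = (pid == -1) ? errno : 0;

  /* The first wait was interrupted, so reap the child now.  */
  if (pid == -1)
    while (waitpid (child, &status, 0) == -1 && errno == EINTR)
      ;
  return result;
}
```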
>Fix:
See the workaround in the description: retry waitpid whenever it returns -1 with errno == EINTR.
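On glibc, the same retry can be written with the TEMP_FAILURE_RETRY macro from <unistd.h>, a GNU extension available with _GNU_SOURCE. This is a minimal sketch assuming glibc; waitpid_no_eintr, ignore_signal and wait_through_alarm are hypothetical names. The demonstration arms a non-restarting SIGALRM while the parent waits, and the wrapped call still returns the child's pid.

```c
#define _GNU_SOURCE             /* exposes TEMP_FAILURE_RETRY in glibc */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Retry waitpid for as long as it fails with EINTR.  Any other result,
   including other errors such as ECHILD, is returned unchanged.  */
static pid_t
waitpid_no_eintr (pid_t pid, int *status, int options)
{
  return TEMP_FAILURE_RETRY (waitpid (pid, status, options));
}

static void
ignore_signal (int sig)
{
  (void) sig;
}

/* Demonstration: SIGALRM (without SA_RESTART) interrupts the wait after
   one second, but the wrapper retries and still reaps the child.  */
static pid_t
wait_through_alarm (pid_t *child_out, int *status)
{
  struct sigaction sa;
  memset (&sa, 0, sizeof sa);
  sa.sa_handler = ignore_signal;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;              /* no SA_RESTART */
  sigaction (SIGALRM, &sa, NULL);

  pid_t child = fork ();
  assert (child >= 0);
  if (child == 0)
    {
      sleep (2);                /* still running when the alarm fires */
      _exit (3);
    }
  *child_out = child;
  alarm (1);
  return waitpid_no_eintr (child, status, 0);
}
```

An alternative is to install handlers with SA_RESTART, which makes the kernel restart waitpid automatically. That only covers handlers the program installs itself, whereas an explicit retry works however the interrupting signal was set up.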
>Release-Note:
>Audit-Trail:
>Unformatted:

