This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH v2 1/5] Extended-remote follow exec

From: Pedro Alves <palves at redhat dot com>
To: Don Breazeal <donb at codesourcery dot com>, gdb-patches at sourceware dot org
Date: Thu, 13 Aug 2015 15:50:19 +0100
Subject: Re: [PATCH v2 1/5] Extended-remote follow exec
Authentication-results: sourceware.org; auth=none
References: <1436996979-32350-1-git-send-email-donb at codesourcery dot com> <1438298360-29594-1-git-send-email-donb at codesourcery dot com> <1438298360-29594-2-git-send-email-donb at codesourcery dot com>

Hi Don,

Starting to look at this.  See questions/comments below.


>      The second issue is that gdbserver must clean up any stale thread/lwp
>      structures before it eventually tries to stop all the threads.  If it
>      doesn't, it will hang in sigsuspend, waiting for an event from a
>      non-existent thread.
>    
>      For all-stop mode We do this by checking the return value from
>      kill_lwp in send_sigstop, eventually called after calling
>      stop_all_lwps, which is used to stop the threads in all-stop mode.  If
>      kill_lwp returns an error and errno is ESRCH, we know that the lwp
>      with that pid is gone, and we delete the associated data structures.
> 
> -  kill_lwp (pid, SIGSTOP);
> +  errno = 0;
> +  ret = kill_lwp (pid, SIGSTOP);
> +  if (ret == -1 && errno == ESRCH)
> +    {
> +      /* If the kill fails with "No such process", on GNU/Linux we know
> +	 that the LWP has vanished - it is not a zombie, it is gone.
> +	 This is because a thread that was not the thread group leader
> +	 called exec and took over the leader's lwp.  */
> +      delete_lwp (lwp);
> +      set_desired_thread (0);

I can't see how this fixes the issue completely.  A thread may well exec
just _after_ you did a successfully did kill_lwp(pid, SIGSTOP), and then
we'll still hang waiting for pid.  Seems to me this must be handled on the
wait side.  Alternatively, how about just requiring Linux >= 3.0 for this
feature, and retrieve the ID of the thread that execed with
PTRACE_GETEVENTMSG?

> +  else if (event == PTRACE_EVENT_EXEC && report_exec_events)
> +    {
> +      struct regcache *regcache;
> +
> +      if (debug_threads)
> +	{
> +	  debug_printf ("HEW: Got exec event from LWP %ld\n",
> +			lwpid_of (event_thr));
> +	}
> +
> +      /* If the exec was not called by the thread group leader, then
> +	 the lwp_info and thread_info structures are out-of-date,
> +	 containing information about the original leader thread and
> +	 not the new exec'ing leader thread.  Invalidate the register
> +	 cache without flushing it to the target.  */
> +      regcache = (struct regcache *) inferior_regcache_data (event_thr);

The cast is no longer necessary.  inferior_regcache_data now returns
a struct regcache * already.

> +      free_register_cache (regcache);
> +      set_inferior_regcache_data (event_thr, NULL);
> +
> +      /* The new executable may be for a different architecture than
> +	 that of the execing process, so re-initialize the architecture.
> +	 The call to get_pc will refill the register cache.  */
> +      linux_arch_setup_thread (event_thr);
> +      event_lwp->stop_pc = get_pc (event_lwp);
> +
> +      event_lwp->waitstatus.kind = TARGET_WAITKIND_EXECD;
> +      event_lwp->waitstatus.value.execd_pathname
> +	= xstrdup (linux_proc_pid_to_exec_file (lwpid_of (event_thr)));
> +
> +      /* Mark the exec status as pending.  */
> +      event_lwp->stopped = 1;
> +      event_lwp->status_pending_p = 1;
> +      event_lwp->status_pending = wstat;
> +      event_thr->last_resume_kind = resume_stop;

Shouldn't this be resume_continue?

> +      event_thr->last_status.kind = TARGET_WAITKIND_IGNORE;
> +


>  
> @@ -2056,6 +2124,38 @@ linux_low_filter_event (int lwpid, int wstat)
>  
>    child = find_lwp_pid (pid_to_ptid (lwpid));
>  
> +  /* Check for stop events reported by a process we didn't already
> +     know about - anything not already in our LWP list.
> +
> +     If we're expecting to receive stopped processes after
> +     fork, vfork, and clone events, then we'll just add the
> +     new one to our list and go back to waiting for the event
> +     to be reported - the stopped process might be returned
> +     from waitpid before or after the event is.
> +
> +     But note the case of a non-leader thread exec'ing after the
> +     leader having exited, and gone from our lists (because
> +     check_zombie_leaders deleted it).  The non-leader thread
> +     changes its tid to the tgid.  */
> +
> +  if (WIFSTOPPED (wstat) && (child == NULL) && (WSTOPSIG (wstat) == SIGTRAP)
> +      && (linux_ptrace_get_extended_event (wstat) == PTRACE_EVENT_EXEC))

Please remove the redundant parenthesis.

> @@ -1133,6 +1134,25 @@ prepare_resume_reply (char *buf, ptid_t ptid,
>  	    buf = write_ptid (buf, status->value.related_pid);
>  	    strcat (buf, ";");
>  	  }
> +	else if ((status->kind == TARGET_WAITKIND_EXECD) && multi_process)

More unnecessary parens.

> +	  {
> +	    enum gdb_signal signal = GDB_SIGNAL_TRAP;
> +	    const char *event = "exec";
> +	    char hexified_pathname[PATH_MAX];

PATH_MAX would be the max size if it weren't for hexification.
An hexified string can occupy double that.

>    /* We've followed the inferior through an exec.  Therefore, the
> diff --git a/gdb/nat/linux-ptrace.c b/gdb/nat/linux-ptrace.c
> index f097c8a..235dfba 100644
> --- a/gdb/nat/linux-ptrace.c
> +++ b/gdb/nat/linux-ptrace.c
> @@ -528,9 +528,7 @@ ptrace_supports_feature (int ptrace_options)
>  }
>  
>  /* Returns non-zero if PTRACE_EVENT_FORK is supported by ptrace,
> -   0 otherwise.  Note that if PTRACE_EVENT_FORK is supported so is
> -   PTRACE_EVENT_CLONE, PTRACE_EVENT_EXEC and PTRACE_EVENT_VFORK,
> -   since they were all added to the kernel at the same time.  */
> +   0 otherwise.  */

Why remove this info?

>  
>        q = reconcat (q, "qSupported:", q, (char *) NULL);
> @@ -5934,6 +5942,31 @@ Packet: '%s'\n"),
>  	      event->ws.kind = TARGET_WAITKIND_VFORK_DONE;
>  	      p = skip_to_semicolon (p1 + 1);
>  	    }
> +	  else if (strncmp (p, "exec", p1 - p) == 0)
> +	    {
> +	      ULONGEST pid;
> +	      char pathname[PATH_MAX];
> +
> +	      p = unpack_varlen_hex (++p1, &pid);
> +
> +	      /* Save the pathname for event reporting and for
> +		 the next run command.  */
> +	      hex2bin (p1, (gdb_byte *) pathname, (p - p1)/2);

Missing spaces around /.

> +	      /* Add the null terminator.  */
> +	      pathname[(p - p1)/2] = '\0';

Ditto.

> +	      /* This is freed during event handling.  */
> +	      event->ws.value.execd_pathname = xstrdup (pathname);
> +	      event->ws.kind = TARGET_WAITKIND_EXECD;
> +	      /* Save the pathname for the next run command.  */
> +	      xfree (remote_exec_file);
> +	      remote_exec_file = xstrdup (pathname);
> +	      /* Reset the architecture in case the new executable is
> +		 different from the execing executable.  We need to
> +		 do this right now, before any register access, to
> +		 keep the client and remote architectures in sync.  */
> +	      target_clear_description ();

Each inferior has a description.  Is there anything that makes sure this
is clearing the description of the right inferior?

> +	      exec_file_attach (remote_exec_file, 0);

Ditto.  It seems wrong to do this here -- infrun.c:follow_exec
will do this, and more.  E.g., this will cause breakpoints to be
reset, but we haven't yet marked the old locations as wiped/uninserted.
infrun.c:follow_exec does that (the mark_breakpoints_out call).
Another example, this doesn't handle "set follow-exec-mode new".
Seems like we need to find another way to handle the issue you saw.
What were the register accesses you saw?  Is that the expedited registers
parsing just below, or something else?

Thanks,
Pedro Alves

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]