This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.



Re: Stand resume() on its head


On Tue, Nov 05, 2002 at 02:15:15PM -0800, Michael Snyder wrote:
> Daniel Jacobowitz wrote:
> > 
> > On Tue, Nov 05, 2002 at 03:28:19PM -0500, Andrew Cagney wrote:
> > > Hello,
> > >
> > > There have now been several discussion threads that lead to the
> > > conclusion that
> > >
> > >       target->resume (ptid_t, int, enum target_signal)
> > >
> > > needs changing.  At present the suggestion is to add a parameter to
> > > indicate schedule locking and similar operations.
> > >
> > > I'd like to propose a different approach.  Instead of passing to
> > > resume() what to do, have resume() iterate over all the threads asking
> > > each what it should do - suspend, step, run, signal, ...
> > >
> > > I think, in the end, GDB will need to do something like this any way
> > > (how else is GDB going to handle suspended threads?) so might as well
> > > start earlier rather than later :-)
> > 
> > I like it, roughly speaking.  I've got a couple of other thoughts and
> > some questions:
> >  - What do you mean by suspended threads?
> 
> Just what you think -- give the user the ability to say
> "this thread should not resume when the others do."

Oh, OK.  So, a desirable feature but one that we don't have right now.

> >  - User interface for this? 
> 
> Important, and yet to be worked out.  Wanna start the discussion?

Sure, below.

> > We could use this opportunity to fix
> > and clarify passing signals.  A command to show pending signals
> > per-thread for the next resume; a command to set them.
> 
> Hmmmm!
> 
> >  - Why would we want to step a particular thread in a resume?  If we
> > want to single-step a particular thread then it seems to me that we
> > want to do it independently of resuming other threads.
> 
> Currently that's true.  I can't think of a circumstance where it
> wouldn't be true, but I haven't thought real hard about it.

As I was writing the previous copy of this paragraph I realized: you
mark one thread as single-stepping, and other threads as suspended or
resuming, in order to single-step with or without schedlock.  Duh.
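To make the idea concrete, here is a minimal C sketch of a resume () that walks a per-thread request list instead of taking a single (ptid, step, signal) triple. Everything in it (enum resume_action, struct thread_req, plan_step, resume_all) is hypothetical and not part of GDB's target vector; it only shows how marking one thread as stepping and the others as suspended or running expresses a single-step with or without schedlock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread resume request; illustrative names only,
   not GDB's real target interface.  */
enum resume_action { RESUME_SUSPEND, RESUME_RUN, RESUME_STEP };

struct thread_req
{
  int id;                     /* Thread identifier.  */
  enum resume_action action;  /* What resume () should do with it.  */
  int signo;                  /* Signal to deliver on resume, 0 for none.  */
};

/* Plan a single-step of thread STEP_ID.  With SCHEDLOCK nonzero every
   other thread stays suspended; otherwise the others run.  Threads the
   user has suspended (USER_SUSPENDED[i] nonzero) never run.  */
static void
plan_step (struct thread_req *threads, const int *user_suspended,
           size_t n, int step_id, int schedlock)
{
  assert (threads != NULL && user_suspended != NULL);
  for (size_t i = 0; i < n; i++)
    {
      if (threads[i].id == step_id)
        threads[i].action = RESUME_STEP;
      else if (schedlock || user_suspended[i])
        threads[i].action = RESUME_SUSPEND;
      else
        threads[i].action = RESUME_RUN;
    }
}

/* Stand-in for the proposed resume (): ask each thread what to do.
   Here it only counts the threads that will execute; a real target
   would issue ptrace requests or remote packets for each one.  */
static int
resume_all (const struct thread_req *threads, size_t n)
{
  int executing = 0;
  for (size_t i = 0; i < n; i++)
    if (threads[i].action != RESUME_SUSPEND)
      executing++;
  return executing;
}
```

With schedlock on, only the stepping thread executes; with it off, every thread the user has not suspended runs alongside it. Andrew's shared-library case would additionally need a "leave alone, already running" action, which this sketch omits.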

> >  - Is there a useful way to combine this with a mechanism to report
> > more than one event from a wait?  More than one thread stopping with a
> > signal, for instance.  That'll also need interface changes, but we need
> > the interface changes anyway: see the failing test for hitting a
> > watchpoint and a breakpoint at the same time, in annota2.exp.
> 
> In a single-processor system, I don't think that can happen.
> It's bogus that Linux-gdb lets it appear to happen (at least internally).
> But yeah, it can sure happen in a multi-processor environment.
> Have any thoughts to share about that interface?

Well, I don't remember the exact details of that testcase (my last
explanation's in the archive somewhere, since we never reached a
consensus on how to handle it), but an easier-to-explain testcase
works like this:
  0x10 store A 
  0x15 other instruction

Place breakpoint on 0x10, place another on 0x15.  Use software
watchpoints (not sure this is necessary, given how we remove things). 
Place a write watchpoint on A.  Continue to 0x10. Single step.

We remove breakpoints, step to 0x15, check for stop causes, and we
find:
  - stopped because we were single stepping.  This just causes a stop,
silent.
  - stopped because we hit breakpoint at 0x15.  Mention it.
  - stopped because we hit watchpoint on A.  We end up not mentioning
it; I don't recall why...

There are plenty of easier ways to have multiple stop statuses in a
multithreaded app, of course.
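In the 0x10/0x15 example three stop causes apply at once, and only one of them is meant to be silent. A minimal C sketch of reporting every non-silent cause instead of only the first one found, assuming a hypothetical bitmask of stop causes (GDB's real bookkeeping, the bpstat chain, is richer):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stop-cause bits, one per reason in the example.  */
enum stop_cause
{
  STOP_STEP       = 1 << 0,  /* Finished a single-step: silent.  */
  STOP_BREAKPOINT = 1 << 1,  /* Hit a breakpoint: announce it.  */
  STOP_WATCHPOINT = 1 << 2   /* Watched value changed: announce it.  */
};

/* Store one message per user-visible cause in MSGS (at most MAX) and
   return how many were stored.  Both the breakpoint and the watchpoint
   are reported when they coincide.  */
static size_t
describe_stop (unsigned causes, const char **msgs, size_t max)
{
  size_t n = 0;
  assert (msgs != NULL);
  if ((causes & STOP_BREAKPOINT) && n < max)
    msgs[n++] = "Breakpoint hit";
  if ((causes & STOP_WATCHPOINT) && n < max)
    msgs[n++] = "Watchpoint triggered";
  /* STOP_STEP is deliberately silent.  */
  return n;
}
```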

On Tue, Nov 05, 2002 at 06:10:02PM -0500, Andrew Cagney wrote:
> In the case of shlib, all the other threads would already be in the 
> running state (so would need no action).  Just one stopped thread would 
> need a stepi.

Gotcha.

> > - Is there a useful way to combine this with a mechanism to report
> >more than one event from a wait?  More than one thread stopping with a
> >signal, for instance.  That'll also need interface changes, but we need
> >the interface changes anyway: see the failing test for hitting a
> >watchpoint and a breakpoint at the same time, in annota2.exp.
> 
> I think we'll need that anyway.  But hopefully it's fairly independent: resume 
> can be implemented independently of the wait side.

Probably, but I'd like to stop and design for a little bit before we do
either.



Random thoughts on this topic:

Waiting
=======
One thread gets an event.  We stop all threads.  This is, to be blunt,
awful.  There are at least two places where we will just need to resume
again and can do everything we need to do with other threads running:
  - shared library events
  - thread creation/death events
And there's:
  - thread-specific breakpoints hit by the wrong thread
but as per our last conversation about this, it's not clear we can step
over the breakpoint safely without stopping other threads first.  Blech.
There may someday be a way (on some friendly OS) to do this, or to
place proper thread-specific breakpoints.
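The split between events that can be handled while other threads keep running and those that cannot might look like this hypothetical C sketch. The event names and the can_step_over_live flag are illustrative; the flag stands in for the "friendly OS" capability that does not exist yet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event kinds, named after the cases listed above.  */
enum thread_event
{
  EV_SHLIB,            /* Shared library load/unload.  */
  EV_THREAD_CREATE,    /* New thread created.  */
  EV_THREAD_EXIT,      /* Thread died.  */
  EV_WRONG_THREAD_BP,  /* Thread-specific breakpoint hit by the wrong thread.  */
  EV_USER_BREAKPOINT   /* Anything the user must see.  */
};

/* Return true if handling EV requires stopping every other thread
   first.  CAN_STEP_OVER_LIVE says whether the target can step one
   thread past a breakpoint while the others keep running.  */
static bool
needs_stop_all (enum thread_event ev, bool can_step_over_live)
{
  switch (ev)
    {
    case EV_SHLIB:
    case EV_THREAD_CREATE:
    case EV_THREAD_EXIT:
      return false;  /* Handle it and resume; others keep running.  */
    case EV_WRONG_THREAD_BP:
      return !can_step_over_live;
    case EV_USER_BREAKPOINT:
      return true;
    default:
      assert (!"unknown thread event");
      return true;
    }
}
```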

[Completely random aside: have you looked at the output of help catch
lately?  I'm almost ready to submit catch fork/vfork/exec.  Generic
catch start is easy and sorely lacking.  Catch throw/catch are on my
todo list.  Catch exit is trickier and may require another kernel patch
to implement on Linux.  Catch load/unload would be really nice to have. 
Catch thread_start/exit/join would be kind of nice, too.  How many of
those do we implement right now on non-HP/UX?  Not many.]

So it seems to me that we have the interface in the wrong place. 
target_wait should return a list of thread statuses; for platforms with
an efficient stop-all-threads, or for platforms not converted to this
new model, they would all be stopped.  But some could instead be
running.  Then we add a target_stop_thread or
target_stop_running_threads hook, and call it from the appropriate
locations (i.e. if nothing hooks the breakpoint type and handles it).
We can step over these breakpoints safely because they are protected by
mutexes in the inferior; we have context-specific knowledge of this.

We don't need to update all of GDB to be aware of this; we could
guarantee that the inferior is stopped by the time wait_for_inferior
exits, for instance.
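A rough C sketch of the proposed wait side, assuming the target returns one status per thread (some still running) and that a stop-all hook is invoked only when no handler claims the event. The types and functions (struct wait_status, stop_running_threads, handle_wait_batch) are made up for illustration; stop_running_threads plays the role of the target_stop_running_threads hook suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread result of a multi-thread wait.  */
enum thread_state { TS_RUNNING, TS_STOPPED };

struct wait_status
{
  int id;                   /* Thread identifier.  */
  enum thread_state state;  /* Still running, or stopped.  */
  int has_event;            /* Nonzero if this thread reported an event.  */
};

/* Stand-in for a target_stop_running_threads hook: stop whatever is
   still running.  Returns how many threads it had to stop.  */
static size_t
stop_running_threads (struct wait_status *st, size_t n)
{
  size_t stopped = 0;
  for (size_t i = 0; i < n; i++)
    if (st[i].state == TS_RUNNING)
      {
        /* A real target would send a stop request here
           (e.g. SIGSTOP to the LWP on GNU/Linux).  */
        st[i].state = TS_STOPPED;
        stopped++;
      }
  return stopped;
}

/* Process one batch of statuses.  If a handler dealt with every
   reported event (HANDLED nonzero), the other threads keep running;
   otherwise fall back to stopping all of them, so the rest of GDB
   still sees a fully stopped inferior.  */
static size_t
handle_wait_batch (struct wait_status *st, size_t n, int handled)
{
  assert (st != NULL || n == 0);
  if (handled)
    return 0;
  return stop_running_threads (st, n);
}
```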

Interface considerations: if multiple threads stop on their own, we
need to report this to the user.  Easy enough in CLI, probably doable
in MI... right now we hackily preserve the other threads' stops for
later, which is gross.

Resuming
========

The advantages of the sort of resume that Andrew described are that we
could suspend an arbitrary number of threads, pass signals to multiple
threads simultaneously, etc.  Sounds good.

One concern is how to express this to targets which speak the current
remote protocol.  Is it time to design a new remote protocol so that
modern and thread-aware stubs can be more powerful?  This matters
particularly for returning the one-thread-stopped events (should we at
all?  There are real latency issues.  What happens if another thread
stops while waiting for GDB to tell us what to do?  Save the event for later?).
robust mechanism for communicating I/O to the inferior, and reporting
events without stopping the inferior (e.g. new thread created events).

Interface
=========

Maybe something like this in the CLI:
(gdb) thread status
* 1024 (LWP 650) - Nothing pending, stopped after single step
  2049 (LWP 651) - Nothing pending, stopped by GDB
  3072 (LWP 652) - pending SIGUSR1, stopped by SIGUSR1
  4096 (LWP 653) - Nothing pending, suspended by user
  5120 (LWP 654) - Nothing pending, stopped by SIGUSR2
(gdb) thread suspend 5120
  5120 (LWP 654) - Nothing pending, suspended by user
(gdb) thread suspend 3072
  3072 (LWP 652) - pending SIGUSR1, suspended by user
(gdb) thread unsuspend 4096
  4096 (LWP 653) - Nothing pending, stopped by GDB		(?)
(gdb) thread queue 4096 SIGUSR1					(??)
  4096 (LWP 653) - pending SIGUSR1, stopped by GDB
(gdb) thread queue 3072 0					(???)
  3072 (LWP 652) - Nothing pending, stopped by SIGUSR1

(?): Save the original stop reason?
(??): Don't really like "queue" but I'm making this up as I go along.
(???): Separate "unqueue"?
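One possible per-thread record behind the "thread status" output above, sketched in C. The struct and helper names are hypothetical. Keeping the stop reason separate from the suspended flag is one way to answer (?), since "thread unsuspend" can then show the original reason again; treating a signal of "0" as "clear" is one reading of (???).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread record behind "thread status".  */
struct thread_info_sketch
{
  int num;                  /* GDB thread number, e.g. 1024.  */
  int lwp;                  /* Kernel LWP id, e.g. 650.  */
  const char *pending;      /* Signal name queued for next resume, or NULL.  */
  int suspended;            /* Nonzero after "thread suspend".  */
  const char *stop_reason;  /* Why it stopped; kept across suspend.  */
};

/* "thread queue N SIG": queue SIG, or clear the pending signal when
   SIG is "0".  */
static void
queue_signal (struct thread_info_sketch *t, const char *sig)
{
  t->pending = strcmp (sig, "0") == 0 ? NULL : sig;
}

/* Format one "thread status" line into BUF (LEN bytes).  */
static void
format_status (const struct thread_info_sketch *t, char *buf, size_t len)
{
  assert (t != NULL && buf != NULL);
  snprintf (buf, len, "%d (LWP %d) - %s%s, %s", t->num, t->lwp,
            t->pending ? "pending " : "Nothing pending",
            t->pending ? t->pending : "",
            t->suspended ? "suspended by user" : t->stop_reason);
}
```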


-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

