Re: Optimizing watchpoints


On Wed, 2007-10-10 at 08:11 +0100, Phil Muldoon wrote:
> Roland McGrath wrote:
> > For the latter, that means an individual thread or a group of threads that
> > share a set of watchpoints.  Right now, the implementation can only be done
> > by setting each watchpoint individually on each thread.  But it is likely
> > that future facilities will be able to share some low-level resources and
> > interface overhead by treating uniformly an arbitrary subset of threads in
> > the same address space.  
> 
> Ideally from an api perspective, I'd like both. In the past, I always 
> found it useful to watch every thread in a process to see which one was 
> clobbering this memory address. However I would still like to preserve 
> single thread watchpoints from a user (Frysk) api perspective.

You can of course simulate one with the other at the frysk.proc Observer
level. It is a good idea to keep performance in mind when offering
options to observe at the single Task or whole Proc level. But even if
the underlying kernel/hardware interface only offers one, you can (and
should) offer the other: either by setting a watchpoint for each Task in
a set, or by filtering out, for a watchpoint set at the Proc level,
events from Tasks the user isn't interested in. In fact we made the
mistake with Code observers of letting them always trigger on a
Proc-wide basis (since the underlying mechanism works by setting
software breakpoints, which are always triggered for all Tasks in the
Proc) even if they were registered for only one Task.
http://sourceware.org/bugzilla/show_bug.cgi?id=4895
(See also the earlier Task vs Proc wide Observer discussions on the
list.)
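
For illustration, the filtering variant could look roughly like this
(a minimal self-contained sketch; all type and method names are made
up, not the actual frysk.proc API):

    // Simulate a per-Task watchpoint on top of a Proc-wide one by
    // swallowing hits from Tasks the user did not ask about.
    // (Task, Action and WatchObserver are stand-ins, not frysk.proc.)
    interface Task {}

    enum Action { CONTINUE, BLOCK }

    interface WatchObserver {
        Action updateHit(Task task, long address);
    }

    class SingleTaskWatchFilter implements WatchObserver {
        private final Task interesting;    // the Task the user cares about
        private final WatchObserver user;  // the user's observer

        SingleTaskWatchFilter(Task interesting, WatchObserver user) {
            this.interesting = interesting;
            this.user = user;
        }

        public Action updateHit(Task task, long address) {
            if (task != interesting)
                return Action.CONTINUE;    // hit in another Task; resume it
            return user.updateHit(task, address);
        }
    }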

> Right now (correct me if I'm wrong here, Mark), we do "software" code 
> breakpoints via single-stepping and none of the limited debug registers 
> are used for hardware code breakpoints.

Yes, you are right, we only do "software" breakpoints at the moment,
not "hardware" breakpoints. We insert a breakpoint instruction into the
code stream, and when it is hit we continue past the breakpoint in one
of three ways:

- simulating the instruction (not fully implemented);

- placing a copy of the instruction "out-of-line", stepping that, and
  fixing up any registers (only done for the few instructions we know
  about; a full instruction parser is needed to complete this);

- placing back ("resetting") the original instruction, stepping the
  Task, and putting the breakpoint instruction back (this is fully
  implemented, but risks missing the breakpoint in other running Tasks;
  gdb works around that by temporarily stopping the world and only then
  doing the reset-step-one-task dance).
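
A rough sketch of that last, fully implemented strategy (names are
illustrative, not the actual frysk code):

    // Continue past a software breakpoint by resetting the original
    // instruction, single-stepping, and re-inserting the trap.
    interface Memory {
        byte[] read(long address, int length);
        void write(long address, byte[] bytes);
    }

    interface Stepper {
        void singleStep();  // step the trapped Task one instruction
    }

    class SoftwareBreakpoint {
        private static final byte[] BKPT = { (byte) 0xCC }; // x86 int3
        private final long address;
        private byte[] original;             // saved instruction bytes

        SoftwareBreakpoint(long address) { this.address = address; }

        void insert(Memory memory) {
            original = memory.read(address, BKPT.length);
            memory.write(address, BKPT);     // plant the trap
        }

        void stepPast(Memory memory, Stepper stepper) {
            memory.write(address, original); // reset original instruction
            stepper.singleStep();            // other running Tasks can sail
                                             // through unnoticed right here,
                                             // hence gdb's stop-the-world
            memory.write(address, BKPT);     // put the trap back
        }
    }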

We really should also support hardware-based breakpoints, since they
are far more efficient. But, as you say, they are a limited resource;
whether (and how) to expose that to the user at the frysk.proc Observer
level, or to just fall back to a less efficient software-based
breakpoint, is an open question.
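
If we did fall back transparently, the allocation side could be as
simple as this sketch (hypothetical; the slot count is x86's four
debug address registers):

    // Hand out scarce hardware slots first; report failure so the
    // caller can fall back to a software breakpoint instead.
    class HardwareSlotPool {
        private int free = 4;  // x86 has four debug address registers

        synchronized boolean tryClaim() {
            if (free == 0)
                return false;  // caller falls back to software
            free--;
            return true;
        }

        synchronized void release() {
            free++;
        }
    }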

> > There is one final aspect of organization to consider.  At the lowest
> > level, there is a fixed-size hardware resource of watchpoint slots.  When
> > you set them with ptrace, the operating system just context-switches them
> > for each thread in the most straightforward way.  So the hardware resource
> > is yours to decide how to allocate.  However, this is not what we expect to
> > see in future facilities.  The coming model is that hardware watchpoints
> > are a shared resource managed and virtualized to a certain degree by the
> > operating system.  The debugger may be one among several noncooperating
> > users of this resource, for both per-thread and system-wide uses.  Rather
> > than having the hardware slots to allocate as you choose, you will specify
> > what you want in a slot, and a priority, and can get dynamic feedback about
> > the availability of a slot for your priority.

This is interesting. Do you also foresee that threads of a process
which share the same processor could more easily share their
breakpoints? That is, could the debugger indicate that it would like to
change a task's cpu-affinity for that purpose?
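
Purely speculative, but from the debugger's side the slot-plus-priority
model described above could surface as something like this (every name
here is made up; no such kernel interface exists yet):

    // Ask for a virtualized watchpoint slot; grants can be revoked
    // later when a higher-priority user needs the hardware.
    interface SlotManager {
        void request(long address, int length, int priority,
                     SlotListener listener);
    }

    interface SlotListener {
        void granted();  // the slot is armed in hardware
        void revoked();  // lost the slot; fall back (e.g. single-step)
    }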

> This is where I see the largest change in Frysk's implementation now, 
> and where it will change in the future with utrace; and it would do to 
> make this setting and getting stuff in a fairly abstract class that can 
> be reslotted depending on implementation. This is where I have been 
> currently spending a lot of my thinking time. Right now, the debug 
> registers will be populated via Frysk's register access routines which 
> are themselves being refactored. The ptrace peek and poke is abstracted 
> from the code, and just a simple set/get will be performed via the Frysk 
> functions to populate and read the debug registers. But as you mention, 
> it appears in the utrace world that this will be taken from the 
> (abstracted) ptrace user and managed by the kernel. For the purposes of 
> context on this list, is that hardware watchpoint design set in stone 
> with utrace now, and would it be safe to lay plans based on that?

Are you and Chris working together on the utrace abstraction layer? Or
is the frysk-utrace completely separate from this effort?
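
For context, the set/get Phil describes would on x86 amount to
something like the following (the Registers interface is hypothetical,
since the real register access routines are still being refactored;
the DR7 bit layout is from the IA-32 manuals):

    // Arm debug-register slot 0 as a 4-byte write watchpoint.
    interface Registers {
        long get(String name);
        void set(String name, long value);
    }

    class DebugRegs {
        static void setWriteWatchpoint(Registers regs, long address) {
            regs.set("dr0", address);  // watched address goes in DR0
            long dr7 = regs.get("dr7");
            dr7 |= 0x1L;               // L0: locally enable slot 0
            dr7 |= 0x1L << 16;         // R/W0 = 01: break on write
            dr7 |= 0x3L << 18;         // LEN0 = 11: 4-byte watch
            regs.set("dr7", dr7);
        }
    }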

> > At one extreme you have single-step, i.e. software watchpoints by storing
> > the old value, stepping an instruction, and checking if the value in memory
> > changed.  This has few constraints on specification (only that you can't
> > distinguish stored-same from no-store, and it's not a mechanism for data
> > read breakpoints).  It has no resource contention issues at all.

And it would seem to be the only option if you want to watch values
stored in registers.
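
Sketched out, the memory side of that single-step scheme is just this
(hypothetical names again; Memory is the same read stand-in as in the
breakpoint sketch above):

    // Software watchpoint: remember the old bytes, single-step, and
    // report a hit when the watched memory changed. As noted above,
    // storing the same value is indistinguishable from no store.
    interface Memory {
        byte[] read(long address, int length);
    }

    class SoftwareWatchpoint {
        private final long address;
        private final int length;
        private byte[] oldValue;

        SoftwareWatchpoint(Memory memory, long address, int length) {
            this.address = address;
            this.length = length;
            this.oldValue = memory.read(address, length);
        }

        // Called after every single-step of the watched Task.
        boolean checkHit(Memory memory) {
            byte[] newValue = memory.read(address, length);
            boolean changed = !java.util.Arrays.equals(oldValue, newValue);
            oldValue = newValue;
            return changed;
        }
    }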

>   It is
> > inordinately expensive in CPU time (though a straightforward in-kernel
> > implementation could easily be orders of magnitude faster than the
> > traditional debugger experience of implementing this).

An in-kernel breakpoint/watchpoint framework shared with, for example,
the systemtap project would be ideal!

> Conceptually (correct me again if I am wrong, Mark/Tim) this is 
> what we do with Code breakpoints, so adding a software watchpoint would 
> be a modification of that code, and the hardware watchpoints - at least 
> at the engine level - would be a separate implementation.

Yes, although it is currently abstracted at the frysk.proc.Instruction
level. Each Instruction knows whether it can be simulated, stepped
out-of-line, or needs to be reset-putback in the original instruction
stream to continue past a breakpoint. It shouldn't be too hard to
abstract it at the frysk.proc.Breakpoint level, however (I did that
before, but there were too many unknowns to come up with a good
abstract design without knowing what actual implementations would look
like).
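
Roughly, the current Instruction-level abstraction amounts to this
(simplified; method names illustrative):

    // Each Instruction knows which continue-past-breakpoint strategies
    // apply to it; reset-putback is always available as the fallback.
    abstract class Instruction {
        final byte[] bytes;  // the original instruction bytes

        Instruction(byte[] bytes) { this.bytes = bytes; }

        // Can the effect of this instruction simply be simulated?
        boolean canSimulate() { return false; }

        // Can a copy be stepped out-of-line, with register fixup after?
        boolean canStepOutOfLine() { return false; }
    }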

Cheers,

Mark

