This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.
RE: architecture paper draft

From: "Chen, Brad" <brad dot chen at intel dot com>
To: "Richard J Moore" <richardj_moore at uk dot ibm dot com>
Cc: <systemtap at sources dot redhat dot com>
Date: Fri, 11 Feb 2005 18:26:53 -0800
Subject: RE: architecture paper draft
[apologies if you already saw this; sources.redhat.com sent me a
send failure notice.]


This was very helpful history about dprobes motivation. Thanks.
I agree with your observation that debugging and performance tools
have different needs. A few comments.

1) You might have a look at kerninst from the University of Wisconsin.
They use branches when they can and traps when they can't.

2a) 
- What if we just let the instrumentation do its thing anyway?
How many cases are there where it is undesirable to commit the
results of the script before the faulting instruction is launched? 
It seems to me that if the analysis is at the semantic level of 
procedures or source lines then it's okay if the instrumentation 
commits. Especially if we make it clear that the analysis occurs
before the instruction is executed. 
- When it does matter: if we replace the instrumented instruction 
with a branch, and it generates a trap, then the trap handler 
might recognize that the instruction is in SystemTap memory and
know to do something special, such as schedule some kind of fix-up, 
or trigger undo code in the script via a speculation language 
feature.

2b) Recursion - this we want to strictly disallow, right?

3) Part of what I took away from the notion of probe points is 
that the instrumentation is placed not at arbitrary locations 
but at very specific locations. Do we want people to be able 
to put instrumentation at arbitrary places? Seems like this 
could be a safety problem.

Brad

-----Original Message-----
From: Richard J Moore [mailto:richardj_moore@uk.ibm.com] 
Sent: Friday, February 11, 2005 1:38 AM
To: Chen, Brad
Cc: Frank Ch. Eigler; Stephen C. Tweedie; systemtap@sources.redhat.com;
systemtap-owner@sources.redhat.com
Subject: RE: architecture paper draft

The original design choice for an interrupt mechanism rather than a
branch was based upon the following criteria:

1) For a global debugger - i.e. where breakpoints/probepoints can be
placed in user and kernel space - we need to run the probe handler in
kernel context to give maximum access to system resources. So a
privilege level transfer to ring 0 is mandated.

2a) The probed instruction is single-stepped before normal control
returns to the system. This is done for dynamic tracing purposes, where
we discard the trace record if the probed instruction faults (not
traps). If we don't do this we get multiple trace events for an apparent
single execution of a given instruction where a page-fault it generates
is handled seamlessly by the memory manager. There is an option to
override this behaviour BTW. Single-stepping of the probed instruction
has to be done in the correct context, hence for simplicity we
temporarily restored the original instruction and single-stepped it in
situ. However, that scheme opens a window for missing potential
tracepoints in a multi-processor environment. Hence the later change to
kprobes where we single-step a copy of the original instruction. To
implement that change we store the original instruction in memory that
is accessible by the same virtual address from all contexts - remember
this is a global debugger by design, it doesn't privatize code as ptrace
does; manipulation of the probed instruction is done by an aliased
virtual address in kernel space. A probepoint on a shared library is
active for all contexts - current and future - that call that library.
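
The discard-on-fault behaviour in 2a can be illustrated with a small toy model. This is only a sketch in Python, not kprobes code: the names `Fault`, `faults_n_times` and `run_probed_instruction` are hypothetical, and the "instruction" is just a callable that may raise. It shows why committing the trace record only after a successful single-step yields one event per apparent execution.

```python
class Fault(Exception):
    """Stands in for a fault (e.g. a page fault) raised by the probed instruction."""


def faults_n_times(n):
    """Return a fake 'instruction' that faults on its first n executions."""
    remaining = [n]

    def instruction():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise Fault("page not present")

    return instruction


def run_probed_instruction(handler, instruction, trace_log, max_attempts=8):
    """Run the handler, single-step the instruction, commit only on success.

    A faulting attempt drops its provisional record; the instruction is then
    retried, as it would be after the memory manager resolves the fault.
    Returns the number of attempts, or None if max_attempts was exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        record = handler()            # provisional trace record
        try:
            instruction()             # simulated single-step
        except Fault:
            continue                  # discard the record and retry
        trace_log.append(record)      # commit only after a clean step
        return attempt
    return None


log = []
attempts = run_probed_instruction(lambda: "probe hit", faults_n_times(2), log)
print(attempts, log)
```

Without the discard, the same run would have logged three records for what appears to the program as a single execution of the instruction.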

2b) If the probed instruction causes recursion into the probe handler
then we silently remove the probepoint. We also provide an explicit
means to do this from the probe handler (to satisfy various needs). Thus
while it's not valid to put a probe in the code path of the probe
handler, it does no harm.
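
One way to picture the recursion rule in 2b is the following toy model. It is a sketch under assumptions, not the dprobes/kprobes implementation: `ProbeTable`, `register`, `remove` and `hit` are hypothetical names, and the model assumes that the probepoint removed is the one reached while a handler is already running.

```python
class ProbeTable:
    """Toy registry of probepoints keyed by address (illustrative only)."""

    def __init__(self):
        self.handlers = {}        # address -> handler callable
        self.in_handler = False   # True while some probe handler is running

    def register(self, addr, handler):
        self.handlers[addr] = handler

    def remove(self, addr):
        """Explicit removal, also usable from inside a handler."""
        self.handlers.pop(addr, None)

    def hit(self, addr):
        """Called when execution reaches addr."""
        handler = self.handlers.get(addr)
        if handler is None:
            return
        if self.in_handler:
            # Recursion into the probe machinery: silently drop this probepoint.
            self.remove(addr)
            return
        self.in_handler = True
        try:
            handler(self)
        finally:
            self.in_handler = False


calls = []
table = ProbeTable()


def outer_handler(tbl):
    calls.append("outer")
    tbl.hit(0x20)   # the handler's own code path runs into another probepoint


table.register(0x10, outer_handler)
table.register(0x20, lambda tbl: calls.append("inner"))
table.hit(0x10)
print(calls, sorted(table.handlers))
```

In this model the probe placed in the handler's code path never fires; it is removed on first contact and the outer probe stays active, which matches the "not valid, but does no harm" description.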

3) Both 2a and 2b require the ability to instate and remove probepoints
in arbitrary contexts. We can't afford to have to deal with special
locking requirements or the possibility of causing a fault on storing
the probe instruction. Therefore we chose an instruction that could both
be stored atomically and cause a transition to ring 0. There are very
few that do this - in fact I think there's only one on IA32, which is
the INT3.

4) In order to preserve order (for tracing purposes) we also required
that the breakpoint interrupt be serviced by an interrupt gate and not a
trap gate - the latter doesn't atomically disable interrupts.


So, that's how we got into using the interrupt mechanism for
probepoints. I believe it's still valid when kprobes/dprobes is used as
a global debugger. And I guess this is where the requirements of
profiling and performance tools differ. The debugger's prime concern is
to record the order of events and is less concerned about timing. The
perftool is concerned with accurate timing and sampling and requires
minimal disturbance to normal performance characteristics but is not
concerned with recording the detailed sequence of events. Hence the
preference by Sun to base the performance probe on a call.

Have we come to a parting of the ways? Is kprobes the right mechanism on
which to build a DTrace-like capability?



- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072

systemtap-owner@sources.redhat.com wrote on 11/02/2005 01:16:19:

>
> Frank Ch. Eigler wrote:
> > In addition, this method may require that the kprobes handler not be
> > started from an interrupt context wrapped around the "int 3" trap
> > (x86). Changing this might require extensive changes to kprobes, to
> > perhaps insert "simple" diversionary branches into the executable
> > image instead of traps.  Intel folks prefer this sort of approach for
> > performance reasons, but we may have come across an even better
> > reason for it.
>
> Thank you for noting my earlier question about interrupt overhead.
> I said I would do a little homework on interrupt overhead; here it is:
>    Cycle delay by CPU    Branch   Trap
>    1.6 GHz Pentium 4     149      1408
>    AMD Athlon 1800       38       361
>    1.6 GHz Pentium M     84       541
>
> These numbers are from the kerninst team from the University of
> Wisconsin and I did not verify them myself. In general it looks like a
> trap is 7-10x more expensive than a branch. It appears to me that
> kprobes requires three traps, so that would make the overall impact
> 20-30x more expensive.
>
> For example: Assume a 1.6GHz Pentium 4
>    Branch overhead: 149 cycles
>    Overhead for one trap: about 1400 cycles
>    Kprobes requires 2-3 traps
>    1% overhead => 16M cycles
>    trap-based instrumentation: 5000 probes per second
>    branch-based instrumentation: 94000 probes per second
>
> For many tools, most time will be spent in analysis code and this
> issue is irrelevant. However, if you happen to be a performance
> guy, and you're trying to do something even moderately aggressive
> in terms of higher frequency or very low overhead, this might start
> to matter. If this also helps to simplify some of the interrupt
> management issues, that's great.
>
> I note in passing that the SPARC implementation of DTrace is
> reported to use branches, and their x86 implementation uses
> traps.
>
> Brad Chen
>
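
The overhead arithmetic in the quoted message can be reproduced roughly with the short Python sketch below. It uses only the cycle counts given in the table and the worked example; the variable names are illustrative, and the model is a plain division of the cycle budget by the per-probe cost. That simple model gives about 107,000 branch-based probes per second and between about 3,800 and 5,700 trap-based probes per second, the same order as the quoted 94000 and 5000. The quoted figures are not reproduced exactly and may reflect rounding or per-probe costs not itemized in the message.

```python
# Cycle counts (branch, trap) as quoted from the kerninst team.
CYCLES = {
    "1.6 GHz Pentium 4": (149, 1408),
    "AMD Athlon 1800": (38, 361),
    "1.6 GHz Pentium M": (84, 541),
}

# Trap-to-branch cost ratio per CPU.
ratios = {cpu: trap / branch for cpu, (branch, trap) in CYCLES.items()}

# Worked example: 1.6 GHz Pentium 4 with a 1% overhead budget.
CLOCK_HZ = 1_600_000_000
BUDGET_PERCENT = 1
budget_cycles = CLOCK_HZ * BUDGET_PERCENT // 100   # 16,000,000 cycles per second

BRANCH_CYCLES = 149
TRAP_CYCLES = 1400   # "about 1400 cycles" in the message


def probes_per_second(cycles_per_probe):
    """Probes that fit in the per-second cycle budget (simple division)."""
    return budget_cycles / cycles_per_probe


branch_rate = probes_per_second(BRANCH_CYCLES)
trap_rate_two = probes_per_second(2 * TRAP_CYCLES)     # kprobes with 2 traps
trap_rate_three = probes_per_second(3 * TRAP_CYCLES)   # kprobes with 3 traps

for cpu, ratio in ratios.items():
    print(f"{cpu}: trap is {ratio:.1f}x a branch")
print(f"budget: {budget_cycles} cycles/s")
print(f"branch-based: {branch_rate:.0f} probes/s")
print(f"trap-based: {trap_rate_three:.0f} to {trap_rate_two:.0f} probes/s")
```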