This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
RE: architecture paper draft
- From: "Chen, Brad" <brad dot chen at intel dot com>
- To: "Frank Ch. Eigler" <fche at redhat dot com>, "Stephen C. Tweedie" <sct at redhat dot com>
- Cc: <systemtap at sources dot redhat dot com>
- Date: Thu, 10 Feb 2005 17:16:19 -0800
- Subject: RE: architecture paper draft
Frank Ch. Eigler wrote:
> In addition, this method may require that the kprobes handler not be
> started from an interrupt context wrapped around the "int 3" trap (x86).
> Changing this might require extensive changes to kprobes, to perhaps
> insert "simple" diversionary branches into the executable image instead
> of traps. Intel folks prefer this sort of approach for performance
> reasons, but we may have come across an even better reason for it.
Thank you for noting my earlier question about interrupt overhead.
I said I would do a little homework on interrupt overhead; here it is:
Cycle delay by CPU:

CPU                  Branch   Trap
1.6 GHz Pentium 4       149   1408
AMD Athlon 1800          38    361
1.6 GHz Pentium M        84    541
These numbers are from the kerninst team from the University of Wisconsin
and I did not verify them myself. In general it looks like a trap is 7-10x
more expensive than a branch. It appears to me that kprobes requires three
traps, so that would make the overall impact 20-30x more expensive.
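As a quick check on the ratios, the figures in the table can simply be divided out. This is a minimal sketch; the names CYCLES and trap_to_branch_ratio are made up for illustration, and the cycle counts are copied from the table as reported, not re-measured.

```python
# Cycle delays (branch, trap) as quoted in the table above; not re-measured.
CYCLES = {
    "1.6 GHz Pentium 4": (149, 1408),
    "AMD Athlon 1800": (38, 361),
    "1.6 GHz Pentium M": (84, 541),
}


def trap_to_branch_ratio(branch_cycles, trap_cycles):
    """Return how many times more expensive one trap is than one branch."""
    return trap_cycles / branch_cycles


# Ratio per CPU, keyed by the same CPU names as the table.
ratios = {cpu: trap_to_branch_ratio(b, t) for cpu, (b, t) in CYCLES.items()}

for cpu, ratio in ratios.items():
    print(f"{cpu}: one trap costs about {ratio:.1f}x one branch")
```

With these inputs the ratios land between roughly 6x and 10x, with the Pentium M at the low end.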
For example, assume a 1.6 GHz Pentium 4:
  Branch overhead: 149 cycles
  Overhead for one trap: about 1400 cycles
  Kprobes requires 2-3 traps
  1% overhead => 16M cycles per second
  trap-based instrumentation: 5000 probes per second
  branch-based instrumentation: 94000 probes per second
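The arithmetic behind this kind of budget can be sketched as follows. This is only a rough model under stated assumptions: the per-probe cost is taken to be just the branch or trap cycles quoted above, with no handler or analysis cost, and the function and constant names are hypothetical.

```python
CLOCK_HZ = 1_600_000_000   # 1.6 GHz Pentium 4, as in the example above
OVERHEAD_PERCENT = 1       # share of CPU cycles allowed for instrumentation
BRANCH_CYCLES = 149        # quoted cost of one branch
TRAP_CYCLES = 1400         # quoted approximate cost of one trap


def probes_per_second(cycles_per_probe, clock_hz=CLOCK_HZ,
                      percent=OVERHEAD_PERCENT):
    """Probes per second that fit into the allowed cycle budget.

    Assumes the only per-probe cost is cycles_per_probe (a simplification).
    """
    budget_cycles = clock_hz * percent / 100   # 16M cycles per second here
    return budget_cycles / cycles_per_probe


branch_rate = probes_per_second(BRANCH_CYCLES)
trap_rate_2 = probes_per_second(2 * TRAP_CYCLES)   # if a probe takes 2 traps
trap_rate_3 = probes_per_second(3 * TRAP_CYCLES)   # if a probe takes 3 traps

print(f"branch-based: about {branch_rate:.0f} probes/s")
print(f"trap-based:   about {trap_rate_3:.0f} to {trap_rate_2:.0f} probes/s")
```

Under these assumptions the trap-based rate comes out between roughly 3,800 and 5,700 probes per second, which brackets the 5000 figure. The branch-based rate comes out somewhat above 94000, so that figure presumably includes some per-probe cost this sketch does not model.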
For many tools, most time will be spent in analysis code and this
issue is irrelevant. However, if you happen to be a performance
guy, and you're trying to do something even moderately aggressive
in terms of higher frequency or very low overhead, this might start
to matter. If this also helps to simplify some of the interrupt
management issues, that's great.
I note in passing that the SPARC implementation of DTrace is
reported to use branches, and their x86 implementation uses
traps.
Brad Chen