This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
Re: Experiences with kprobes
Baruch Even wrote:
William Cohen wrote:
I wrote some simple tests to check the overhead of kprobes and
jprobes. I have run them on an Athlon and a Pentium III machine, but I
haven't run them on a Pentium IV. It could be that the costs are higher
on the Pentium IV. Could you give these a try on your Pentium IV machine? The
following URL has an attachment with software for measuring overhead:
http://sources.redhat.com/ml/systemtap/current/msg00093.html
I'll try that and report back.
Also, is an SMP kernel or preemption being used? The current locking
mechanism in kprobes serializes multiple kprobes. Is it possible
that some of the overhead could be due to serialization of the probes?
Actually I'm using a UP kernel with no preemption due to limitations of
OProfile: I need two profiling registers to check CPU utilization and
memory accesses, so SMP is out (according to the OProfile developers).
I wasn't sure if the machine had multiple physical processors in it or
not. You mean you can't run Hyper-Threading because it halves the number
of performance registers? HT and the specialized performance monitoring
counters on the P4 are not a great combination.
The specifics for me are that the tests run over a dummynet network
that simulates a very high speed long distance network (about
300ms rtt and 300Mbit/s bandwidth), so the packet rates are very high
with a BDP of about 8000 packets, i.e. lots of ack packets to process.
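
As a rough check of the BDP figure above, here is a minimal sketch. The
1500-byte packet size is an assumption (the thread does not state it),
and bdp_packets is a hypothetical helper name used only for illustration.

```python
def bdp_packets(bandwidth_bps, rtt_ms, packet_bytes=1500):
    """Bandwidth-delay product expressed in packets.

    packet_bytes=1500 is an assumed Ethernet-sized packet; the thread
    does not say which packet size was used.
    """
    bits_in_flight = bandwidth_bps * rtt_ms / 1000  # bits on the wire per RTT
    return bits_in_flight / 8 / packet_bytes        # bits -> bytes -> packets


# 300 Mbit/s and 300 ms rtt, as quoted in the message
print(bdp_packets(300_000_000, 300))
```

With these assumed numbers the result is in the same range as the
"about 8000 packets" quoted above; a slightly smaller packet size
would push it closer to 8000.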
What kind of rate are the probes firing at? n*8000 probe firings per
second? Could the delay introduced by the probes be affecting behavior?
In the test with the probes I only get the cwnd up to about 3000. With a
round trip of 300ms and delayed acking, that means 3000*1000/300/2 = 5000
packets per second. But the real problem is that when we really get to
this level and we lose a packet, we start handling SACKs, which are sent
for each packet and not for every two packets, so we get double the rate
of ACKs. That takes us to 10000 pps, at which stage the probes will
probably affect behaviour significantly.
At that point I usually see a complete failure to handle the stream of
packets and network throttling kicks in killing the connection even
further.
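
The ACK-rate arithmetic above can be written out as a short sketch.
The function name ack_rate_pps is hypothetical; the inputs (cwnd of
3000 packets, 300ms rtt, one ACK per two packets with delayed acking,
one per packet with SACKs) are the figures from the message.

```python
def ack_rate_pps(cwnd_packets, rtt_ms, packets_per_ack):
    """ACKs per second when a full window is acknowledged every RTT."""
    windows_per_second = 1000 / rtt_ms
    return cwnd_packets * windows_per_second / packets_per_ack


delayed_ack = ack_rate_pps(3000, 300, packets_per_ack=2)  # delayed acking
sack = ack_rate_pps(3000, 300, packets_per_ack=1)         # SACK per packet

print(delayed_ack, sack)
```

The second value is twice the first, which is the doubling described
above once loss recovery starts.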
10,000 pps * 10,000 cycles/sample = 100,000,000 cycles/second
This is still relatively small compared to 3GHz. However, the delay
could still be pushing it over some critical threshold.
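
To put the estimate above in proportion, a minimal sketch using only
the numbers in this message (10,000 pps, 10,000 cycles per sample,
a 3GHz clock). The function names are hypothetical, and one probe
firing per packet is an assumption.

```python
def probe_cpu_fraction(probes_per_s, cycles_per_probe, cpu_hz):
    """Fraction of available CPU cycles spent inside probes."""
    return probes_per_s * cycles_per_probe / cpu_hz


def cycles_per_packet(packets_per_s, cpu_hz):
    """Total cycle budget per packet if the CPU did nothing else."""
    return cpu_hz / packets_per_s


fraction = probe_cpu_fraction(10_000, 10_000, 3_000_000_000)
budget = cycles_per_packet(10_000, 3_000_000_000)

print(f"probe overhead: {fraction:.1%} of the CPU")
print(f"per-packet budget: {budget:.0f} cycles")
```

Under these assumptions the probes take a few percent of the CPU in
aggregate, so any effect would more likely come from the added
per-packet latency than from running out of cycles.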
[0] As a grad student, at least part of the idea is to have fun :-)
Even if you are not a graduate student the previous line holds. :)
Still haven't found a work place that accepts that...
I didn't say all fun, but enjoying what you do works out better for all
those involved.
-Will