This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



Re: Experiences with kprobes


Baruch Even wrote:
William Cohen wrote:

I wrote some simple tests to check the overhead of kprobes and jprobes. I have run them on an Athlon and a Pentium III machine, but I haven't run them on a Pentium IV. It could be that the costs are higher on the Pentium IV. Could you give these a try on your Pentium IV machine? The following URL has an attachment with software for measuring overhead:

http://sources.redhat.com/ml/systemtap/current/msg00093.html


I'll try that and report back.

Also, is an SMP kernel or preemption being used? The current locking mechanism in kprobes serializes multiple kprobes. Is it possible that some of the overhead could be due to serialization of the probes?


Actually, I'm using a UP kernel with no preemption due to limitations of OProfile: I need two profiling registers to check CPU utilization and memory accesses, so SMP is out (according to the OProfile developers).

I wasn't sure if the machine had multiple physical processors in it or not. You mean you can't run Hyper-Threading because it halves the number of performance registers? HT and the specialized performance monitoring counters on the P4 are not a great combination.


The specifics for me are that the tests run over a dummynet network simulating a very high-speed, long-distance link (about 300ms RTT and 300Mbit/s bandwidth), so the packet rates are very high, with a BDP of about 8000 packets (i.e. lots of ACK packets to process).
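As a rough check on the BDP figure above, here is a minimal sketch of the bandwidth-delay product calculation. The 1500-byte packet size is an assumption (it is not stated in this thread), and the function name is only illustrative.

```python
def bdp_packets(bandwidth_bps, rtt_ms, packet_bytes):
    """Bandwidth-delay product expressed in packets."""
    bits_in_flight = bandwidth_bps * rtt_ms / 1000  # bits on the wire during one RTT
    return bits_in_flight / 8 / packet_bytes        # bits -> bytes -> packets

# 300 Mbit/s link, 300 ms RTT, ASSUMED 1500-byte packets
print(bdp_packets(300_000_000, 300, 1500))
```

With that assumed packet size the result lands in the same range as the "about 8000 packets" quoted above; a smaller payload per packet pushes it closer to 8000.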


What kind of rate are the probes firing at? n*8000 probe firings per second? Could the delay introduced by the probes be affecting behavior?


In the test with the probes I only get the cwnd up to about 3000; with a round trip of 300ms and delayed ACKs, that means 3000*1000/300/2 = 5000 packets per second. But the real problem is that when we get to this level and lose a packet, we start handling SACKs, which are sent for each packet rather than for every two packets, so the ACK rate doubles to 10000 pps, at which stage the probes will probably affect behaviour significantly.
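The ACK-rate arithmetic above can be sketched as follows. This is only a back-of-the-envelope model that assumes a full window is sent every round trip; the function and parameter names are illustrative.

```python
def ack_rate_pps(cwnd_packets, rtt_ms, packets_per_ack):
    """ACKs per second, assuming one full cwnd is sent per round trip."""
    packets_per_sec = cwnd_packets * 1000 / rtt_ms
    return packets_per_sec / packets_per_ack

# Delayed ACKs: one ACK for every two packets
print(ack_rate_pps(3000, 300, 2))   # 5000.0

# After a loss: one SACK per received packet
print(ack_rate_pps(3000, 300, 1))   # 10000.0
```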

At that point I usually see a complete failure to handle the stream of packets, and network throttling kicks in, killing the connection even further.

10,000 pps * 10,000 cycles/sample = 100,000,000 cycles/second


This is still relatively small compared to 3GHz. However, the delay could still be pushing it over some critical threshold.
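To put that comparison in numbers, here is a small sketch using the figures from this thread (10,000 probe firings per second, 10,000 cycles per firing, a 3GHz CPU). It only estimates average CPU time and says nothing about the per-packet latency the probes add, which is the threshold concern raised above.

```python
def probe_cpu_fraction(events_per_sec, cycles_per_probe, cpu_hz):
    """Fraction of total CPU cycles consumed by probe handling."""
    return events_per_sec * cycles_per_probe / cpu_hz

fraction = probe_cpu_fraction(10_000, 10_000, 3_000_000_000)
print(f"{fraction:.1%} of the CPU spent in probes")  # 3.3% of the CPU spent in probes
```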


[0] As a grad student, at least part of the idea is to have fun :-)


Even if you are not a graduate student the previous line holds. :)


Still haven't found a work place that accepts that...

I didn't say all fun, but enjoying what you do works out better for all those involved.


-Will

