This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
reducing cost of user-space probes
- From: "O Mahony, Billy" <billy dot o dot mahony at intel dot com>
- To: "systemtap at sourceware dot org" <systemtap at sourceware dot org>
- Date: Mon, 24 Apr 2017 11:58:59 +0000
- Subject: reducing cost of user-space probes
Hi,
I'm new to systemtap and I am using it to add some probes into a user space application.
The probe is pretty simple - it collects one integer argument and presents a histogram every 3 seconds.
The probe is working fine and the results are sensible. The application is a packet-processing application that uses a user-space I/O library (DPDK) to read batches of network packets directly into user space. The probe is hit about 750K times per second: I have a 10Gb link carrying 64B packets, which generates 14.8M packets per second, but the batch size (the statistic I'm tracing) is about 20, so that works out to about 750K probe hits per second.
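As a rough check on those rates, here is a minimal Python sketch. It assumes the usual 20 bytes of per-frame Ethernet wire overhead (8-byte preamble plus 12-byte inter-frame gap), which is my assumption rather than something measured; the function names are just illustrative.

```python
# Rough check of the packet and probe-hit rates quoted above.
# Assumption (not measured): each Ethernet frame carries 20 extra bytes
# on the wire (8-byte preamble + 12-byte inter-frame gap).

WIRE_OVERHEAD_BYTES = 20


def packets_per_second(link_bits_per_second, frame_bytes):
    """Maximum frames per second the link can carry at this frame size."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bits_per_second / bits_per_frame


def probe_hits_per_second(link_bits_per_second, frame_bytes, batch_size):
    """One probe hit per batch, so divide the packet rate by the batch size."""
    return packets_per_second(link_bits_per_second, frame_bytes) / batch_size


if __name__ == "__main__":
    pps = packets_per_second(10e9, 64)
    hits = probe_hits_per_second(10e9, 64, 20)
    print(f"packets/s:    {pps:,.0f}")
    print(f"probe hits/s: {hits:,.0f}")
    print(f"interval between probe hits: {1e9 / hits:,.0f} ns")
```

Under that assumption the probe fires roughly every 1.3 microseconds, so even a small fixed cost per hit is a noticeable fraction of the time available.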
When the probe is in use, the packet-processing application performs worse: it starts losing packets at about 90% of its non-probed throughput.
However, when I run stap I see:
> Pass 4: compiled C into "stap_13723.ko" in 9020usr/980sys/10638real ms
Does this mean that each time the probe is hit, a system call is made into this new .ko module? That would surely mean quite a lot of overhead. If this is correct, can this overhead be avoided for user-space probes?
Alternatively, is there a way to execute the script only every n times the probe is hit?
Maybe there is a compile-time macro that does this, or some .stap command that does an early return from the script X% of the time. I searched for 'sample/sampling' in the language reference but didn't see anything.
Thanks for any help you can give.
Billy