This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
RE: reducing cost of user-space probes
- From: "O Mahony, Billy" <billy dot o dot mahony at intel dot com>
- To: Arkady <arkady dot miasnikov at gmail dot com>
- Cc: "systemtap at sourceware dot org" <systemtap at sourceware dot org>
- Date: Mon, 24 Apr 2017 13:07:29 +0000
- Subject: RE: reducing cost of user-space probes
- References: <03135AEA779D444E90975C2703F148DC2F8B6BED@IRSMSX107.ger.corp.intel.com> <CANA-60o02W3VdZmhWfdoPj7aGk9vZiO8QW0cvDjz=Zq-+HMgJQ@mail.gmail.com>
Hi Arkady,
Thanks for the fast reply. Some great tips there.
Can you point me to a sample of using inline C? That sounds really interesting.
Thanks,
Billy.
> -----Original Message-----
> From: larytet@gmail.com [mailto:larytet@gmail.com] On Behalf Of Arkady
> Sent: Monday, April 24, 2017 1:17 PM
> To: O Mahony, Billy <billy.o.mahony@intel.com>
> Cc: systemtap@sourceware.org
> Subject: Re: reducing cost of user-space probes
>
> Hi,
>
> An 8-10% performance hit when handling 0.5M-1M events/s is in line with what I
> experience. Some ways to improve the performance:
>
> * find (or add) a different probe point which is called less frequently
> * when running stap, remove the built-in checks, for example with
>   --suppress-time-limits
> * examine the source code generated by stap (command line switch -k).
>   Some things are more expensive than others. For example, nesting in the
>   stap script, strings, and associative arrays all come at some cost. I
>   discovered that using inline C and an array makes sense in some cases.
>   You can access the array via /proc and process the data offline.
>
>
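To make the "inline C and array" tip a little more concrete, here is a minimal sketch in plain C of the kind of fixed-size counter array that could stand in for an associative array. It is not SystemTap embedded-C syntax and not taken from any script in this thread; the names MAX_BATCH, batch_hist, hist_add and hist_total, and the upper bound of 32, are all assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on the traced value (the batch size).
 * This limit is an assumption, not something stated in the thread. */
#define MAX_BATCH 32

/* One counter per possible value 0..MAX_BATCH, plus one overflow bucket. */
struct batch_hist {
    uint64_t count[MAX_BATCH + 2];
};

/* Record one observation. The cost is a bounds check and an increment:
 * no strings, no hashing, no allocation. */
void hist_add(struct batch_hist *h, int64_t value)
{
    assert(h != NULL);
    size_t idx = (value < 0 || value > MAX_BATCH)
                     ? (size_t)(MAX_BATCH + 1)   /* out-of-range bucket */
                     : (size_t)value;
    h->count[idx]++;
}

/* Total number of observations recorded so far. */
uint64_t hist_total(const struct batch_hist *h)
{
    uint64_t total = 0;
    for (size_t i = 0; i < MAX_BATCH + 2; i++)
        total += h->count[i];
    return total;
}
```

The per-hit work stays constant and small, and the raw counters can be dumped and turned into a histogram offline, which is the trade-off the tip above describes.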
> On Mon, Apr 24, 2017 at 2:58 PM, O Mahony, Billy
> <billy.o.mahony@intel.com> wrote:
> > Hi,
> >
> > I'm new to systemtap and I am using it to add some probes into a user
> > space application.
> >
> > The probe is pretty simple - it collects one integer argument and presents a
> > histogram every 3 seconds.
> >
> > The probe is working fine and I'm getting results that are sensible. The
> > application is a packet processing application that uses a user-space I/O
> > library (DPDK) to read batches of network packets directly into user space.
> > The probe is called about 750K times per second: I have a 10Gb link with 64B
> > packets, which generates 14.8M packets per second, but the batch size
> > (that's the stat I'm tracing) is about 20, so that is about 750K probe hits
> > per second.
> >
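The rate figures above can be checked with a short calculation. This is only a sketch: the function names are made up, and it assumes standard Ethernet framing, where each frame costs an extra 20 bytes on the wire (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap).

```c
#include <assert.h>

/* Per-frame wire overhead on Ethernet: preamble (7) + SFD (1) + IFG (12). */
#define ETH_WIRE_OVERHEAD_BYTES 20

/* Maximum frames per second on a link of link_bps bits/s when every
 * frame is frame_bytes long (64 for minimum-size frames). */
double frames_per_second(double link_bps, unsigned frame_bytes)
{
    assert(frame_bytes > 0);
    return link_bps / (8.0 * (frame_bytes + ETH_WIRE_OVERHEAD_BYTES));
}

/* Probe hits per second if the probed function runs once per batch. */
double probe_hits_per_second(double frames_per_sec, double avg_batch)
{
    assert(avg_batch > 0.0);
    return frames_per_sec / avg_batch;
}
```

With link_bps = 10e9 and frame_bytes = 64, the first function gives roughly 14.88M frames per second; dividing by an average batch of 20 gives roughly 744K hits per second, which matches the "about 750K" quoted above.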
> > When the probe is in use I see less performance from the packet
> > processing application - it starts losing packets at about 90% of its
> > non-probed throughput.
> >
> > However, when I run stap I see:
> >
> >> Pass 4: compiled C into "stap_13723.ko" in 9020usr/980sys/10638real ms
> >
> > Does this mean that each time the probe is hit a system call is made to
> > this new .ko module? That would surely mean quite a lot of overhead. If this
> > is correct, can this overhead be avoided for user-space probes?
> >
> > Alternatively, is there a way to execute the script only every nth time the
> > probe is hit?
> >
> > Maybe there is a compile-time macro that does this, or some .stap
> > command that does an early return from the script X% of the time. I
> > searched for 'sample/sampling' in the lang ref but I didn't see anything.
> >
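The "every nth hit" idea can be sketched as a simple countdown. This is plain C for illustration only, with hypothetical names (sampler, sampler_init, sampler_should_record); in a stap script the same logic would presumably be a global counter combined with the `next` statement, which returns early from the probe handler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sampler state: record one hit out of every `interval`. */
struct sampler {
    uint32_t interval;   /* sampling period, must be > 0 */
    uint32_t countdown;  /* hits still to skip before the next sample */
};

void sampler_init(struct sampler *s, uint32_t interval)
{
    assert(s != 0 && interval > 0);
    s->interval = interval;
    s->countdown = interval - 1;
}

/* Returns 1 on every interval-th call and 0 on all other calls. */
int sampler_should_record(struct sampler *s)
{
    if (s->countdown > 0) {
        s->countdown--;
        return 0;        /* skip this hit: do no further work */
    }
    s->countdown = s->interval - 1;
    return 1;            /* record this hit */
}
```

Note that a check like this only trims the work done inside the handler. The probe itself still fires on every hit, so whatever it costs to enter the handler is paid each time regardless.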
> > Thanks for any help you can give.
> >
> > Billy
> >