This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



RE: reducing cost of user-space probes


Hi Arkady,

Thanks for the fast reply. Some great tips there.

Can you point me to a sample of using inline C? That sounds really interesting.

Thanks,
Billy. 

> -----Original Message-----
> From: larytet@gmail.com [mailto:larytet@gmail.com] On Behalf Of Arkady
> Sent: Monday, April 24, 2017 1:17 PM
> To: O Mahony, Billy <billy.o.mahony@intel.com>
> Cc: systemtap@sourceware.org
> Subject: Re: reducing cost of user-space probes
> 
> Hi,
> 
> An 8-10% performance hit when handling 0.5M-1M events/s is in line with
> what I experience. Some ways to improve the performance:
>
> * find (or add) a different probe point that is hit less frequently
> * when running stap, remove the built-in checks, for example with
>   --suppress-time-limits
> * examine the source code generated by stap (command line switch -k).
>   Some constructs are more expensive than others. For example, nesting in
>   the stap script, strings, and associative arrays all come at some cost.
>   I discovered that using inline C and an array makes sense in some cases.
>   You can access the array with /proc and process the data offline.
> 
> 
> On Mon, Apr 24, 2017 at 2:58 PM, O Mahony, Billy
> <billy.o.mahony@intel.com> wrote:
> > Hi,
> >
> > I'm new to systemtap and I am using it to add some probes into a
> > user-space application.
> >
> > The probe is pretty simple - it collects one integer argument and
> > presents a histogram every 3 seconds.
> >
> > The probe is working fine and I'm getting results that are sensible.
> > The application is a packet processing application that uses a
> > user-space I/O library (DPDK) to read batches of network packets
> > directly into user space. The probe is called about 750K times per
> > second. (I have a 10Gb link with 64B packets, which generates 14.8M
> > packets per second, but the batch size (that's the stat I'm tracing)
> > is about 20, so that gives about 750K probe hits per second.)
> >
> > When the probe is in use I see lower performance from the packet
> > processing application - it starts losing packets at about 90% of its
> > non-probed throughput.
> >
> > However, when I run stap I see:
> >
> >> Pass 4: compiled C into "stap_13723.ko" in 9020usr/980sys/10638real ms
> >
> > Does this mean that each time the probe is hit, a system call is made
> > to this new .ko module? That would surely mean quite a lot of overhead.
> > If this is correct, can this overhead be avoided for user-space probes?
> >
> > Alternatively, is there a way to execute the script only every n times
> > the probe is hit?
> >
> > Maybe there is a compile-time macro that does this, or some .stap
> > command that does an early return from the script X% of the time. I
> > searched for 'sample/sampling' in the lang ref but I didn't see anything.
> >
> > Thanks for any help you can give.
> >
> > Billy
> >
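
The two ideas raised in the thread, a plain C array instead of an associative array and an early return on most probe hits, can be sketched in ordinary C. This is only an illustration of the logic, not code taken from SystemTap or from anyone in this thread: the names record_batch, batch_hist, MAX_BATCH and SAMPLE_EVERY are all hypothetical, and the bounds are assumed values. In a real stap script, logic like this would sit inside an embedded-C function (delimited by %{ and %} and run in guru mode with -g), and the counters would need to be made safe for concurrent probe hits, which this sketch does not attempt.

```c
#include <stdint.h>

/* All names and constants here are hypothetical, chosen for illustration. */
#define MAX_BATCH    32  /* assumed upper bound on the traced batch size */
#define SAMPLE_EVERY 16  /* assumed interval: keep 1 hit out of every 16 */

/* Fixed-size histogram: bucket i counts sampled batches of size i.
 * The last bucket collects every batch of size MAX_BATCH or larger. */
static uint64_t batch_hist[MAX_BATCH + 1];

/* Total number of probe hits seen, sampled or not. */
static uint64_t hit_count;

/* Record one probe hit.
 * Returns 1 if the hit was sampled into the histogram, 0 if it was skipped.
 * Not thread-safe: a real probe handler would need per-CPU or atomic
 * counters. */
static int record_batch(long batch_size)
{
    /* Early return for all but every SAMPLE_EVERY-th hit. */
    if (++hit_count % SAMPLE_EVERY != 0)
        return 0;

    /* Clamp out-of-range values so the array index is always valid. */
    if (batch_size < 0)
        batch_size = 0;
    if (batch_size > MAX_BATCH)
        batch_size = MAX_BATCH;

    batch_hist[batch_size]++;
    return 1;
}
```

With SAMPLE_EVERY set to 16, roughly 750K hits per second would turn into about 47K histogram updates per second, at the price of a coarser picture of the batch-size distribution. The counter increment itself still runs on every hit, so sampling this way reduces the work done per hit rather than the cost of entering the probe.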
