This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: kernel function probe overhead


On Tue, Nov 29, 2016 at 3:19 AM, Frank Ch. Eigler <fche@redhat.com> wrote:
>
>
> aubreylee wrote:
>
> >> A couple of things come to mind.  The gettimeofday_ns() function uses
> >> a software-estimated real-time clock, which may not be steady enough for
> >> fine-grained measurements like this
> >
> > Timestamp function: gettimeofday_ns()
> > [...]
> > Timestamp function: get_cycles()
> > [...]
> > Timestamp function: local_clock_ns()
> > [...]
>
> Thanks for testing!  The distributions of those three seem remarkably
> similar - a nice central peak, and a second small peak at 4-8x the
> first.  It suggests we can probably rule out clock problems.
>
>
> >>  The average is pretty low.  Consider printing a
> >> histogram of t_end, or (if using git systemtap) its @variance.
> >>
> >> Another possibility is contention over the t_start[] variable, if
> >> multiple processors are running this region of code and so trigger the
> >> systemtap probes concurrently.  systemtap automatically locks such
> >> global variables, esp. if they contain non-aggregate values like the
> >> scalar timestamps.
> >
> > Do we have an approach to probe the function on CPU0 only?
>
> Not directly; the kernel does not give us such control.  The probe could
> include a
>
>    if (cpu() != 0) next;
>
> at the top of the probe handler body.  That way the locks are still taken,
> but released quickly on the other CPUs.
>

Thanks for the suggestion.

Original result:
t_end @count=4800 @min=740 @max=300281 @sum=90589877 @avg=18872

After adding "if (cpu() != 0) next" in the probe handler:
t_end @count=161 @min=653 @max=114856 @sum=1780522 @avg=11059

And if I add "maxcpus=1" to the kernel cmdline:
t_end @count=1190 @min=573 @max=14529 @sum=1406437 @avg=1181

The last one is much better, so it looks like we have a scalability
opportunity here.
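
For reference, a minimal sketch of what such a measurement script could
look like with the cpu() guard in place.  The probe points are taken from
the -t output quoted later in this message; the variable names and the
rest of the structure are assumptions, not the actual test2.stp:

    # Sketch only -- assumed shape of the test script, not the original test2.stp.
    global t_start          # shared map; systemtap locks it around each access
    global t_end            # aggregate of measured latencies, in ns

    probe kernel.function("arch_cpu_idle_enter").call {
        if (cpu() != 0) next            # measure on CPU0 only
        t_start[tid()] = local_clock_ns()
    }

    probe kernel.function("local_touch_nmi").call {
        if (cpu() != 0) next
        if (tid() in t_start) {
            t_end <<< local_clock_ns() - t_start[tid()]
            delete t_start[tid()]
        }
    }

    probe end {
        if (@count(t_end)) {
            print(@hist_log(t_end))     # histogram, as suggested above
            printf("count=%d min=%d max=%d avg=%d\n",
                   @count(t_end), @min(t_end), @max(t_end), @avg(t_end))
        }
    }

gettimeofday_ns() or get_cycles() could be substituted for local_clock_ns()
to reproduce the three timestamp-function runs compared above.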

> Another solution would be to have per-thread global variables.  Since
> the kernel doesn't help us out with this, we'd have to emulate it
> somehow, and for this to be worthwhile, it'd have to work faster than
> ordinary locked/shared array[tid()] accesses.
>

Is this a suggestion for me? How would I do this?

>
> >> Consider using "stap -t FOO.stp" to have the tool print internal
> >> estimates of its own probes' runtimes.  That would include lock
> >> contention, but exclude kernel-side overheads like the code involved in
> >> its kprobes machinery.
> >
> > "-t" option output.
>
> Ah, I note that we don't track occurrences of lock contention,
> even though we could.  Opened https://sourceware.org/bugzilla/show_bug.cgi?id=20879

okay, thanks, I'll keep an eye on this ticket.

>
> > [...]
> > kernel.function("arch_cpu_idle_enter@arch/x86/kernel/process.c:277").call,
> > (test2.stp:13:1), hits: 2801, cycles: 560min/34290avg/230611max, from:
> > [...]
> > kernel.function("local_touch_nmi@arch/x86/kernel/nmi.c:561").call,
> > (test2.stp:18:1), hits: 2801, cycles: 396min/5118avg/160096max, from:
> > [...]
>
> That inconsistency suggests dramatic contention, caching, or some other
> source of jitter.  Having the variance figure in there would be useful, though.
> Would anyone like to give that a try?  (translate.cxx line 2283ish and
> runtime/linux/common_session_state.h _stp_stat_init area).
>

If there is a patch delivered against git head, I can apply it and
paste the output here.
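
In the meantime, a git build of systemtap already offers the script-level
@variance suggested earlier, so the spread of t_end can be reported from
the script itself, independent of the -t instrumentation.  A minimal
sketch, assuming the t_end aggregate from the script above:

    probe end {
        if (@count(t_end))
            printf("t_end avg=%d variance=%d\n", @avg(t_end), @variance(t_end))
    }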

Thanks,
-Aubrey

> - FChE

