This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: Some newbie questions
On 08/11/2016 10:51 AM, Avi Kivity wrote:
On 08/10/2016 07:47 PM, Frank Ch. Eigler wrote:
Hi -
On Wed, Aug 10, 2016 at 06:40:02PM +0300, Avi Kivity wrote:
[...]
Yes. The problem is that if the function is called often (with a usual
short running time), then systemtap will eat all of the cpu time
spinning on an internal lock.
Well, not just that ... trapping each function entry/exit has
unavoidable kernel uprobes context-switchy-type overheads.
Like you say, those are unavoidable. But at least those costs are
handled by scalable resources.
Your perf
report may well be misattributing the cost.
I think it's unlikely. When perf says __raw_spin_lock is guilty, it
usually is.
(Have you tried a stap
script that merely traps all the same function calls, and has
empty probe handlers?)
I can try it.
It was actually pretty bad with empty handlers:
#
    66.25%  scylla  [kernel.kallsyms]  [k] _raw_spin_lock
            |
            ---_raw_spin_lock
               |
               |--49.95%-- 0x62ab
               |          syscall_trace_leave
               |          int_check_syscall_exit_work
               |          |
               |          |--99.08%-- epoll_pwait
               |          |          _ZN21reactor_backend_epoll16wait_and_processEiPK10__sigset_t
               |          |          _ZNSt17_Function_handlerIFbvEZN7reactor3runEvEUlvE5_E9_M_invokeERKSt9_Any_data
               |          |          _ZN7reactor3runEv
               |          |          |
               |          |          |--91.80%-- _ZZN3smp9configureEN5boost15program_options13variables_mapEENKUlvE1_clEv.constprop.2783
               |          |          |          _Z19dpdk_thread_adaptorPv
               |          |          |          eal_thread_loop
               |          |          |          start_thread
               |          |          |          __clone
               |          |          |
               |          |           --8.20%-- _ZN12app_template14run_deprecatedEiPPcOSt8functionIFvvEE
               |          |                     |
               |          |                      --100.00%-- main
               |          |                                __libc_start_main
               |          |                                _start
               |           --0.92%-- [...]
               |
               |--49.46%-- 0x619b
               |          syscall_trace_enter_phase2
               |          tracesys_phase2
               |          |
               |          |--98.82%-- epoll_pwait
               |          |          _ZN21reactor_backend_epoll16wait_and_processEiPK10__sigset_t
               |          |          _ZNSt17_Function_handlerIFbvEZN7reactor3runEvEUlvE5_E9_M_invokeERKSt9_Any_data
               |          |          _ZN7reactor3runEv
               |          |          |
               |          |          |--91.94%-- _ZZN3smp9configureEN5boost15program_options13variables_mapEENKUlvE1_clEv.constprop.2783
               |          |          |          _Z19dpdk_thread_adaptorPv
               |          |          |          eal_thread_loop
               |          |          |          start_thread
               |          |          |          __clone
               |          |          |
               |          |           --8.06%-- _ZN12app_template14run_deprecatedEiPPcOSt8functionIFvvEE
               |          |                     |
               |          |                      --100.00%-- main
               |          |                                __libc_start_main
               |          |                                _start
               |           --1.18%-- [...]
                --0.58%-- [...]
I don't have any system call probes. Just two empty static probes, and
a timer.profile handler.
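For anyone wanting to reproduce this, the script had roughly the following shape. This is only a sketch: the binary path and the two marker names are placeholders and not the real ones, and the empty timer.profile body is an assumption made for illustration. Note that stap may elide handlers that have no side effects, so running it in unoptimized mode (-u) may be needed to keep the empty probes in place.

```systemtap
# Sketch of an "empty handlers" test script.
# ASSUMPTIONS: "/path/to/scylla", "hypothetical_probe_a" and
# "hypothetical_probe_b" are placeholders, not the actual binary path
# or static marker names used in production.

# Two static (SDT) probes with empty handlers.
probe process("/path/to/scylla").mark("hypothetical_probe_a") { }
probe process("/path/to/scylla").mark("hypothetical_probe_b") { }

# Profiling timer probe; left empty here, which is an assumption.
probe timer.profile { }
```

Nothing in the script asks for system call tracing, which is why the syscall_trace_enter_phase2 and syscall_trace_leave frames in the call graph are surprising.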
Note though that such analysis probably cannot be performed based only
upon PC samples - or even backtrace samples. We seem to require
trapping individual function entry/exit events.
That's why I tried systemtap. It worked well on my desktop, but very
badly in production.
It may be worth experimenting with "stap --runtime=dyninst" if your
function analysis were restricted to basic Cish userspace that dyninst
can handle.
Will timer.profile work with dyninst?