This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish?
- From: liang xie <xieliang007 at gmail dot com>
- To: Josh Stone <jistone at redhat dot com>
- Cc: systemtap at sourceware dot org
- Date: Wed, 2 Apr 2014 18:25:25 +0800
- Subject: Re: How to get the detailed kernel stack trace if specified fuction takes too long to finish?
- Authentication-results: sourceware.org; auth=none
- References: <CADu=CFo=AtrBJLacPTa_6EfQA7vao=bSuT2Yns45x-hWYfmWZA at mail dot gmail dot com> <5335A3A6 dot 8010300 at redhat dot com>
I tried the last scheduler script, the output just like this:
# stap --all-modules y.stp
WARNING: missing unwind/symbol data for module 'uprobes'
long sys_write in tid 10162
Systemtap probe: kernel.trace("sched_switch")
long sys_write in tid 11510
Systemtap probe: kernel.trace("sched_switch")
long sys_write in tid 11288
Systemtap probe: kernel.trace("sched_switch")
I could not figure out why "print_backtrace()" didn't output any log:)
On Sat, Mar 29, 2014 at 12:30 AM, Josh Stone <jistone@redhat.com> wrote:
> On 03/27/2014 09:22 PM, liang xie wrote:
>> Hi all,
>>
>> Probably it's a newbie question, to be honest, i am new to SystemTap,
>
> No problem, welcome!
>
>> currently i want to capture the kernel stack trace if "sys_write" runs
>> too long, if that i could know where it sucks.
>>
>> My OS:2.6.32-220.el6.x86_64
>> # stap -V
>> Systemtap translator/driver (version 1.6/0.152 non-git sources)
>>
>> My tried script:
>> global i=0
>> global tb=0
>> global te=0
>> probe kernel.function("sys_write") {
>> tb=gettimeofday_us();
>> }
>> probe kernel.function("sys_write").return {
>> te=gettimeofday_us();
>> if(te-tb > 100000) {
>> i++;
>> printf("time_us: %d\n", te-tb);
>> print_backtrace();
>> if(i > 15) {
>> exit();
>> }
>> }
>> }
>>
>> # stap -x <pid> xxx.stp
>> time_us: 226795
>> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>> time_us: 101400
>> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>> time_us: 527853
>> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>> time_us: 133948
>> Returning from: 0xffffffff81176f90 : sys_write+0x0/0x90 [kernel]
>> Returning to : 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
>>
>> It doesn't show my "expected" kernel stack trace,e.g. file system or
>> block layer related stack trace. How to archive that? Any pointers
>> will be highly appreciated!
>
> Right, since you waited until return to print_backtrace, it will only be
> from the end of sys_write. (Though it appears to be the beginning,
> sys_write+0x0, because of the way a kretprobe captures returns.)
>
> Since you want info from deeper kernel internals, you'll need additional
> probes. One option is to use the para-callgraph example script, where
> sys_write is your first argument to start a trace, and the second
> argument gives a broad subsystem to dig through. Like:
>
> $ stap para-callgraph.stp 'kernel.function("sys_write")' \
> 'kernel.function("*@fs/*")'
>
> This script will show you timestamps, but you'll get all output whether
> it's slow or not. Maybe you can modify it to filter some of the
> information.
>
> Another option is to use timer.profile, and then you can watch for your
> long waits. Something like:
>
> global start_time
> probe syscall.write { start_time[tid()] = gettimeofday_us() }
> probe syscall.write.return { delete start_time[tid()] }
> probe timer.profile {
> t = tid()
> if (t in start_time && gettimeofday_us() - start_time[t] > 100000)
> {
> printf("long sys_write in tid %d\n", t)
> print_backtrace()
> delete start_time[t]
> }
> }
>
> Even this assumes that the long wait is busy on the cpu, but I think
> most likely it's an IO wait, which means you'll be scheduled off while
> it completes. So then instead of timer.profile, or in addition, you
> could use a scheduler tracepoint. Like:
>
> global start_time
> probe syscall.write { start_time[tid()] = gettimeofday_us() }
> probe syscall.write.return { delete start_time[tid()] }
> probe kernel.trace("sched_switch") {
> t = task_tid($next)
> if (t in start_time && gettimeofday_us() - start_time[t] > 100000)
> {
> printf("long sys_write in tid %d\n", t)
> print_backtrace()
> delete start_time[t]
> }
> }
>
> So as your thread is scheduled back onto the cpu, if it has been a long
> time, the backtrace will tell you what path it took to get scheduled off
> in the first place.
>
>
> Hope that's enough to give you some direction...
>
> Josh