This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: Rapidly running systemtap causing hangs or oops
Hi, Richard -
rjones wrote:
> [...]
>> Can you try running stap with "-D STP_ALIBI"? This alibi mode compiles
>> out most of stap's code, so each probe handler is reduced to just an
>> atomic increment, then a final hit count is reported on exit.
> Adding -D STP_ALIBI [...] did not change the behaviour. The mount
> process crashed quickly with the oops below:
> [ 159.454020] [<ffffffffa00d0a3b>] ext2_fill_super+0x9b5/0xc3b [ext2]
> [ 159.454020] [<ffffffff8113a0df>] mount_bdev+0x155/0x1b7
> [ 159.454020] [<ffffffffa00d0086>] ? ext2_error+0x112/0x112 [ext2]
> [...]
OK, that does seem to implicate the kernel or our registration/
unregistration process. Telling which is a bit tricky: the kernel's
own 'perf probe' widget cannot register/unregister as many probes as
quickly as we can, so if the kernel has race conditions in all that
text-segment manipulation, we are more likely to hit them than perf
is. This has happened before, and it's tough to diagnose.
An intermediate option is to extract all the kprobe addresses from the
"stap -p2" processing of your script, and modify the systemtap
source-tree script scripts/kprobes_test/gen_code.py to take a
symbol+offset list rather than just a symbol list, so that it generates
a pure-kprobes module with no systemtap code in it. Then one could
insmod; test; rmmod in a tight loop to see if the same problem
reappears. If it does, one punts to the kernel folks.
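As a rough illustration of the gen_code.py change described above, here is a standalone Python sketch. It is NOT the real scripts/kprobes_test/gen_code.py: the input format (one "symbol" or "symbol+0xOFFSET" per line) and the names parse_probe_list() and generate_module() are assumptions. The generated C relies on the kernel's struct kprobe fields .symbol_name, .offset and .pre_handler, and uses either the bulk register_kprobes()/unregister_kprobes() calls or per-probe register_kprobe()/unregister_kprobe(). Like STP_ALIBI, each handler only increments a counter.

```python
# Hypothetical sketch, NOT the real scripts/kprobes_test/gen_code.py:
# the input format and function names are assumptions for illustration.
import re

# One probe per line: "symbol" or "symbol+0xOFFSET" (decimal offsets also accepted).
_PROBE_RE = re.compile(r"^([A-Za-z_][A-Za-z0-9_.]*)(?:\+(0x[0-9a-fA-F]+|[0-9]+))?$")


def parse_probe_list(text):
    """Return [(symbol, offset), ...], skipping blank lines and '#' comments."""
    probes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = _PROBE_RE.match(line)
        if m is None:
            raise ValueError(f"unparsable probe spec: {line!r}")
        off = m.group(2) or "0"
        probes.append((m.group(1), int(off, 16) if off.startswith("0x") else int(off)))
    return probes


def generate_module(probes, bulk=True):
    """Return C source for a module with one kprobe per (symbol, offset).

    bulk=True registers/unregisters all probes with one kernel call each;
    bulk=False loops over them one at a time.
    """
    if not probes:
        raise ValueError("need at least one probe")
    n = len(probes)
    c = [
        "#include <linux/init.h>",
        "#include <linux/kernel.h>",
        "#include <linux/module.h>",
        "#include <linux/kprobes.h>",
        "#include <linux/atomic.h>",
        'MODULE_LICENSE("GPL");',
        "static atomic_t hits = ATOMIC_INIT(0);",
        "/* Like STP_ALIBI: the handler only bumps a counter. */",
        "static int pre(struct kprobe *p, struct pt_regs *regs)",
        "{ atomic_inc(&hits); return 0; }",
        f"static struct kprobe probes[{n}] = {{",
    ]
    for sym, off in probes:
        c.append(f'\t{{ .symbol_name = "{sym}", .offset = {off:#x}, .pre_handler = pre }},')
    c += ["};", f"static struct kprobe *kps[{n}];",
          "static int __init kt_init(void)", "{", "\tint i, ret = 0;",
          f"\tfor (i = 0; i < {n}; i++)", "\t\tkps[i] = &probes[i];"]
    if bulk:
        c += [f"\tret = register_kprobes(kps, {n});"]
    else:
        # On failure, unregister whatever was already registered, then bail out.
        c += [f"\tfor (i = 0; i < {n}; i++) {{",
              "\t\tret = register_kprobe(kps[i]);",
              "\t\tif (ret < 0) {",
              "\t\t\twhile (--i >= 0)",
              "\t\t\t\tunregister_kprobe(kps[i]);",
              "\t\t\tbreak;",
              "\t\t}",
              "\t}"]
    c += ["\treturn ret;", "}", "static void __exit kt_exit(void)", "{"]
    if bulk:
        c += [f"\tunregister_kprobes(kps, {n});"]
    else:
        c += ["\tint i;", f"\tfor (i = 0; i < {n}; i++)", "\t\tunregister_kprobe(kps[i]);"]
    c += ['\tpr_info("kprobes_test: %d hits\\n", atomic_read(&hits));', "}",
          "module_init(kt_init);", "module_exit(kt_exit);"]
    return "\n".join(c) + "\n"
```

The returned source would be written to a .c file and built as an ordinary out-of-tree module (for example with a one-line Kbuild file such as "obj-m := kprobes_test.o", a hypothetical name, and make -C /lib/modules/$(uname -r)/build M=$PWD modules); the resulting .ko is what the insmod; test; rmmod loop would load.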
Another hacky intermediate possibility is to put some deliberate time
delays here and there, for example between iterations of your
"while true; do stap; done" loop. Or disable
runtime/autoconf-unregister-kprobes.c, so that stap doesn't use the
kernel's bulk-unregistration functions but instead unregisters the
probes one by one.
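A minimal Python sketch of the delay idea: a bounded replacement for the while-true loop that inserts a deliberate pause between iterations, and can equally drive an insmod; test; rmmod cycle. The function name stress(), its parameters, and the example commands in the trailing comments are hypothetical, not part of systemtap. Stopping at the first failing step, rather than carrying on, is a design choice so the state at the failure can be inspected.

```python
# Hypothetical stress driver (names and structure are assumptions, not part of
# systemtap): a bounded version of "while true; do stap; done" with an optional
# deliberate delay between iterations.
import subprocess
import sys
import time


def stress(steps, iterations, delay_s=0.0):
    """Run each argv list in `steps` in order, `iterations` times.

    Returns the number of iterations that completed. Stops at the first
    failing step, leaving the system as it was at the failure.
    """
    for i in range(iterations):
        for argv in steps:
            if subprocess.run(argv).returncode != 0:
                print(f"iteration {i}: {' '.join(argv)} failed", file=sys.stderr)
                return i
        if delay_s > 0 and i + 1 < iterations:
            time.sleep(delay_s)  # deliberate gap before the next registration
    return iterations


# Example invocations (script, module and test command are placeholders;
# the stap and insmod/rmmod steps need root):
#   stress([["stap", "-D", "STP_ALIBI", "test.stp"]], 1000, delay_s=1.0)
#   stress([["insmod", "kprobes_test.ko"], ["./your-test"],
#           ["rmmod", "kprobes_test"]], 1000)
```

Varying delay_s (including 0) is one way to see whether the hang or oops depends on how quickly probes are re-registered after the previous unregistration.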
- FChE