This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: exercising current aarch64 kprobe support with systemtap


On 07/04/2016 08:46 AM, Pratyush Anand wrote:
Hi Will,

I did some more debugging, and this is what my understanding is:

- While executing this test, page_counter_cancel() is called. Probably
there is an out-of-memory scenario.
- page_counter_cancel() calls WARN_ON_ONCE(new < 0);
- WARN_ON_ONCE() executes the brk BUG_BRK_IMM (brk 0x800) instruction
- Executing brk 0x800 invokes bug_handler()
- bug_handler() calls report_bug(), which calls __warn()
- __warn() does a lot of pr_warn(), which invokes print_worker_info(),
where we have a kprobe instrumented.
- Therefore, we are encountering this issue.


~Pratyush


It sounds like the only fix would be to expand the blacklist to any function that could be called in a debug exception-handling context? I have to think by the time this (fluid) list of functions were compiled there would be an awful lot of unprobeable code. Do we think there is any reasonable approach to making this less likely to happen when using kprobes, without extensive blacklisting?
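For checking what is already on the kprobes blacklist of a running kernel, here is a minimal shell sketch. It assumes debugfs is mounted at /sys/kernel/debug and that each line of kprobes/blacklist has the form "0x<start>-0x<end><TAB>symbol"; the function name kprobe_blacklisted is made up for illustration.

```shell
# Succeed (exit status 0) if SYMBOL is listed in the kprobes blacklist.
# Usage: kprobe_blacklisted SYMBOL [BLACKLIST_FILE]
# Assumption: the symbol name is the second whitespace-separated field.
kprobe_blacklisted() {
    sym=$1
    file=${2:-/sys/kernel/debug/kprobes/blacklist}
    awk -v sym="$sym" '$2 == sym { found = 1 } END { exit !found }' "$file"
}
```

Run as root, something like `kprobe_blacklisted print_worker_info && echo blacklisted` would show whether a given function is currently excluded from probing.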

I pushed a v15 branch to my repo last night and I'd like to email the patches out ASAP if we think this issue is either acceptable, or best addressed after the feature is in place.




On Tue, Jun 28, 2016 at 8:50 AM, William Cohen <wcohen@redhat.com> wrote:
On 06/27/2016 10:18 AM, Pratyush Anand wrote:
Hi Will,

On 23/06/2016:03:22:44 PM, William Cohen wrote:
On 06/23/2016 02:26 PM, David Long wrote:
On 06/23/2016 11:49 AM, William Cohen wrote:
On 06/22/2016 11:18 PM, David Long wrote:
On 06/22/2016 04:24 PM, William Cohen wrote:
Hi all,

When running the current systemtap checked out from the git repository
and a locally built kernel with the kprobes64-v13 patches (the
test_upstream_arm64_devel branch of
https://github.com/pratyushanand/linux) on a Fedora 23 machine, one of
the kprobes_onthefly.exp tests is causing the machine to get into a
state that requires rebooting to fix.  This can be triggered by running a
portion of the systemtap tests with:

    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"

When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
console starts spewing the following and needs to be rebooted:

[23394.036860] Unexpected kernel single-step exception at EL1
[23394.042434] Unexpected kernel single-step exception at EL1
[23394.048008] Unexpected kernel single-step exception at EL1
[23394.053541] Unexpected kernel single-step exception at EL1
[23394.059053] Unexpected kernel single-step exception at EL1
[23394.064545] Unexpected kernel single-step exception at EL1

Sorry, I don't have the start of the failure; it scrolled off the screen very quickly.

-Will



I'll take a look and see what I can figure out.

In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.

-dl


Hi Dave and Pratyush,

I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .

-Will


I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream.  Do you disagree?

-dl


Hi Dave,

Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 patches do not appear to be the problem.  I don't know what is causing the problem; maybe there is something going on with the porting of the patches to that kernel or with the other patches included in there (uprobes/kexec).

Just to update:

I confirm that the problem arises only after the uprobe patches are applied, but I am
not yet sure that the actual culprit is the uprobe code.

I can see that kprobes_onthefly.exp also exercises uprobes in the test. It
seems that when the problem happens, there was a kprobe at print_worker_info().

Most likely a re-entrant kprobe is hit when a kprobe is instrumented at
print_worker_info(). I guessed it could be show_regs() from the arm64/kprobe code,
but commenting out show_regs() did not make any difference. Even blacklisting
print_worker_info() did not resolve it; the problem reproduced in a different
way after blacklisting.

So it is still vague, and debugging continues.
If I can clearly understand the systemtap test code, then it will probably be
easier to debug. I mean, if I can get the names of the kernel and user space
symbols where this test is instrumenting probes, that would help a lot to narrow
it down.
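One way to see which kernel symbols carry a kprobe while the test is running is to read kprobes/list in debugfs. A minimal sketch, assuming debugfs is mounted at /sys/kernel/debug and that each line has the form "address type symbol+offset [flags]"; the function name kprobe_symbols is hypothetical, and this covers only kprobes, not the user space (uprobe) side.

```shell
# Print the unique symbol names that currently have a kprobe registered.
# Usage: kprobe_symbols [LIST_FILE]
# Assumption: the third field is "symbol+0xoffset"; the offset is stripped.
kprobe_symbols() {
    file=${1:-/sys/kernel/debug/kprobes/list}
    awk 'NF >= 3 { sub(/\+.*/, "", $3); print $3 }' "$file" | sort -u
}
```

Running `kprobe_symbols` as root in a loop during the stress test should show the set of instrumented functions changing as probes are enabled and disabled. Separately, `stap -l` with a probe pattern lists the probe points that the pattern resolves to, without inserting any probes.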

~Pratyush


Hi Pratyush,

My understanding is that the systemtap onthefly support enables/disables the probe as mentioned in the following systemtap bugzilla entry (and the ones that it is dependent on): https://sourceware.org/bugzilla/show_bug.cgi?id=10995.  It would be handy to have things pared down to the systemtap script that triggers the problem.  After adding some diagnostic puts, it looks like the script that triggers the problem is something like the attached onthefly_trigger.stp (that was gathered on an x86_64 machine, so it might not be exactly what is causing the problem on aarch64).  David Smith, any suggestions on how to debug based on your experiences from https://sourceware.org/bugzilla/show_bug.cgi?id=17126 where ppc64 had a similar issue with onthefly testing?

The "Unexpected kernel single-step exception at EL1" reminds me of the times when kprobes couldn't find a handler.  Maybe there is some situation where the kprobe is being removed but the breakpoint is still around. Did you get a backtrace with the insertion of the "BUG()" where that message is printed out? I wonder if it might be triggered by the (thread_flags & _TIF_UPROBE) somehow being true and the aarch64 do_notify_resume starts running.

-Will

-dl

