This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: exercising current aarch64 kprobe support with systemtap
- From: Pratyush Anand <panand at redhat dot com>
- To: William Cohen <wcohen at redhat dot com>
- Cc: David Long <dave dot long at linaro dot org>, systemtap at sourceware dot org, Mark Brown <broonie at linaro dot org>, Jeremy Linton <jlinton at redhat dot com>, David Smith <dsmith at redhat dot com>
- Date: Mon, 27 Jun 2016 19:48:40 +0530
- Subject: Re: exercising current aarch64 kprobe support with systemtap
Hi Will,
On 23/06/2016:03:22:44 PM, William Cohen wrote:
> On 06/23/2016 02:26 PM, David Long wrote:
> > On 06/23/2016 11:49 AM, William Cohen wrote:
> >> On 06/22/2016 11:18 PM, David Long wrote:
> >>> On 06/22/2016 04:24 PM, William Cohen wrote:
> >>>> Hi all,
> >>>>
> >>>> When running the current systemtap checked out from the git repository
> >>>> and a locally built kernel with the kprobes64-v13 patches (the
> >>>> test_upstream_arm64_devel branch of
> >>>> https://github.com/pratyushanand/linux) on a Fedora 23 machine, one of
> >>>> the kprobes_onthefly.exp tests is causing the machine to get in a
> >>>> state that requires rebooting to fix. This can be triggered by running a
> >>>> portion of the systemtap tests with:
> >>>>
> >>>> make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
> >>>>
> >>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
> >>>> console starts spewing the following and needs to be rebooted:
> >>>>
> >>>> [23394.036860] Unexpected kernel single-step exception at EL1
> >>>> [23394.042434] Unexpected kernel single-step exception at EL1
> >>>> [23394.048008] Unexpected kernel single-step exception at EL1
> >>>> [23394.053541] Unexpected kernel single-step exception at EL1
> >>>> [23394.059053] Unexpected kernel single-step exception at EL1
> >>>> [23394.064545] Unexpected kernel single-step exception at EL1
> >>>>
> >>>> Sorry, I don't have the start of the failure; it scrolled off the screen very quickly.
> >>>>
> >>>> -Will
> >>>>
> >>>>
> >>>
> >>> I'll take a look and see what I can figure out.
> >>>
> >>> In the meantime I did just push a v14 branch. I'm doubtful that it will address the above problem even though it contains a few bug fixes.
> >>>
> >>> -dl
> >>>
> >>
> >> Hi Dave and Pratyush,
> >>
> >> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
> >> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
> >>
> >> -Will
> >>
> >
> > I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream. Do you disagree?
> >
> > -dl
> >
>
> Hi Dave,
>
> Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 patches do not appear to be the problem. I don't know what is causing the problem; maybe there is something going on with the porting of the patches to that kernel, or with the other patches included in there (uprobes/kexec).
Just to update:
I can confirm that the problem appears only after the uprobe patches are
applied, but I am not yet sure that the uprobe code is the actual culprit.
I can see that kprobes_onthefly.exp also exercises uprobes. It seems that when
the problem happened, there was a kprobe at print_worker_info().
Most likely a re-entrant kprobe is hit when a kprobe is placed at
print_worker_info(). I guessed it could be show_regs() from the arm64 kprobes
code, but commenting out show_regs() did not make any difference. Blacklisting
print_worker_info() did not resolve it either; the problem reproduced in a
different way after blacklisting.
So it is still vague, and debugging continues.
If I can clearly understand the systemtap test code, it will probably be
easier to debug. I mean, if I can get the names of the kernel and user space
symbols where this test places probes, that would help a lot to narrow it
down.
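One possible way to collect the kernel-side symbol names is sketched below. It is only a sketch under assumptions: it assumes debugfs is mounted at /sys/kernel/debug, that the kernel exposes the registered probes in /sys/kernel/debug/kprobes/list, and that each line there has the form "address type symbol+offset [module] [status flags]". The helper names (parse_kprobes_list, probed_symbols, read_registered_probes) are made up for illustration and are not part of systemtap or the kernel. User space probes are not covered by this.

```python
from collections import Counter

# Assumed location of the kprobes debugfs listing (needs root and mounted debugfs).
KPROBES_LIST = "/sys/kernel/debug/kprobes/list"


def parse_kprobes_list(text):
    """Return (probe_type, symbol, offset) tuples, one per registered probe line.

    Assumed line layout: "<address> <k|r|j> <symbol>+<hex offset> [module] [flags]".
    """
    probes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or unexpected lines
        probe_type, location = fields[1], fields[2]
        symbol, _, offset = location.partition("+")
        probes.append((probe_type, symbol, int(offset, 16) if offset else 0))
    return probes


def probed_symbols(text):
    """Count how many probes are registered on each symbol."""
    return Counter(symbol for _, symbol, _ in parse_kprobes_list(text))


def read_registered_probes(path=KPROBES_LIST):
    """Read the live listing; hypothetical helper, to be run while the test is active."""
    with open(path) as listing:
        return probed_symbols(listing.read())
```

Calling read_registered_probes() as root a few times while the otf_stress_max_iter_5000 test is running, and printing the returned counts, should show which kernel functions currently have probes registered on them.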
~Pratyush