This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: exercising current aarch64 kprobe support with systemtap


Hi Will,

On 23/06/2016:03:22:44 PM, William Cohen wrote:
> On 06/23/2016 02:26 PM, David Long wrote:
> > On 06/23/2016 11:49 AM, William Cohen wrote:
> >> On 06/22/2016 11:18 PM, David Long wrote:
> >>> On 06/22/2016 04:24 PM, William Cohen wrote:
> >>>> Hi all,
> >>>>
> >>>> When running the current systemtap checked out from the git repository
> >>>> and a locally built kernel with the kprobes64-v13 patches (the
> >>>> test_upstream_arm64_devel branch of
> >>>> https://github.com/pratyushanand/linux) on a Fedora 23 machine, one of
> >>>> the kprobes_onthefly.exp tests is causing the machine to get in a
> >>>> state that requires rebooting to fix.  This can be triggered by running a
> >>>> portion of the systemtap tests with:
> >>>>
> >>>>    make installcheck RUNTESTFLAGS="--debug systemtap.onthefly/kprobes_onthefly.exp"
> >>>>
> >>>> When it gets to the kprobes_onthefly - otf_stress_max_iter_5000 test the
> >>>> console starts spewing the following and needs to be rebooted:
> >>>>
> >>>> [23394.036860] Unexpected kernel single-step exception at EL1
> >>>> [23394.042434] Unexpected kernel single-step exception at EL1
> >>>> [23394.048008] Unexpected kernel single-step exception at EL1
> >>>> [23394.053541] Unexpected kernel single-step exception at EL1
> >>>> [23394.059053] Unexpected kernel single-step exception at EL1
> >>>> [23394.064545] Unexpected kernel single-step exception at EL1
> >>>>
> >>>> Sorry, I don't have the start of the failure; it scrolled off the screen very quickly.
> >>>>
> >>>> -Will
> >>>>
> >>>>
> >>>
> >>> I'll take a look and see what I can figure out.
> >>>
> >>> In the meantime I did just push a v14 branch.  I'm doubtful that it will address the above problem even though it contains a few bug fixes.
> >>>
> >>> -dl
> >>>
> >>
> >> Hi Dave and Pratyush,
> >>
> >> I tried the kprobes64-v13 kernel and it also seems to work, so it looks like the problem might be in the
> >> test_upstream_arm64_devel branch of https://github.com/pratyushanand/linux .
> >>
> >> -Will
> >>
> > 
> > I'm going to interpret that as meaning you know of no problem in the kprobes v14 patch that would give me pause to email it upstream.  Do you disagree?
> > 
> > -dl
> > 
> 
> Hi Dave,
> 
> Yes, the problem only seems to be in that other kernel from https://github.com/pratyushanand/linux with the kprobe and uprobe patches, so the arm64 patches do not appear to be the problem.  I don't know what is causing it; maybe something went wrong in porting the patches to that kernel, or in the other patches included there (uprobes/kexec).

Just to update:

I can confirm that the problem arises only after the uprobe patches are
applied, but I am not yet sure that the actual culprit is the uprobe code.

I can see that kprobes_onthefly.exp also exercises uprobes in the test. It
seems that when the problem happens, there was a kprobe at print_worker_info().

Most likely a re-entrant kprobe is hit when a kprobe is placed at
print_worker_info(). I guessed it could be show_regs() from arm64/kprobe code,
but commenting out show_regs() did not make any difference. Even blacklisting
print_worker_info() did not resolve it; the problem reproduced in a different
way after blacklisting.

So it is still vague, and debugging continues.
If I could clearly understand the systemtap test code, it would probably be
easier to debug. I mean, if I could get the names of the kernel and user space
symbols where this test places probes, that would help a lot in narrowing it
down.
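
One possible way to collect the kernel-side names, sketched here as an
assumption and not something tried in this thread: while the test is running,
dump the kernel's debugfs file /sys/kernel/debug/kprobes/list and pull out the
symbol column. The helper names below (probed_symbols, read_kprobe_list) are
hypothetical, and the line layout is assumed from the kernel's kprobes
documentation (address, probe type k or r, symbol+offset, optional module and
status flags). It does not cover user space (uprobe) probe points.

```python
import re

# Assumed layout of one line of /sys/kernel/debug/kprobes/list:
#   <address>  <k|r>  <symbol>+<offset>  [module] [status flags]
_LINE = re.compile(r"^\s*([0-9a-fA-Fx]+)\s+([kr])\s+(\S+?)\+(0x[0-9a-fA-F]+)")


def probed_symbols(listing):
    """Return the sorted, de-duplicated symbol names in a kprobes/list dump."""
    names = set()
    for line in listing.splitlines():
        m = _LINE.match(line)
        if m:
            names.add(m.group(3))
    return sorted(names)


def read_kprobe_list(path="/sys/kernel/debug/kprobes/list"):
    """Read the live list; needs root and a mounted debugfs."""
    with open(path) as f:
        return f.read()
```

Calling probed_symbols(read_kprobe_list()) as root while the test is running
should give the currently registered kprobe and kretprobe symbols, provided
the file has the layout assumed above.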

~Pratyush

