This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: [PATCH tip/master] [BUGFIX] kprobes/x86: Fix to clear TF bit in fault-on-single-stepping
- From: Steven Rostedt <rostedt at goodmis dot org>
- To: Masami Hiramatsu <mhiramat at kernel dot org>
- Cc: Ingo Molnar <mingo at redhat dot com>, linux-kernel at vger dot kernel dot org, Peter Zijlstra <peterz at infradead dot org>, Ananth N Mavinakayanahalli <ananth at linux dot vnet dot ibm dot com>, Thomas Gleixner <tglx at linutronix dot de>, "H . Peter Anvin" <hpa at zytor dot com>, Andy Lutomirski <luto at kernel dot org>, systemtap at sourceware dot org, Linus Torvalds <torvalds at linux-foundation dot org>, fenghua dot yu at intel dot com
- Date: Mon, 13 Jun 2016 19:13:45 -0400
- Subject: Re: [PATCH tip/master] [BUGFIX] kprobes/x86: Fix to clear TF bit in fault-on-single-stepping
- References: <20160611140648 dot 25885 dot 37482 dot stgit at devbox>
On Sat, 11 Jun 2016 23:06:53 +0900
Masami Hiramatsu <mhiramat@kernel.org> wrote:
> Fix kprobe_fault_handler to clear the TF (trap flag) bit of the
> flags register when a fault is fixed up during single-stepping.
>
> If we put a kprobe on an instruction which can cause a
> page fault (e.g. the actual mov instructions in copy_user_*),
> that fault happens in the single-stepping buffer. In this
> case, kprobes resets the running instance so that the CPU can
> retry execution at the original ip address.
> However, the current code forgets to reset the TF bit. Since
> this fault happens with the TF bit set to enable single-stepping,
> the retry causes a debug exception, and kprobes cannot handle
> it because it has already reset itself.
>
> On most x86-64 platforms, it can be easily reproduced
> by using the kprobe tracer, e.g.:
>
> # cd /sys/kernel/debug/tracing
> # echo p copy_user_enhanced_fast_string+5 > kprobe_events
> # echo 1 > events/kprobes/enable
>
> And you'll see a kernel panic on do_debug(), since the debug
> trap is not handled by kprobes.
>
> To fix this problem, we just need to clear the TF bit when
> resetting the running kprobe.
>
This should definitely be marked for stable, and I bisected it all the
way down to this commit: f4cb1cc18f364d "x86-64, copy_user: Remove zero
byte check before copy user buffer."
I reverted that commit and sure enough, this bug goes away. I'm not
saying the revert should be done. This is just an FYI showing how
changes that appear to be a nice cleanup can have subtle effects. I'm
not even sure how that change caused this to be a problem with kprobes.
The proper fix is this patch.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Please add:
Cc: stable@vger.kernel.org # v3.14+
-- Steve
> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
> ---
> arch/x86/kernel/kprobes/core.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
> index 38cf7a7..856df81 100644
> --- a/arch/x86/kernel/kprobes/core.c
> +++ b/arch/x86/kernel/kprobes/core.c
> @@ -961,6 +961,13 @@ int kprobe_fault_handler(struct pt_regs *regs, int trapnr)
>  		 * normal page fault.
>  		 */
>  		regs->ip = (unsigned long)cur->addr;
> +		/*
> +		 * Trap flag has been set here because this fault happened
> +		 * where the single stepping will be done. So clear it with
> +		 * resetting current kprobe.
> +		 */
> +		regs->flags &= ~X86_EFLAGS_TF;
> +		/* If the TF was set before the kprobe hit, don't touch it */
>  		regs->flags |= kcb->kprobe_old_flags;
>  		if (kcb->kprobe_status == KPROBE_REENTER)
>  			restore_previous_kprobe(kcb);
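
For readers following the flag arithmetic in the hunk above, here is a
minimal user-space sketch of the two statements that touch regs->flags.
It is not kernel code. The names fixup_flags(), saved_old_flags and
run_example() are hypothetical, and the sketch assumes that
kcb->kprobe_old_flags holds only the TF/IF bits that were already set
before the probe hit. X86_EFLAGS_TF is redefined locally as bit 8 (0x100)
of the flags register.

```c
#include <assert.h>
#include <stdio.h>

/* Trap flag: bit 8 of the x86 flags register. */
#define X86_EFLAGS_TF 0x100UL

/*
 * Hypothetical model of the fault-fixup path. saved_old_flags stands in
 * for kcb->kprobe_old_flags (assumed to contain only the TF/IF bits that
 * were set before the kprobe hit).
 */
static unsigned long fixup_flags(unsigned long flags,
				 unsigned long saved_old_flags)
{
	/* Drop the TF bit that kprobes set for single-stepping. */
	flags &= ~X86_EFLAGS_TF;
	/* Put back TF only if it was already set before the probe hit. */
	flags |= saved_old_flags;
	return flags;
}

/* Walk through the two cases the patch comments describe. */
static void run_example(void)
{
	/* Flags at fault time: TF (0x100) and IF (0x200) set, plus others. */
	unsigned long at_fault = 0x346UL;

	/* TF was not set before the probe: the retry runs without TF. */
	unsigned long retry = fixup_flags(at_fault, 0UL);
	assert((retry & X86_EFLAGS_TF) == 0);

	/* TF was set before the probe (e.g. by a debugger): it is kept. */
	unsigned long kept = fixup_flags(at_fault, X86_EFLAGS_TF);
	assert((kept & X86_EFLAGS_TF) != 0);

	printf("retry flags: %#lx, preserved flags: %#lx\n", retry, kept);
}
```

Calling run_example() from a main() exercises both cases. Without the
`&= ~X86_EFLAGS_TF` step, the first case would retry the original
instruction with TF still set, which is the unhandled debug exception
described in the commit message.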