This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: [PATCH -tip v5 07/10] kprobes/x86: Support kprobes jump optimization on x86
- From: Masami Hiramatsu <mhiramat at redhat dot com>
- To: Jason Baron <jbaron at redhat dot com>
- Cc: Frederic Weisbecker <fweisbec at gmail dot com>, Ingo Molnar <mingo at elte dot hu>, Ananth N Mavinakayanahalli <ananth at in dot ibm dot com>, lkml <linux-kernel at vger dot kernel dot org>, systemtap <systemtap at sources dot redhat dot com>, DLE <dle-develop at lists dot sourceforge dot net>, Jim Keniston <jkenisto at us dot ibm dot com>, Srikar Dronamraju <srikar at linux dot vnet dot ibm dot com>, Christoph Hellwig <hch at infradead dot org>, Steven Rostedt <rostedt at goodmis dot org>, "H. Peter Anvin" <hpa at zytor dot com>, Anders Kaseorg <andersk at ksplice dot com>, Tim Abbott <tabbott at ksplice dot com>, Andi Kleen <andi at firstfloor dot org>, Mathieu Desnoyers <mathieu dot desnoyers at polymtl dot ca>
- Date: Tue, 24 Nov 2009 12:46:32 -0500
- Subject: Re: [PATCH -tip v5 07/10] kprobes/x86: Support kprobes jump optimization on x86
- References: <20091123232115.22071.71558.stgit@dhcp-100-2-132.bos.redhat.com> <20091123232211.22071.58974.stgit@dhcp-100-2-132.bos.redhat.com> <20091124162708.GA29995@redhat.com>
Jason Baron wrote:
[...]
+/*
+ * Cross-modifying kernel text with stop_machine().
+ * This code originally comes from immediate value.
+ * This does _not_ protect against NMI and MCE. However,
+ * since kprobes can't probe NMI/MCE handler, it is OK for kprobes.
+ */
+static atomic_t stop_machine_first;
+static int wrote_text;
+
+struct text_poke_param {
+ void *addr;
+ const void *opcode;
+ size_t len;
+};
+
+static int __kprobes stop_machine_multibyte_poke(void *data)
+{
+ struct text_poke_param *tpp = data;
+
+ if (atomic_dec_and_test(&stop_machine_first)) {
+ text_poke(tpp->addr, tpp->opcode, tpp->len);
+ smp_wmb(); /* Make sure other cpus see that this has run */
+ wrote_text = 1;
+ } else {
+ while (!wrote_text)
+ smp_rmb();
+ sync_core();
+ }
+
+ flush_icache_range((unsigned long)tpp->addr,
+ (unsigned long)tpp->addr + tpp->len);
+ return 0;
+}
+
+static void *__kprobes __multibyte_poke(void *addr, const void *opcode,
+ size_t len)
+{
+ struct text_poke_param tpp;
+
+ tpp.addr = addr;
+ tpp.opcode = opcode;
+ tpp.len = len;
+ atomic_set(&stop_machine_first, 1);
+ wrote_text = 0;
+ stop_machine(stop_machine_multibyte_poke, (void *)&tpp, NULL);
+ return addr;
+}
As you know, I'd like the jump label optimization for tracepoints to
make use of this '__multibyte_poke()' interface. So perhaps it can be
moved to arch/x86/kernel/alternative.c, which is where 'text_poke()'
and friends currently live.
Hmm, maybe the current text_poke() needs a singlebyte_poke() wrapper,
to avoid confusion.
Also, with multiple users we don't want to trample over each other's
code patching. Thus, each sub-system could register some type of
'is_reserved()' callback, and then we call all of these callbacks from
the '__multibyte_poke()' routine before doing any patching, to make
sure we aren't trampling on each other's code. After a successful
patching, each sub-system can update its reserved set of code as
appropriate. I can code a prototype here, if this makes sense.
Hmm, we have to implement it carefully, because at that point kprobes
has already inserted an int3, and the optprobe rewrites that int3 again.
If is_reserved() returns 1 and __multibyte_poke() returns an error, we
can't optimize it anymore.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhiramat@redhat.com