This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: [PATCH] Linux Kernel Markers
- From: Karim Yaghmour <karim at opersys dot com>
- To: Martin Bligh <mbligh at google dot com>
- Cc: "Frank Ch. Eigler" <fche at redhat dot com>, Masami Hiramatsu <masami dot hiramatsu dot pt at hitachi dot com>, prasanna at in dot ibm dot com, Andrew Morton <akpm at osdl dot org>, Ingo Molnar <mingo at elte dot hu>, Mathieu Desnoyers <mathieu dot desnoyers at polymtl dot ca>, Paul Mundt <lethal at linux-sh dot org>, linux-kernel <linux-kernel at vger dot kernel dot org>, Jes Sorensen <jes at sgi dot com>, Tom Zanussi <zanussi at us dot ibm dot com>, Richard J Moore <richardj_moore at uk dot ibm dot com>, Michel Dagenais <michel dot dagenais at polymtl dot ca>, Christoph Hellwig <hch at infradead dot org>, Greg Kroah-Hartman <gregkh at suse dot de>, Thomas Gleixner <tglx at linutronix dot de>, William Cohen <wcohen at redhat dot com>, ltt-dev at shafik dot org, systemtap at sources dot redhat dot com, Alan Cox <alan at lxorguk dot ukuu dot org dot uk>
- Date: Wed, 20 Sep 2006 14:50:11 -0400
- Subject: Re: [PATCH] Linux Kernel Markers
- Organization: Opersys inc.
- References: <4510151B.5070304@google.com> <20060919093935.4ddcefc3.akpm@osdl.org> <45101DBA.7000901@google.com> <20060919063821.GB23836@in.ibm.com> <45102641.7000101@google.com> <20060919070516.GD23836@in.ibm.com> <451030A6.6040801@google.com> <45105B5E.9080107@opersys.com> <451141B1.40803@hitachi.com> <451178B0.9030205@opersys.com> <20060920180808.GI18646@redhat.com> <451186F2.3060702@google.com>
- Reply-to: karim at opersys dot com
Martin Bligh wrote:
> It's looking to me like it might still need djprobes to implement, in
> order to get the atomic and safe switchover from the original function
> into the traced one. All rather sad, but seems to be true from all the
> CPU errata, etc. If anyone can see a way round that, I'd love to hear
> it.
But we don't need to fight the errata. Fortunately, there are solutions
that take care of them where they exist (x86: djprobes/kprobes).
What's more interesting, though, is that the method as it is proposed
at this stage *seems* to be easily portable to other archs. And where
such binary trickery is difficult to pull off, nothing precludes
having a universally "portable" mechanism, something akin to switching
between the instrumented and the normal function at function entry.
Even such conditional branches can be optimized by the CPU nowadays.
The picture is, nevertheless, very bright at the moment (I think).
Just have a 5-byte filler at function entry, as Hiramatsu-san
suggested, and use djprobes to branch to the instrumented function.
The cost of the unconditional jump in the filler will most likely be
unmeasurable, and benchmarks should confirm this.
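To make the size of the filler concrete: on x86 a near "jmp rel32" is
exactly 5 bytes, opcode 0xE9 followed by a 32-bit little-endian
displacement counted from the end of the instruction. Below is a
minimal user-space sketch of that encoding. The names FILLER_LEN and
encode_jmp_rel32 are made up for illustration and are not from the
patch, and the sketch only builds the bytes in a buffer. It says
nothing about writing them into live kernel text safely, which is the
part djprobes is needed for.

```c
#include <stdint.h>

/* Hypothetical name: size of an x86 near jump (1 opcode + 4 displacement bytes). */
#define FILLER_LEN 5

/*
 * Encode "jmp rel32" (opcode 0xE9) into buf, as if the instruction sat at
 * address `from` and had to land on address `to`. The displacement is
 * relative to the address of the next instruction, i.e. from + 5.
 * Returns 0 on success, -1 if the target is out of rel32 range.
 */
static int encode_jmp_rel32(uint8_t buf[FILLER_LEN], uint64_t from, uint64_t to)
{
    int64_t disp = (int64_t)(to - (from + FILLER_LEN));
    uint32_t u;

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    u = (uint32_t)(int32_t)disp;
    buf[0] = 0xE9;                      /* near jump, 32-bit displacement */
    buf[1] = (uint8_t)(u & 0xff);       /* displacement, little-endian */
    buf[2] = (uint8_t)((u >> 8) & 0xff);
    buf[3] = (uint8_t)((u >> 16) & 0xff);
    buf[4] = (uint8_t)((u >> 24) & 0xff);
    return 0;
}
```

With a displacement of zero the jump simply falls through to the next
instruction, which is what an inactive filler would look like under
this assumption. Redirecting to the instrumented function then amounts
to replacing the four displacement bytes.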
So:
On x86: use a 5-byte filler and djprobes.
On "sane" archs: use a filler and override as explained earlier.
Elsewhere: use a standard "if" or a function pointer at function entry.
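For the "elsewhere" case, here is a rough sketch of what the
function-pointer variant could look like. All names (do_work,
do_work_plain, do_work_traced, set_tracing, traced_calls) are
hypothetical and not taken from the patch. It is plain user-space C,
and it assumes nothing else is running while the pointer is switched;
a real kernel version would have to worry about concurrent callers.

```c
/* Hypothetical counter standing in for whatever the instrumentation records. */
static int traced_calls;

/* The normal, uninstrumented body. */
static int do_work_plain(int x)
{
    return x * 2;
}

/* The instrumented variant: records the call, then does the same work. */
static int do_work_traced(int x)
{
    traced_calls++;
    return do_work_plain(x);
}

/* Function pointer consulted at function entry; defaults to the normal body. */
static int (*do_work_impl)(int) = do_work_plain;

/* Entry point seen by callers: one indirect call, no binary patching. */
static int do_work(int x)
{
    return do_work_impl(x);
}

/*
 * Switch between the normal and the instrumented function.
 * Assumption: no concurrent callers, so a plain store is enough here.
 */
static void set_tracing(int on)
{
    do_work_impl = on ? do_work_traced : do_work_plain;
}
```

The "if" variant would be the same idea with a flag tested at the top
of do_work() instead of the indirect call. Either way the cost is paid
on every call, which is why this is the fallback and not the first
choice.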
> What it would give you above and beyond djprobes is an easier and more
> flexible way to actually do the instrumentation itself.
Absolutely agree.
Karim