This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: [PATCH] Linux Kernel Markers
- From: "S. P. Prasanna" <prasanna at in dot ibm dot com>
- To: Martin Bligh <mbligh at google dot com>
- Cc: Andrew Morton <akpm at osdl dot org>, "Frank Ch. Eigler" <fche at redhat dot com>, Ingo Molnar <mingo at elte dot hu>, Mathieu Desnoyers <mathieu dot desnoyers at polymtl dot ca>, Paul Mundt <lethal at linux-sh dot org>, linux-kernel <linux-kernel at vger dot kernel dot org>, Jes Sorensen <jes at sgi dot com>, Tom Zanussi <zanussi at us dot ibm dot com>, Richard J Moore <richardj_moore at uk dot ibm dot com>, Michel Dagenais <michel dot dagenais at polymtl dot ca>, Christoph Hellwig <hch at infradead dot org>, Greg Kroah-Hartman <gregkh at suse dot de>, Thomas Gleixner <tglx at linutronix dot de>, William Cohen <wcohen at redhat dot com>, ltt-dev at shafik dot org, systemtap at sources dot redhat dot com, Alan Cox <alan at lxorguk dot ukuu dot org dot uk>
- Date: Tue, 19 Sep 2006 12:35:16 +0530
- Subject: Re: [PATCH] Linux Kernel Markers
- References: <20060918234502.GA197@Krystal> <20060919081124.GA30394@elte.hu> <451008AC.6030006@google.com> <20060919154612.GU3951@redhat.com> <4510151B.5070304@google.com> <20060919093935.4ddcefc3.akpm@osdl.org> <45101DBA.7000901@google.com> <20060919063821.GB23836@in.ibm.com> <45102641.7000101@google.com>
- Reply-to: prasanna at in dot ibm dot com
On Tue, Sep 19, 2006 at 10:17:53AM -0700, Martin Bligh wrote:
> >>>>It seems like all we'd need to do
> >>>>is "list all references to function, freeze kernel, update all
> >>>>references, continue"
> >>>
> >>>
> >>>"overwrite first 5 bytes of old function with `jmp new_function'".
> >>
> >>Yes, that's simple. but slower, as you have a double jump. Probably
> >>a damned sight faster than int3 though.
> >
> >
> >The advantage of using int3 over jmp to launch the instrumented
> >module is that int3 (or breakpoint in most architectures) is an
> >atomic operation to insert.
>
> Ah, good point. Though ... how much do we care what the speed of
> insertion/removal actually is? If we can tolerate it being slow,
> then just sync everyone up in an IPI to freeze them out whilst
> doing the insert.
>
I guess using an IPI occasionally would be acceptable. But I think
using an IPI for each probe would add a lot of overhead.
>
> Surely this still carries the overhead of doing the breakpoint,
> which was part of what we were trying to get away from? I suppose
> we get more flexibility this way. Or does the slowness not actually
> come from the int3, but only the single-stepping?
Yes, it comes from int3 as well.
>
> How about we combine all three ideas together ...
>
> 1. Load modified copy of the function in question.
> 2. overwrite the first instruction of the routine with an int3 that
> does what you say (atomically)
> 3. Then overwrite the second instruction with a jump that's faster
> 4. Now atomically overwrite the int3 with a nop, and let the jump
> take over.
>
That's a good solution.
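
A minimal user-space sketch of steps 2 to 4 above, operating on a plain byte
buffer rather than live kernel text. It only illustrates the order of the
writes. The names encode_jmp and patch_function are hypothetical, the
addresses are 32-bit values chosen for illustration, and the first
instruction of the old function is assumed to be one byte long so that the
"second instruction" starts at offset 1. Loading the modified copy (step 1),
the int3 handler that redirects callers, and CPUs already executing inside
the patched bytes are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* x86 opcodes used by the sequence. */
enum { OP_INT3 = 0xCC, OP_NOP = 0x90, OP_JMP_REL32 = 0xE9, JMP_LEN = 5 };

/*
 * Encode a 5-byte `jmp rel32` that sits at address `from` and lands on `to`.
 * The displacement is relative to the end of the jmp instruction.
 */
static void encode_jmp(uint8_t out[JMP_LEN], uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + JMP_LEN);   /* wraps modulo 2^32 */

    out[0] = OP_JMP_REL32;
    out[1] = (uint8_t)(rel & 0xFF);         /* little-endian displacement */
    out[2] = (uint8_t)((rel >> 8) & 0xFF);
    out[3] = (uint8_t)((rel >> 16) & 0xFF);
    out[4] = (uint8_t)((rel >> 24) & 0xFF);
}

/*
 * Apply steps 2-4 to `text`, a copy of the old function's first bytes.
 * `text_addr` is the (assumed) address of text[0]; `new_addr` is the
 * (assumed) address of the modified copy of the function.
 */
static void patch_function(uint8_t *text, size_t len,
                           uint32_t text_addr, uint32_t new_addr)
{
    uint8_t jmp[JMP_LEN];

    assert(len >= 1 + JMP_LEN);

    /* Step 2: one-byte store, so the breakpoint goes in atomically. */
    text[0] = OP_INT3;

    /* Step 3: write the jump over the bytes after the breakpoint. */
    encode_jmp(jmp, text_addr + 1, new_addr);
    memcpy(&text[1], jmp, JMP_LEN);

    /* Step 4: one-byte store again; the nop falls through into the jump. */
    text[0] = OP_NOP;
}
```

After patch_function() returns, the buffer starts with a nop followed by the
jmp, so a caller entering the old function pays for one extra jump instead of
a breakpoint trap.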
Thanks
Prasanna
> >Adv:
> >Can be enabled/disabled dynamically by inserting/removing
> >breakpoints. No overhead of single stepping.
> >No restriction of running the handler in interrupt context.
> >You can have pre-compiled instrumented routines.
> >This mechanism can be used for a pre-defined set of routines; for
> >arbitrary probe points, you can use kprobes/jprobes/systemtap.
> >No need to be super-user for predefined breakpoints.
> >
> >Dis:
> >Maintenance of the code, since the code base needs to be
> >duplicated and instrumented.
>
> CONFIG_FOO_BAR .... turn it on or off to turn on the instrumentation.
> compiled out by default. Compiled in when making the tracing functions.
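
A minimal sketch of what such a compile-time switch could look like.
CONFIG_FOO_BAR is the placeholder name from the mail; TRACE_COUNT,
pages_scanned and scan_pages are hypothetical names used only for
illustration and are not taken from any patch in this thread.

```c
#include <assert.h>

/*
 * With CONFIG_FOO_BAR defined, the macro bumps a counter. Without it, the
 * macro expands to an empty statement and the instrumentation costs nothing.
 */
#ifdef CONFIG_FOO_BAR
#define TRACE_COUNT(counter) do { (counter)++; } while (0)
#else
#define TRACE_COUNT(counter) do { } while (0)
#endif

/* Hypothetical counter, e.g. pages visited by a scanning loop. */
unsigned long pages_scanned;

/* Hypothetical instrumented routine: visits n items and returns n. */
int scan_pages(int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        /* ... the real work on item i would go here ... */
        TRACE_COUNT(pages_scanned);
    }
    return n;
}
```

Building with -DCONFIG_FOO_BAR gives the instrumented routine; building
without it gives the same routine with the counter update compiled out.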
>
> >The above idea is similar to runtime or dynamic patching, but here we
> >use int3(breakpoint) rather than jump instruction.
>
> Depends what we're trying to fix. I was trying to fix two things:
>
> 1. Flexibility - kprobes seem unable to access all local variables etc
> easily, and go anywhere inside the function. Plus keeping low overhead
> for doing things like keeping counters in a function (see previous
> example I mentioned for counting pages in shrink_list).
>
> 2. Overhead of the int3, which was allegedly 1000 cycles or so, though
> faster after Ingo had played with it, it's still significant.
>
> M.
--
Prasanna S.P.
Linux Technology Center
India Software Labs, IBM Bangalore
Email: prasanna@in.ibm.com
Ph: 91-80-41776329