This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: [PATCH 0/4] Linux Kernel Markers


Hi Richard,

* Richard J Moore (richardj_moore@uk.ibm.com) wrote:
> 
> 
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote on 20/12/2006
> 23:52:16:
> 
> > Hi,
> >
> > You will find, in the following posts, the latest revision of the Linux
> > Kernel Markers. Because some tracing projects (LTTng, SystemTap) need this
> > kind of mechanism, it would be nice to consider it for mainstream
> > inclusion.
> >
> > The following patches apply on 2.6.20-rc1-git7.
> >
> > Signed-off-by : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> 
> Mathieu, FWIW I like this idea. A few years ago I implemented something
> similar, but that had no explicit clients. Consequently I made my hooks
> code more generalized than is needed in practice. I do remember that Karim
> reworked the LTT instrumentation to use hooks and it worked fine.
> 

Yes, I think some features you implemented in GKHI, like chained calls to
multiple probes, should be implemented in a "probe management module" which
would be built on top of the marker infrastructure. One of my goals is to
concentrate on having the core right so that, afterward, building on top of it
will be easy.

> You've got the same optimizations for x86 by modifying an instruction's
> immediate operand and thus avoiding a d-cache hit. The only real caveat is
> the need to avoid the unsynchronised cross modification erratum. Which
> means that all processors will need to issue a serializing operation before
> executing a Marker whose state is changed. How is that handled?
> 

Good catch. I thought that modifying only 1 byte would spare us from this
erratum, but looking at it in detail tells me that it's not the case.

I see three different ways to address the problem:
1 - Adding some synchronization code in the marker and using
    synchronize_sched().
2 - Using an IPI to make other CPUs busy-loop while we change the code and then
    execute a serializing instruction (iret, cpuid...).
3 - First, write an int3 over the instruction's first byte. The handler
    would do the following:
    int3_handler:
      single-step the original instruction.
      iret

    Second, send an IPI that does a smp_processor_id() on each CPU and
    wait for them all to complete. This makes sure a synchronizing
    instruction is executed on every CPU, even if the trap handler is
    never executed.

    Then, write the new 2-byte instruction atomically over the int3
    and immediate value.


I exclude (1) because of the performance impact and (2) because it does not
deal with NMIs. That leaves (3). Does it make sense?
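
As a rough illustration of (3), the ordering of the three steps can be modelled
in userspace C. This is only a sketch under stated assumptions, not the marker
patch itself: the 2-byte site is assumed to be "mov $imm8, %al" (opcode 0xB0
plus one immediate byte), and struct marker_site, marker_arm_int3(),
sync_all_cpus(), marker_write_final() and marker_set_imm() are hypothetical
names. The int3 trap handler and the real IPI are not modelled; a kernel
implementation would also need the final update to be one aligned 16-bit store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INT3_OPCODE 0xCC /* x86 one-byte breakpoint instruction */
#define MOV_AL_IMM8 0xB0 /* x86 "mov $imm8, %al": opcode + 1 immediate byte */

/* Hypothetical model of a 2-byte marker site: opcode, then 8-bit immediate. */
struct marker_site {
    uint8_t code[2];
};

/* Step 1: overwrite only the first byte with int3 and return the original
 * opcode. A CPU reaching the site now traps instead of decoding a
 * half-updated instruction (the trap handler itself is not modelled). */
static uint8_t marker_arm_int3(struct marker_site *site)
{
    assert(site != NULL);
    uint8_t saved = site->code[0];
    site->code[0] = INT3_OPCODE;
    return saved;
}

/* Step 2: stand-in for the IPI round. In a kernel, every CPU would run a
 * trivial handler and return through iret, which serializes it. Here we
 * only count the CPUs that "acknowledged". */
static int sync_all_cpus(int ncpus)
{
    int acked = 0;
    for (int cpu = 0; cpu < ncpus; cpu++)
        acked++;
    return acked;
}

/* Step 3: put back the opcode together with the new immediate. memcpy only
 * models the final state; it does not guarantee an atomic store. */
static void marker_write_final(struct marker_site *site, uint8_t opcode,
                               uint8_t new_imm)
{
    assert(site != NULL);
    uint8_t new_code[2] = { opcode, new_imm };
    memcpy(site->code, new_code, sizeof(new_code));
}

/* Whole sequence; returns the number of CPUs that were synchronized. */
static int marker_set_imm(struct marker_site *site, uint8_t new_imm, int ncpus)
{
    uint8_t opcode = marker_arm_int3(site);
    int acked = sync_all_cpus(ncpus);
    marker_write_final(site, opcode, new_imm);
    return acked;
}
```

Calling marker_set_imm(&site, 1, 4) on a site initialised to { 0xB0, 0x00 }
leaves it as { 0xB0, 0x01 }. Between steps 1 and 3 the first byte is 0xCC, so
no CPU can fetch a mix of old and new bytes as a valid instruction.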


> One additional thing we did, which might be useful at some future point,
> was adding a /proc interface. We reflected the current instrumentation
> through /proc and gave the status of each hook. We even talked about being
> able to enable or disable instrumentation by writing to /proc but I don't
> think we ever implemented this.
> 

Adding a /proc output that lists the active probes and their callbacks will be
trivial to add to the markers. I think the probe management module should have
its own /proc file too, to list the chains of connected handlers once we get
there.
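
To give an idea of what such a listing might look like, here is a small
userspace sketch of the text a /proc read could produce. Everything in it is an
assumption made for illustration: struct marker_info, marker_list_format(), the
one-line-per-marker layout and the marker and probe names in the usage note are
hypothetical and do not come from the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one marker as a listing would see it. */
struct marker_info {
    const char *name;       /* marker name */
    const char *probe_name; /* connected callback, or NULL if none */
    int enabled;            /* non-zero when the marker is active */
};

/* Format one line per marker: "<name> <enabled|disabled> <probe|->\n".
 * Returns the number of bytes written, not counting the terminating NUL.
 * Stops before the first line that would not fit entirely in the buffer. */
static size_t marker_list_format(char *buf, size_t len,
                                 const struct marker_info *m, size_t n)
{
    size_t used = 0;

    if (len > 0)
        buf[0] = '\0';

    for (size_t i = 0; i < n && used < len; i++) {
        int w = snprintf(buf + used, len - used, "%s %s %s\n",
                         m[i].name,
                         m[i].enabled ? "enabled" : "disabled",
                         m[i].probe_name ? m[i].probe_name : "-");
        if (w < 0 || (size_t)w >= len - used) {
            buf[used] = '\0'; /* drop the partially written line */
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```

With two made-up entries, { "sched_switch", "lttng_probe", 1 } and
{ "irq_entry", NULL, 0 }, the buffer holds one line per marker, with "-" where
no callback is connected. In the kernel the same per-entry formatting would
more likely sit behind a seq_file show callback than a fixed buffer.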

> It's high time we settled the issue of instrumentation. It gets my vote,
> 
> Good luck!
> 
> Richard
> 

Thanks,

Mathieu

> - -
> Richard J Moore
> IBM Linux Technology Centre
> 

-- 
OpenPGP public key:              http://krystal.dyndns.org:8080/key/compudj.gpg
Key fingerprint:     8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68 

