This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: static instrumentation for kernel

From: Tom Zanussi <zanussi at us dot ibm dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: systemtap at sources dot redhat dot com
Date: Tue, 13 Dec 2005 12:12:27 -0600
Subject: Re: static instrumentation for kernel
References: <20051212224036.GA14871@redhat.com>
Frank Ch. Eigler writes:
 > Hi -
 > 
 > Here is one set of ideas about inserting static instrumentation points
 > into the kernel.  It predates but is related to the discussion this
 > summer: <http://sources.redhat.com/ml/systemtap/2005-q3/msg00122.html>
 > It has what is perhaps an interesting combination of features.  It is
 > simple, architecture-neutral, does not require nonlocal artifacts like
 > per-probe declarations, and hopefully is not that slow.  There are
 > certainly some shortcomings and oversights - please be critical.
 > 
 > 
 > The code to be inserted into kernel sources would be a plain macro
 > call such as:
 > 
 >    SYSTEMTAP_PROBE(name) 
 >    SYSTEMTAP_PROBE_N(name,arg1) // arg1 castable to int64_t numeric
 >    SYSTEMTAP_PROBE_NS(name,arg1,arg2) // arg2 castable to char* string
 > 
 > The name should be unique within the function.  As you see, arguments
 > can be passed, encoding the type/arity into the macro name.  Possibly
 > some super clever typeof() conditionals can make that implicit.
 > What these macros would expand to is the following.  We'd generate a
 > menu of these for reasonable arities/type combinations and shove them
 > into a kernel header.
 > 
 > #define SYSTEMTAP_PROBE(name) \
 >    do { \
 >        static void (*__systemtap_probe_##name)(); \
 >        if (unlikely(__systemtap_probe_##name)) \
 >            (__systemtap_probe_##name) ();  \
 >    } while (0)
 > #define SYSTEMTAP_PROBE_NS(name,arg1,arg2) \
 >    do { \
 >        static void (*__systemtap_probe_ns_##name)(int64_t, const char*); \
 >        if (unlikely(__systemtap_probe_ns_##name)) \
 >            (__systemtap_probe_ns_##name) ((int64_t)(arg1), \
 >                                           (const char *)(arg2));  \
 >    } while (0)
 > 

[...]

Just to throw another idea into the mix...

Here's some code I've been playing around with in an effort to provide
better, more useful 'real-world' relay-app examples (the 'qdt' in the
code stands for 'quick and dirty tracer').  It's a set of macros that
automatically generates event structs and ids.  It's a work in
progress; the point is to make it relatively easy for developers to
add new events when doing ad hoc tracing.  As the name implies, it's
not meant for production systems, since adding a new event requires a
kernel rebuild (though I guess adding new static tracepoints requires
that in any case).  But since it also autogenerates code for static
logging, it might be of some interest.

Basically, to add a new event, you add a simple event description to a
header file and, in the code you want to trace, some boilerplate code
that fills in the struct and logs it.  On the user side, the event
descriptions are available as proc files (alternatively, the common
header can be included in the user app and the app recompiled).

That is, to add a new event, you add a line to the EVENTS #define and
a corresponding <event>_fields #define listing the event's fields, as
below; the pattern should be obvious.

/* start event definitions */

#define EVENTS(ACTION, sep)            \
        ACTION(kmalloc_trace) sep \
        ACTION(kfree_trace) sep

#define kmalloc_trace_fields(event, event_name, ACTION) \
    ACTION(event, event_name, alloc_addr, void *)       \
    ACTION(event, event_name, alloc_size, size_t)       \
    ACTION(event, event_name, obj_size, int)

#define kfree_trace_fields(event, event_name, ACTION)   \
    ACTION(event, event_name, free_addr, void *)        \
    ACTION(event, event_name, obj_size, int)

/* end event definitions */

/* struct/id generation macros, inspired by a comp.lang.c++.moderated
   posting by Christopher Eltschka */

#define DECLARE(event, event_name, field, type) type field;

#define DECLARE_EVENT(event_name)       \
struct event_name##_struct \
{                               \
        unsigned char event_id;         \
        struct timeval timestamp;       \
        event_name##_fields(NULL, event_name, DECLARE)  \
} __attribute__((__packed__))

#define REGISTER(event, event_name, field, type)                        \
        register_qdt_field(event, #field, #type,                        \
                           offsetof(struct event_name##_struct, field), \
                           sizeof(((struct event_name##_struct *)0)->field));

#define REGISTER_EVENT(event_name)                                      \
        {                                                               \
                struct qdt_event *event =                               \
                        register_qdt_event(#event_name, event_name,     \
                                sizeof(struct event_name##_struct));    \
                event_name##_fields(event, event_name, REGISTER);       \
        }

#define EVENT_ID(event_name) event_name

#define ID_EVENTS(events) \
        enum qdt_event_id { \
                events \
        }

#define COMMA ,
#define SEMICOLON ;
        
/* auto-create the event id enum */
ID_EVENTS(EVENTS(EVENT_ID, COMMA));

/* auto-create the event structs */
EVENTS(DECLARE_EVENT, SEMICOLON);
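To make the expansion concrete, here is a userspace sketch of the same X-macro technique that compiles on its own. It reuses the two event definitions and the DECLARE, DECLARE_EVENT and ID_EVENTS macros above (with struct timeval taken from <sys/time.h> instead of kernel headers), and adds one hypothetical helper, qdt_event_size(), generated from the same EVENTS list; that helper is an illustration, not part of the qdt code.

```c
/* Userspace sketch of the qdt X-macro technique (not kernel code). */
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Event list and per-event field lists, as in the header above. */
#define EVENTS(ACTION, sep)             \
        ACTION(kmalloc_trace) sep       \
        ACTION(kfree_trace) sep

#define kmalloc_trace_fields(event, event_name, ACTION) \
        ACTION(event, event_name, alloc_addr, void *)   \
        ACTION(event, event_name, alloc_size, size_t)   \
        ACTION(event, event_name, obj_size, int)

#define kfree_trace_fields(event, event_name, ACTION)   \
        ACTION(event, event_name, free_addr, void *)    \
        ACTION(event, event_name, obj_size, int)

/* Generators: one field declaration, one packed struct, one enum. */
#define DECLARE(event, event_name, field, type) type field;

#define DECLARE_EVENT(event_name)                       \
struct event_name##_struct {                            \
        unsigned char event_id;                         \
        struct timeval timestamp;                       \
        event_name##_fields(NULL, event_name, DECLARE)  \
} __attribute__((__packed__))

#define EVENT_ID(event_name) event_name
#define ID_EVENTS(events) enum qdt_event_id { events }
#define COMMA ,
#define SEMICOLON ;

/* Expands to: enum qdt_event_id { kmalloc_trace, kfree_trace, }; */
ID_EVENTS(EVENTS(EVENT_ID, COMMA));

/* The trailing sep already terminates the last struct, so no extra ';'. */
EVENTS(DECLARE_EVENT, SEMICOLON)

/* Packing means each struct is exactly the sum of its fields. */
static_assert(sizeof(struct kfree_trace_struct) ==
              1 + sizeof(struct timeval) + sizeof(void *) + sizeof(int),
              "packed: no padding between fields");

/* Hypothetical helper (not in the original): a third use of the same
   EVENTS list, mapping an event id to its record size. */
#define SIZE_CASE(event_name) \
        case event_name: return sizeof(struct event_name##_struct);

size_t qdt_event_size(enum qdt_event_id id)
{
        switch (id) {
        EVENTS(SIZE_CASE, )
        }
        return 0;
}
```

Because the enum, the structs and the size table all come from one list, adding an event to EVENTS updates all three at once.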

The 'register' functions and macros make the event descriptions appear
in proc files, for a userspace app to read at runtime.  This approach is
similar to what web100 does; thanks to Baruch Even for suggesting I
take a look at this mechanism.

static int init(void)
{
        /* create the event description proc files */
        EVENTS(REGISTER_EVENT, SEMICOLON);
        return 0;
}
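The registration step can be sketched the same way. register_qdt_event() and register_qdt_field() are named in the macros above but their bodies are not shown, so the versions below are hypothetical: the signatures are inferred from the macro calls, and instead of creating proc files they append lines to an in-memory buffer, qdt_desc, whose text format is invented for this sketch. qdt_init() mirrors init() above, and only kfree_trace is defined to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/time.h>

/* One event, defined as in the header above. */
#define EVENTS(ACTION, sep) ACTION(kfree_trace) sep
#define kfree_trace_fields(event, event_name, ACTION) \
        ACTION(event, event_name, free_addr, void *)  \
        ACTION(event, event_name, obj_size, int)

#define DECLARE(event, event_name, field, type) type field;
#define DECLARE_EVENT(event_name)                      \
struct event_name##_struct {                           \
        unsigned char event_id;                        \
        struct timeval timestamp;                      \
        event_name##_fields(NULL, event_name, DECLARE) \
} __attribute__((__packed__))
#define EVENT_ID(event_name) event_name
#define COMMA ,
#define SEMICOLON ;

enum qdt_event_id { EVENTS(EVENT_ID, COMMA) };
EVENTS(DECLARE_EVENT, SEMICOLON)

/* Hypothetical registry; qdt_desc stands in for the proc file text. */
struct qdt_event { const char *name; int id; size_t size; int nfields; };
static struct qdt_event qdt_events[8];
static int qdt_nevents;
static char qdt_desc[1024];
static size_t qdt_desc_len;

/* Advance past what snprintf wrote, clamping if the output was truncated. */
static void qdt_desc_advance(int n)
{
        size_t room = sizeof(qdt_desc) - qdt_desc_len;

        if (n > 0)
                qdt_desc_len += ((size_t)n < room) ? (size_t)n : room - 1;
}

struct qdt_event *register_qdt_event(const char *name, int id, size_t size)
{
        struct qdt_event *e;

        assert(qdt_nevents < (int)(sizeof(qdt_events) / sizeof(qdt_events[0])));
        e = &qdt_events[qdt_nevents++];
        e->name = name;
        e->id = id;
        e->size = size;
        e->nfields = 0;
        qdt_desc_advance(snprintf(qdt_desc + qdt_desc_len,
                                  sizeof(qdt_desc) - qdt_desc_len,
                                  "event %s id=%d size=%zu\n", name, id, size));
        return e;
}

void register_qdt_field(struct qdt_event *event, const char *field,
                        const char *type, size_t offset, size_t size)
{
        event->nfields++;
        qdt_desc_advance(snprintf(qdt_desc + qdt_desc_len,
                                  sizeof(qdt_desc) - qdt_desc_len,
                                  "  field %s type=\"%s\" offset=%zu size=%zu\n",
                                  field, type, offset, size));
}

/* REGISTER and REGISTER_EVENT as in the post. */
#define REGISTER(event, event_name, field, type)                        \
        register_qdt_field(event, #field, #type,                        \
                           offsetof(struct event_name##_struct, field), \
                           sizeof(((struct event_name##_struct *)0)->field));
#define REGISTER_EVENT(event_name)                                      \
        {                                                               \
                struct qdt_event *event =                               \
                        register_qdt_event(#event_name, event_name,     \
                                sizeof(struct event_name##_struct));    \
                event_name##_fields(event, event_name, REGISTER)        \
        }

int qdt_init(void)
{
        EVENTS(REGISTER_EVENT, SEMICOLON)
        return 0;
}
```

After qdt_init(), qdt_desc holds one "event" line followed by one "field" line per field, each with the name, type string, offset and size that a reader needs to decode records without the header.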

Finally, at the instrumentation point, space for the event is reserved
and the data is written into the reserved space, using the
autogenerated event id and struct definition.  qdt_reserve() reserves
space for the event and also fills in the common fields, such as the
event id and timestamp, before returning; the logging code then fills
in the rest of the struct directly.  The nice thing about knowing the
event size ahead of time is that it makes it easy to write directly
into the spot reserved for the data, and it also removes the need for
any intermediate buffering.

In __kmalloc() in mm/slab.c:

        /* log kmalloc event */
        if (qdt_chan) {
                unsigned long qdt_flags;
                struct kmalloc_trace_struct *qdt_event;

                local_irq_save(qdt_flags);
                qdt_event = qdt_reserve(kmalloc_trace);
                /* qdt_event is a pointer to reserved memory, which
                   we treat as a pointer to the event struct */
                if (qdt_event) {
                        qdt_event->alloc_addr = alloc_addr;
                        qdt_event->alloc_size = size;
                        qdt_event->obj_size = obj_size;
                }
                /* nothing left to do, we've reserved and written */
                local_irq_restore(qdt_flags);
        }
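A userspace sketch of the reserve-then-fill pattern follows. qdt_reserve_raw(), the flat qdt_buf buffer and trace_kmalloc() are hypothetical stand-ins for the relay channel and the __kmalloc() hook, and the struct is written out by hand as what DECLARE_EVENT(kmalloc_trace) would produce. The reserve call writes the common header (event id and timestamp) and returns a pointer into the buffer, which the caller fills in place, so no intermediate copy is made. local_irq_save()/local_irq_restore() are omitted because they have no userspace equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

enum qdt_event_id { kmalloc_trace, kfree_trace };

/* Common header that every generated event struct starts with. */
struct qdt_header {
        unsigned char event_id;
        struct timeval timestamp;
} __attribute__((__packed__));

/* What DECLARE_EVENT(kmalloc_trace) expands to. */
struct kmalloc_trace_struct {
        unsigned char event_id;
        struct timeval timestamp;
        void *alloc_addr;
        size_t alloc_size;
        int obj_size;
} __attribute__((__packed__));

/* Hypothetical stand-in for a relay channel: one flat buffer. */
static char qdt_buf[4096];
static size_t qdt_offset;
static char *qdt_chan = qdt_buf;        /* non-NULL means "channel open" */

/* Reserve size bytes and fill in the common fields; NULL if full. */
static void *qdt_reserve_raw(unsigned char id, size_t size)
{
        struct qdt_header *hdr;
        struct timeval now;

        if (size < sizeof(*hdr) || size > sizeof(qdt_buf) - qdt_offset)
                return NULL;
        hdr = (struct qdt_header *)(qdt_chan + qdt_offset);
        qdt_offset += size;
        assert(qdt_offset <= sizeof(qdt_buf));

        gettimeofday(&now, NULL);       /* local copy: no pointer to a packed member */
        hdr->event_id = id;
        hdr->timestamp = now;
        return hdr;
}

/* Typed wrapper: the event size is known at compile time. */
#define qdt_reserve(event_name)                                          \
        ((struct event_name##_struct *)qdt_reserve_raw(event_name,       \
                                sizeof(struct event_name##_struct)))

/* Instrumentation point shaped like the __kmalloc() snippet above. */
void trace_kmalloc(void *alloc_addr, size_t size, int obj_size)
{
        if (qdt_chan) {
                struct kmalloc_trace_struct *qdt_event;

                qdt_event = qdt_reserve(kmalloc_trace);
                if (qdt_event) {
                        qdt_event->alloc_addr = alloc_addr;
                        qdt_event->alloc_size = size;
                        qdt_event->obj_size = obj_size;
                }
        }
}
```

Each call advances qdt_offset by exactly sizeof(struct kmalloc_trace_struct), which is what lets the writer fill the record in place.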

For efficiency, a top-level check for an open channel skips the whole
block if no channel is open.  Part of closing the channel is first
switching the channel buffer to a junk page, so if the channel goes
away between the outer check and the qdt_reserve(), qdt_reserve() will
simply reserve space in the junk page, with no harm done.  At least
that's my current plan, such as it is.
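The junk-page idea can be illustrated with a single-threaded sketch, assuming a hypothetical channel struct whose buf pointer decides where reservations land. qdt_chan_close() first points buf at the junk page and only then clears qdt_chan, so a writer that already passed the outer if (qdt_chan) check still gets a valid, if useless, slot. A real implementation would also need memory barriers and would have to keep the channel struct itself alive until in-flight writers finish; the sketch does not attempt that.

```c
#include <assert.h>
#include <stddef.h>

#define QDT_BUF_SIZE 4096

/* Hypothetical channel: reservations land wherever buf points. */
struct qdt_channel {
        char *buf;
        size_t offset;
};

static char qdt_real_buf[QDT_BUF_SIZE];
static char qdt_junk_page[QDT_BUF_SIZE];
static struct qdt_channel qdt_the_chan = { qdt_real_buf, 0 };
static struct qdt_channel *qdt_chan = &qdt_the_chan;

/* Reserve len bytes.  After the switch to the junk page, every
   reservation simply reuses the start of that page. */
void *qdt_chan_reserve(struct qdt_channel *chan, size_t len)
{
        void *slot;

        if (len > QDT_BUF_SIZE)
                return NULL;
        if (chan->buf == qdt_junk_page)
                return qdt_junk_page;
        if (len > QDT_BUF_SIZE - chan->offset)
                return NULL;            /* real buffer is full */
        slot = chan->buf + chan->offset;
        chan->offset += len;
        return slot;
}

/* Close: redirect writers to the junk page first, then clear the
   pointer tested by the outer "if (qdt_chan)" check. */
void qdt_chan_close(void)
{
        struct qdt_channel *chan = qdt_chan;

        assert(chan != NULL);
        chan->buf = qdt_junk_page;
        /* Omitted: barriers, and waiting for in-flight writers
           before the channel could be freed. */
        qdt_chan = NULL;
}
```

The ordering is the point: if qdt_chan were cleared first and the buffer switched afterwards, a late writer could still scribble into the real buffer while it is being torn down.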

The user side can directly pluck whatever it wants from the event
stream, given access to the event descriptions, either via proc files
or by including the auto-generated event structs/ids in the event
header.
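On the user side, walking the stream only requires knowing each record's size from its leading event_id byte. A sketch, assuming the reader has the same packed struct layouts (for example by including the shared header) and is handed only the used part of a buffer; qdt_event_size() and qdt_sum_kmalloc() are hypothetical names. Records are copied out with memcpy() rather than dereferenced in place, since packed records in a byte stream are not aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/time.h>

enum qdt_event_id { kmalloc_trace, kfree_trace };

/* Same layouts as the generated structs. */
struct kmalloc_trace_struct {
        unsigned char event_id;
        struct timeval timestamp;
        void *alloc_addr;
        size_t alloc_size;
        int obj_size;
} __attribute__((__packed__));

struct kfree_trace_struct {
        unsigned char event_id;
        struct timeval timestamp;
        void *free_addr;
        int obj_size;
} __attribute__((__packed__));

/* Record size for an event id; 0 for an unknown id. */
static size_t qdt_event_size(unsigned char id)
{
        switch (id) {
        case kmalloc_trace:
                return sizeof(struct kmalloc_trace_struct);
        case kfree_trace:
                return sizeof(struct kfree_trace_struct);
        }
        return 0;
}

/* Walk back-to-back records, summing kmalloc sizes and counting kfree
   events.  Stops at an unknown id or a truncated record. */
size_t qdt_sum_kmalloc(const unsigned char *buf, size_t len, size_t *nfree)
{
        size_t off = 0, total = 0;

        assert(buf != NULL || len == 0);
        *nfree = 0;
        while (off < len) {
                size_t size = qdt_event_size(buf[off]);

                if (size == 0 || size > len - off)
                        break;
                if (buf[off] == kmalloc_trace) {
                        struct kmalloc_trace_struct ev;

                        memcpy(&ev, buf + off, sizeof(ev));
                        total += ev.alloc_size;
                } else if (buf[off] == kfree_trace) {
                        (*nfree)++;
                }
                off += size;
        }
        return total;
}
```

The reader never needs a per-record length field: the id byte plus the known struct sizes are enough to find the next record.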

Tom
References:
- static instrumentation for kernel
  - From: Frank Ch. Eigler