This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



Re: Notes from the systemtap BOF


Hello Charles,

It was great meeting you at OLS and discussing various issues relating
to kernel behavior monitoring.

Thanks for this excellent summary.

Here are some random comments.

Spirakis, Charles wrote:
> Example limitations to djprobe:
> 1) Jmps are multiple bytes, need to watch for branches to the middle of
> the old code
> 2) Insertion in "exception areas" like copy_from_user, when emulating
> the instructions that could fault.

One gentleman was concerned about the assumption that all CPUs would
pass through the kprobed point before the original code is replaced.
I think the question went something like: But what if not all CPUs go
through that code path?

> A request was made for a standard way to specify markers so that
> different subsystems can use the same methods. Each subsystem has its
> own set of hacks/tracing facility which makes it hard to debug
> problems that cross subsystems.

I actually discussed the issue with a prominent kernel developer on
Saturday morning and he was all for it. I explained that we may need to
do some form of preparsing in order to avoid having a separate
declarative statement from the action statement. He said "as long as the
default behavior does not require any parsing."

> Consider a file, for example, linux/kernel/sched.c
> 
> At the top of files:
> Declaration
> one per EVENT_ID per FILE
> ev_trace_declare(EVENT_ID, EVENT_TAG, EVENT_NR, EVENT_DEFAULT,
> PARAM(TYPE),...)
> 
> EVENT_ID -  unique identifier for a "group" of events
> EVENT_TAG - name/string
> EVENT_NR - number of fields to be passed in. Maximum.
> EVENT_DEFAULT - on or off
> 
> Actual usage:
> ev_trace(...)
> 
> Concern was that splitting declaration from usage may still be an issue.
> 
> Question, can it get down to a single MACRO?
> 
> GEN_TRACE(uniqueid, param1, param2, ...)
> 
> or N macros for N arguments:
> GEN_TRACE1(uniqueid, param1)
> GEN_TRACE2(uniqueid, param1, param2)

Actually, I've been thinking about this further and I've got it down to
something like (avoiding the word "trace" altogether):
evmarker(EVENT_TAG, EVENT_HANDLERS, EVENT_NR, PARAM/TYPE, ...)

Notice that I got rid of the EVENT_ID. Instead, I think events should be
identified by a concatenation of:
- The full path to the file inside the kernel tree (ex.:kernel/sched.c)
- The name of the function where they are located (ex.:schedule())
- The order in which they are located inside that function.
So, for example, the identifier of the lone marker inside schedule() for
the scheduling change would be something like:
"kernel/sched.c:schedule:1"
The actual "ID" could be something like an md5 based on that string.
Correlation between IDs and original string (the actual event) would be
done in user-space.
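
To make the naming scheme above concrete, here is a minimal sketch in C.
The helper names marker_make_name() and marker_name_hash() are
hypothetical, and 32-bit FNV-1a stands in for md5 purely to keep the
example self-contained. None of this is an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: build "path:function:order" into buf.
 * Returns 0 on success, -1 if the name does not fit in len bytes. */
static int marker_make_name(char *buf, size_t len, const char *path,
                            const char *func, unsigned int order)
{
    int n;

    assert(buf != NULL && path != NULL && func != NULL);
    n = snprintf(buf, len, "%s:%s:%u", path, func, order);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Stand-in for md5 (an assumption for brevity): 32-bit FNV-1a
 * computed over the identification string. */
static uint32_t marker_name_hash(const char *name)
{
    uint32_t h = 2166136261u;       /* FNV offset basis */

    assert(name != NULL);
    while (*name) {
        h ^= (unsigned char)*name++;
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}
```

Calling marker_make_name(buf, sizeof(buf), "kernel/sched.c", "schedule", 1)
fills buf with the string shown above, and marker_name_hash(buf) gives
the compact ID that user-space would map back to that string.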

Also, there would be no declaration, just the marker. By default, there
would be something like this in a header:

#ifndef CONFIG_MARKERS
#define evmarker(...)
#else
#include <evmarker-defs.h>
#endif

IOW, the markers are inactive by default.

There are at least 3 initial ways a marker could be dealt with:
- printk
- static trace point
- systemtap hook point

This is where the "EVENT_HANDLERS" field is relevant. This would be a
bit-field of things like: MARKER_PRINTK | MARKER_TRACE | MARKER_PROBE.
This would specify how the marker could be dealt with. The sched
change, for example, could not have MARKER_PRINTK enabled.
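
A rough sketch of how such a bit-field could be checked follows. Only
the three MARKER_* names come from the proposal; the bit values, the
SCHED_CHANGE_HANDLERS macro and marker_allows() are assumptions made
for illustration.

```c
#include <assert.h>

/* Hypothetical flag values; only the names come from the proposal. */
enum marker_handler {
    MARKER_PRINTK = 1 << 0,
    MARKER_TRACE  = 1 << 1,
    MARKER_PROBE  = 1 << 2
};

/* Handlers the scheduling-change marker would permit: no printk. */
#define SCHED_CHANGE_HANDLERS (MARKER_TRACE | MARKER_PROBE)

/* Return nonzero if the single flag 'handler' is permitted by the
 * 'allowed' bit-field given at the marker site. */
static int marker_allows(unsigned int allowed, enum marker_handler handler)
{
    unsigned int bit = (unsigned int)handler;

    /* Exactly one bit must be set in 'handler'. */
    assert(bit != 0 && (bit & (bit - 1)) == 0);
    return (allowed & bit) != 0;
}
```

With this, marker_allows(SCHED_CHANGE_HANDLERS, MARKER_PRINTK) is zero
while the same check for MARKER_TRACE or MARKER_PROBE is nonzero.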

Also, the markers framework should allow for the existence of two macros
which would be used at the top of files to control markers locally should
a developer wish to do local debugging. Something like:
#define ENABLE_LOCAL_MARKERS
        This would force-enable local markers even if the kernel config
        has got markers as disabled.
#define LOCAL_MARKERS_HANDLER foo()
        This would allow a developer to locally override the default marker
        handlers (printk, trace, probe, etc.) with his/her own function.

In the case of the use of printk and probe/systemtap, it may be that
there wouldn't be any preparsing needed -- replace marker with printk()
or add nops or the likes for probe/systemtap. In the case of tracing, though,
the more I think about this, the more I think there will be a need for
some form of preparsing in order to declare the events (i.e. generate a
header and possibly a .c) and register them prior to them being reached
at runtime.

> How do you enable/disable behavior on a per file basis?

That would depend on who handles the events. In the case of probes, I
guess a compiled kernel with markers should generate some form of table
(be it in a text file or a binary file section) that would be used to
locate probes based on event identification strings (see above.) In
the case of tracing, there would be something similar, but there would
be a need to maintain inside the kernel the list of events (possibly
hashed) to know what's enabled and what's not.
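
A minimal sketch of such a list, keyed by the identification strings
described above, could look like this. The struct, the function names
and the second table entry are illustrative assumptions, and a linear
scan is used where a real implementation would probably hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct marker_entry {
    const char *name;   /* "path:function:order" identification string */
    int enabled;        /* runtime on/off state */
};

/* Hypothetical table; a real one might be generated at build time.
 * The second entry is made up for the example. */
static struct marker_entry marker_table[] = {
    { "kernel/sched.c:schedule:1", 0 },
    { "kernel/fork.c:do_fork:1",   0 }
};

/* Find a marker by its identification string, or return NULL. */
static struct marker_entry *marker_find(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(marker_table) / sizeof(marker_table[0]); i++)
        if (strcmp(marker_table[i].name, name) == 0)
            return &marker_table[i];
    return NULL;
}

/* Enable or disable a marker; returns -1 if the name is unknown. */
static int marker_set_enabled(const char *name, int on)
{
    struct marker_entry *m = marker_find(name);

    if (m == NULL)
        return -1;
    m->enabled = on ? 1 : 0;
    return 0;
}
```

A probe-based consumer would only need the lookup half of this, while a
tracer would consult the enabled field each time a marker is reached.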

> How do you enable/disable at runtime? Can systemtap be used to access
> the marker information and apply probes at the corresponding spot and
> then dump values? Seems likely given systemtap will be reading dwarf
> symbolic information to do something similar.
> 
> What should the macro (or macros?) actually look like? Should it
> generate code? Or just add information to a special section in the
> resulting binary (a .maker section, similar to a .bss or .text section).
> Potentially, marker information is a lot like debug information. Could
> be stripped/split from the shipped binaries as long as there is a way to
> get the information when necessary (like dwarf symbolic information).

What actually happens, I think, will really depend on what the marker
is made to do: printk, probe, trace?

From an end-user standpoint, it would be very interesting to be able to
"browse" the list of available markers inside the kernel. I guess the
pluggable nature of lttv could be used for that: provide a tree of
markers which one can graphically browse and enable markers on the
fly, possibly also allowing the capability of "saving" the set of
currently enabled markers for future use.

> Need to talk to the compiler people to see how this can work.

Yes, I'd love to know more about what gcc can be made to do in this
area.

> Could implement much of this (macro, generation of descriptor for
> lttv, etc) for "proof of concept" without requiring kernel changes.

True. Actually I think that much of this marker stuff should be
relatively easy to implement and represent little intrusion on the
kernel codebase. Both of these points should help its case for inclusion.

> Discussion to continue on systemtap mailing list.

I'm all for it for the initial stages of coming up with a more concrete
proposal and maybe some test code. It would be very important, though,
that this gets onto the kernel mailing list ASAP as there is likely to
be much discussion there as to the final form this takes. And, as a
general rule, kernel developers prefer to be part of the action when
it comes to the design and implementation of core functionality such
as this. So when we get something a little bit more defined, we should
post something to the LKML with the heading "[RFC]" and take it
from there.

Thanks again for taking the time to write down this summary,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || karim@opersys.com || 1-866-677-4546

