This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



RFC: Access to kernel data structures


As promised this morning...

One of the central undecided issues regards access to "kernel data
structures", that is, access to the context (local and global
variables, args, struct definitions, etc.) of a probed function. Here
are the typical questions, with my proposed answers.

1. Should a systemtap script have access (at least read access) to the
probed function's variables? Yes, within limits.

2. If so, how do you express references to such variables? See below.

2a. Can this be done without making the grammar of the systemtap
language much more complicated? Yes, sort of.

2b. Do "expert" scripts need to be coded in a completely different
language? No.

2c. What would this look like? See below for an example.

A related question regards whether/how a tapset exports specific values
to a script that uses that tapset. Again, see below.

As you may know, Tom Zanussi's dpcc language for dprobes solved this
problem with the construct
	probe_expr("<C_expression>")

For example,
	probe_expr("x->y + z")
evaluates to the sum of x->y and z in the context of the
probed function.  If x is at 4(%esp) and z is at 8(%esp) in the probed
function, and regs is the pt_regs pointer passed into the probe handler,
the resulting code in the handler might look something like this:

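{
	struct xtype *x;
	int z;
	/* i386: a trap taken in kernel mode pushes no esp/ss, so the
	 * address of regs->esp is the probed function's stack pointer. */
	char *esp = (char *)&regs->esp;
	x = *((struct xtype **)(esp+4));	/* x lives at 4(%esp) */
	z = *((int *)(esp+8));			/* z lives at 8(%esp) */
	/* The value of probe_expr("x->y + z"), widened to long long */
	(long long) (x->y + z);
}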

This is what I propose for SystemTap. If the "probe_expr" syntax is too
cumbersome, I'm open to syntactic sugar to make it more palatable. For
example,
	@x->y + z@
would be briefer, but still easy to tokenize -- if we disallow @ in such
expressions (even in string literals).

The C_expression could be any expression that evaluates to a string
(char pointer) or integer and makes sense in the context of the
specified probe point. I.e., if a gdb-like debugger were stopped at that
point,
	print <C_expression>
would print the appropriate value.

dpcc uses gdb's expression-evaluation machinery to generate the
corresponding code. Unfortunately, the way the dpcc makefile copies and
pastes that machinery into dpcc is not to be emulated. We need an API
(in the DWARF or gdb library) that exports this machinery, i.e., it
takes a string containing an expression, consults the debug information
for the specified context, and generates an expression tree with
appropriate memory and register refs. (Lacking that, we could code it up
ourselves, given appropriate access to the debug info, presumably
through the DWARF library.)

Here's an example that prints the args of the filp_open function, which
has the following signature:

struct file *filp_open(const char * filename, int flags, int mode)

kernel.function("filp_open").entry
{
	O_RDWR = 2;
	O_WRONLY = 1;
	f = @flags@;
	if (f & O_RDWR) {
		ftype = "rw";
	} else if (f & O_WRONLY) {
		ftype = "wo";
	} else {
		ftype = "ro";
	}
	printk("filp_open: path=%s, flags=%#x (%s), mode=%#o\n",
		@filename@, f, ftype, @mode@);
}


Exporting Values in Tapsets

The aforementioned expression-string-to-tree machinery would also be
central to the creation of tapsets. As at least some of us see it, a
primary purpose of a tapset is to export selected values to script
writers. Most of these values can be expressed as ordinary C expressions
in the context of the probed function (or a global context). For each
probe in the tapset, the tapset author would list name/expression pairs,
where the name is what the tapset-client scripts can use to access the
value of the expression. This is a separate issue that I won't delve
into here.

Issues

There are various issues to settle, such as:

- how to maintain the firewall between non-expert scripts and kernel
data structures. (Disallowing such expressions in non-expert scripts
should handle that.)

- support for C macro references in such expressions. (To include macro
definitions in the debug info, you need to compile with -g3.)

- which values need to be "built in" -- e.g., @current@ for the current
task

- when exporting values to clients, how to express values that are most
intuitively expressed as struct members rather than simple names

- the need for function calls in (or in support of) certain expressions

- Frank's aversion to a separate parser for such expressions. I have no
opposition to enhancing the systemtap parser to handle arbitrary C
expressions, but I thought we were trying to avoid that.

Comments, anyone? (Silly question. :-))

Jim

