This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



Re: Some notes on translation


Hi -


> [...]
> - self->xxx means xxx is a thread-local variable

I'm not fond of the pointer syntax in the script language (see below),
but this particular case can easily be mapped in the parser to an
array index operation like "xxx[$pid]".


> - $xxx is shorthand for values to be substituted by runtime library
>   functions or probe variables, similar to Perl interpolation

We may need to consider a naming system that can be composed into
richer identifiers.  There are several types of variables to access:
- "macros" like "$timestamp", which map to snippets of code
- target-side variables: locals (including function parameters) and globals
- probe-side special variables like your "$syscall_name"

> [...]
> probe syscall:exit("read")
> 	read_times[$syscall_name]
> [...]

  I am aware of no plausible run-time library function that can return
  the name of the current system call.  Rather, I imagine this sort of
  facility working by having a library of systemtap script fragments
  that provide definitions for probe points or helper variables:

  probe syscall("read") = kernel:function("sys_read") {
     self->syscall_name = "read"
  }

  and

  $pid = [[ in_interrupt () ? 0 : current->pid ]]   # possible embedded C


> [...]
>   Will this still work if count isn't a int value but say an int *?
> 	self->my_count = *count;
>   Seems to - if jprobes is being used, it's just a straight pass-thru.

Passing through in this sense concerns me.  If the scripting
language's type system is to remain as minimal and implicit as
possible, then operations like pointer dereferences, and especially
structure accesses, need to be explicitly represented and analyzed by
the translator.  (See more below.)


> [...]
> - To set up the probes, this example loops over each syscall and
>   registers the single probe handler for each one.  [...]
>   It seems to me that we need a way to enable and disable
>   probes as needed or 'just in time'.  For example, here's a probe that
>   we should be able to write:
> 
> /* trace all functions called from open */
> probe syscall:entry("open")
> {
> 	self->trace_all = 1;
> 	enable(*:entry(*)); /* enable probes on _all_ functions */
> }

I don't know if this will be possible.  Among other reasons we
discussed yesterday, "all functions" in the kernel is far too wide a
net.  If instrumentation were inserted anew every time, imagine
thousands of pages of kernel text being modified whenever any process
runs "open".  Alternatively, if breakpoints were inserted en masse at
startup time and enabled/disabled by having each one evaluate some
predicate, overall performance would still slow to a crawl, because
every function entry in the kernel would still trap.


>   [...]  It should support the print() function from probe handlers,
>   and it should also support queries from userspace applications
>   such that they can retrieve data from the probe at any time [...]
>   a simple protocol built on top of netlink seems to me to be the
>   best fit.  [...]

I wonder what sort of tool would want to extract data piecemeal like
this.  Are you imagining someone actually writing user-level C code to
pull data snapshots out of a specific running probe?  I'm not sure
this situation is likely to become common enough to warrant a two-way
API.

By the way, one reason I prototyped the /proc-based data snapshot
mechanism the way I did was the problem of consistency.  During the
incoming open() syscall, it suspends the probes and takes a snapshot
of all global variables.  It then lets the probes run again and
streams the textual snapshot out during subsequent read()s.  The
snapshot is thrown away at close().

If, as is likely, multiple pieces of data need to be pulled out of the
probes, it is important that those pieces be consistent with each
other: that they correspond to a locked snapshot taken at the same
instant.  If variables could be pulled out only one at a time, that
property would be achievable only by suspending probe data collection
for the whole interval between successive pull operations.


> [...] Notes: - the main problem this probe illustrates is that it's
> not yet clear how to access data represented by composite data
> types, or how to handle types like atomic_t which need to use an
> accessor function.  The location and size of the struct members is
> known from dwarf2 info, but how do we seamlessly access and use it
> in the probe?

Indeed, this is one of the big open holes in the design.  It would be
great if someone came up with a notation and execution algorithm that
requires neither psychic abilities within the translator nor excessive
C type declarations within the script.

Maybe cleverly embedded C code could do the trick, as long as it is
hidden out of the way in installed script libraries.  These might be
presented to user scripts in a functional notation, like

          file$f_dentry$d_inode (filp)

that maps to C code such as

          ({ struct file *f = $filp;
             struct dentry *d;
             check (f) ? (d = f->f_dentry,
                          check (d) ? d->d_inode : 0) : 0; })

But hand-writing a multitude of these would be too much work.  We need
a way of expressing DWARF type/expression evaluation without having to
be totally explicit.


>  - What about looping over external lists e.g. starting with
> list_head?

That's much the same problem.


- FChE


