This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



probe lifecycle control


Hi -


Here are some ideas about the management of probe function lifecycle.
These translate to functional requirements upon the runtime, so Martin
might want to build this stuff into his library, rather than having
the translator spit it out inline.

There are a bunch of requirements:

Probe functions, once started, must not block while holding any kind
of lock.  In fact, they should not block at all, except perhaps while
invoking might_sleep operations such as copy_from_user, should we end
up supporting the full variants.

Once actually started, probe functions should not be aborted unless
provoked by a severe error, such as running out of memory or running
"overtime".  This latter condition would be enforced by a
per-invocation work unit counter that is incremented during probe
execution and checked periodically, such as at branches.  A count of
such aborts should also be kept.  The user-level driver code may opt
to abort the entire systemtap session once such errors occur.

Each probe function needs to be able to start concurrently with others
and with itself.  This requires locking of shared data such as lookup
tables, in one of several ways: for example, a per-system or a
per-variable/per-access spinlock.

We need to be able to block a module unload (or /proc snapshot) while
any probe functions are running.  Such blocking would occur in user
context, so even a longer (though not indefinite) wait would be
acceptable.  In order to bound such a wait, this blocking function
needs to arrange for freshly started probes to return prematurely.
Those probes would increment a counter so we can later tell the user
this happened, since a subsequent /proc snapshot would contain data
inconsistencies due to the lost probes.

There are some possible races during initialization and shutdown that
the code needs to manage.  Since inserting all the probes takes time,
it is likely necessary that probes triggered prematurely (while
insertion of sibling probes is still underway) should early-abort
themselves quietly.  The same applies once shutdown has begun.  This
type of early abort probably doesn't need to be counted, by
definition.

One needs to consider the special BEGIN/END probe functions too, which
would correlate with the state transitions from "starting" to
"running" to "stopping".


There are probably some other requirements.  I believe they can be met
by atomic_t counters, global run-state indications, per-variable
spinlocks, and some clever logic in the heads of probe functions and
elsewhere.  Rather than spell out my fuzzy image of this, would
someone (Martin?) like to take over this issue and implement it?
This too should be testable with a multithreaded user-level driver.


- FChE


