This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.
Hi -

Here are some ideas about the management of probe function lifecycle. These translate into functional requirements on the runtime, so Martin might want to build this stuff into his library rather than having the translator emit it inline. There are a bunch of requirements:

- Probe functions, once started, must not block while holding any kind of lock. In fact, they should not block at all, except perhaps while invoking might_sleep operations such as copy_from_user, if we end up somehow supporting the full variants.

- Once really started, probe functions should not be aborted unless provoked by a severe error, such as running out of memory or running "overtime". The latter condition would be enforced by a per-invocation work-unit counter that is incremented during probe execution and checked periodically, such as at branches. A counter of such abort occurrences should be kept too. The user-level driver code may opt to abort the entire systemtap session once such errors occur.

- Each probe function needs to be able to start concurrently with others and with itself. This requires locking of shared data, such as the lookup tables, in one of several ways, e.g. per-system or per-variable/access spinlocks.

- We need to be able to block a module unload (or a /proc snapshot) while any probe functions are running. Such blocking would occur in user context and would be fine even for a longer, though not indefinite, period. To limit such a wait, the blocking function needs to arrange for freshly started probes to return prematurely. Those probes would increment a counter so we can later tell the user that this happened, since a subsequent /proc snapshot would contain data inconsistencies due to the lost probes.

- There are some possible races during initialization and shutdown that the code needs to manage.
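As a rough illustration of the "overtime" requirement above, here is a minimal userspace sketch in C11. Everything in it is hypothetical: MAX_ACTIONS, charge_action, and probe_ctx are invented names, and atomic_long stands in for the kernel's atomic_t; the real runtime would look different.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_ACTIONS 1000              /* per-invocation work budget (illustrative) */

static atomic_long overtime_aborts;   /* session-wide count of aborted probes */

/* Per-invocation context: one work counter, reset at probe entry. */
struct probe_ctx { long actions; };

/* Charge one unit of work; called periodically, e.g. at branches.
   Returns false when the probe has run "overtime" and must abort. */
static bool charge_action(struct probe_ctx *c)
{
    if (++c->actions > MAX_ACTIONS) {
        atomic_fetch_add(&overtime_aborts, 1);  /* keep count for the driver */
        return false;                 /* caller unwinds without blocking */
    }
    return true;
}

/* Example probe body: a loop whose every iteration is metered. */
static bool example_probe(long iterations)
{
    struct probe_ctx ctx = { 0 };
    for (long i = 0; i < iterations; i++)
        if (!charge_action(&ctx))
            return false;             /* aborted: ran overtime */
    return true;                      /* completed normally */
}
```

After the session, the user-level driver could read overtime_aborts and decide whether to abort the whole session, as described above.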
Since all the probe insertions take time, probes triggered prematurely (while insertion of sibling probes is still underway) should probably early-abort themselves quietly. The same applies once shutdown has begun. By definition, this type of early abort probably doesn't need to be counted.

One also needs to consider the special BEGIN/END probe functions, which would correlate with the state transitions from "starting" to "running" to "stopping".

There are probably some other requirements. I believe they can all be met by atomic_t counters, global run-state indications, per-variable spinlocks, and some clever logic in the heads of probe functions and elsewhere. Rather than spell out my fuzzy image of this, would someone (Martin?) like to take over this issue and implement it? This too should be testable with a multithreaded user-level driver.

- FChE
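For concreteness, here is one way the run-state logic and unload blocking could fit together, again as a userspace C11 model rather than real runtime code. The names (session_state, probe_enter, begin_shutdown) and the exact counting policy are assumptions; C11 atomics stand in for atomic_t, and the busy-wait stands in for proper kernel waiting.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Session run states; BEGIN probes would fire on the transition to
   RUNNING and END probes on the transition to STOPPING (illustrative). */
enum session_state { SESSION_STARTING, SESSION_RUNNING, SESSION_STOPPING };

static _Atomic enum session_state state = SESSION_STARTING;
static atomic_long inflight;          /* probes currently executing */
static atomic_long lost;              /* probes cut short by shutdown */

/* Probe prologue: early-abort quietly unless the session is RUNNING.
   The recheck after the increment closes the race with a shutdown
   that begins between the first check and the increment. */
static bool probe_enter(void)
{
    if (atomic_load(&state) != SESSION_RUNNING)
        return false;                 /* quiet abort: not counted */
    atomic_fetch_add(&inflight, 1);
    if (atomic_load(&state) != SESSION_RUNNING) {
        atomic_fetch_sub(&inflight, 1);
        atomic_fetch_add(&lost, 1);   /* counted: snapshot may be inconsistent */
        return false;
    }
    return true;
}

static void probe_exit(void)
{
    atomic_fetch_sub(&inflight, 1);
}

/* Unload/snapshot blocker, run in user context: flip the state so
   freshly started probes return prematurely, then wait for the
   in-flight probes to drain. */
static void begin_shutdown(void)
{
    atomic_store(&state, SESSION_STOPPING);
    while (atomic_load(&inflight) > 0)
        ;                             /* a real runtime would sleep/relax here */
}
```

A multithreaded user-level driver could exercise exactly this protocol: spawn threads calling probe_enter/probe_exit while another thread runs begin_shutdown, then check that inflight drains to zero and that lost reflects any prematurely returned probes.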