This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
On 09/01/2009 10:59 AM, David Smith wrote:
> On 08/20/2009 02:45 PM, Frank Ch. Eigler wrote:
>> Hi -
>>
>> I was asked to share some snippets of an old idea regarding a possible
>> compelling application for systemtap. Here goes, from a few months
>> back:
>>
>> ------------------------------------------------------------------------
>>
>> The technical gist of the idea would have several parts: to create a
>> suite of systemtap script fragments (a new "health" tapset), and to
>> build one or more front-ends for monitoring the ongoing probes
>> graphically and via other tools.
>>
>> Each tapset piece would represent a single subsystem or concern. A
>> variety of probes could act to represent a view of its health
>> (tracking allocation counts, rates of activity, latency trends,
>> whatever makes sense). The barest sketch ...
>>
>
> ... sketch removed ...
>
> I've been taking a stab at implementing this. Here's what I've
> discovered.

... stuff deleted ...

> - number of context switches: You can see the current number of context
>   switches by looking at the 'ctxt' line in /proc/stat. This
>   information comes from calling the nr_context_switches() function in
>   kernel/sched.c. nr_context_switches() gets this information from a
>   per-CPU runqueue structure (which contains lots of interesting
>   information). Unfortunately, neither the nr_context_switches()
>   function nor the underlying runqueue data structure is exported. The
>   nr_switches field of the runqueue structure gets incremented in
>   schedule(), but it is possible for schedule() to increment
>   nr_switches more than once (and we have no way to detect this).

One of the things I've discovered is that I need to look at our existing
tapsets more - there is already a 'schedule.ctxswitch' probe point that
is in the correct spot.

Here's a baby implementation of this idea. It only reports context
switches.
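The counting idea can be sketched as a small standalone script (a
minimal sketch only, not the attached health_monitor code; it assumes
the 'schedule.ctxswitch' probe point named above, which later tapsets
spell 'scheduler.ctxswitch'):

```
global switches

# Count every context switch reported by the tapset probe point.
probe schedule.ctxswitch {
    switches++
}

# Once per second, emit a timestamped count in the same CSV style as
# the sample output shown in the run below, then reset the counter.
probe timer.s(1) {
    printf("%d,context_switches,%d\n", gettimeofday_s(), switches)
    switches = 0
}
```

A plain global counter is used rather than a statistics aggregate,
since only the per-interval total is reported; stap serializes writes
to the global automatically.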
After untar'ing, you'd run it like this:

# stap -I tapset/health resource_monitor.stp 'health.*'
1252098915,context_switches,2149
1252098925,context_switches,4352

Besides needing more information sources, we also need to think about
what makes a system "unhealthy". For instance, in the case of
context_switches, the health monitoring code could check for too many
context switches within a certain time interval. Of course the hard
part is knowing what is "too many" (or at least how to make it
configurable).

-- 
David Smith
dsmith@redhat.com
Red Hat  http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)
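One way to make the "too many" limit configurable is to pass it as a
script argument. A sketch under assumptions (the 'schedule.ctxswitch'
probe point, an arbitrary 5-second interval, and a per-second limit
taken from the script's first argument - none of these values are
recommendations):

```
global switches

probe schedule.ctxswitch {
    switches++
}

# Every 5 seconds, compare the observed per-second rate against the
# limit given on the command line, e.g.:
#     stap ctxswitch_health.stp 50000
probe timer.s(5) {
    n = switches
    if (n / 5 > $1)
        printf("%d: UNHEALTHY: %d context switches in 5s (limit %d/s)\n",
               gettimeofday_s(), n, $1)
    switches = 0
}
```

Here $1 is stap's numeric command-line argument substitution, so the
threshold can be changed per run without editing the script.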
Attachment:
health_monitor.tar.bz2
Description: application/bzip