This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
health monitoring scripts
- From: "Frank Ch. Eigler" <fche at redhat dot com>
- To: systemtap at sources dot redhat dot com
- Date: Thu, 20 Aug 2009 15:45:43 -0400
- Subject: health monitoring scripts
Hi -
I was asked to share some snippets of an old idea regarding a possible
compelling application for systemtap. Here goes, from a few months
back:
------------------------------------------------------------------------
The technical gist of the idea would have several parts: to create a
suite of systemtap script fragments (a new "health" tapset); and to
build one or more front-ends for monitoring the ongoing probes graphically
and via other tools.
Each tapset piece would represent a single subsystem or concern. A
variety of probes could act to represent a view of its health
(tracking allocation counts, rates of activity, latency trends,
whatever makes sense). The barest sketch ...
$ cat tapset/health/resource-levels
global _resource_levels
global _resource_min
global _resource_max

$ cat tapset/health/resourceX-simple.stp
probe begin { _resource_min["resourceX"]=100
              _resource_max["resourceX"]=999999 }
probe kernel.function("resourceX_init")
      { _resource_levels["resourceX"] = $value }
probe kernel.function("resourceX_alloc")
      { _resource_levels["resourceX"] -- }
probe kernel.function("resourceX_free")
      { _resource_levels["resourceX"] ++ }
probe health.resourceX = never {}

$ cat tapset/health/resourceY-markers.stp
probe begin { _resource_min["resourceY"]=1
              _resource_max["resourceY"]=1000
              _resource_levels["resourceY"]=_rY_currentValue() }
# pull out starting value via embedded C
function _rY_currentValue:long () %{
      THIS->__retvalue = some_kernel_function(); %}
probe kernel.mark("resourceY_alloc")
      { _resource_levels["resourceY"] -= $arg1 }
probe kernel.mark("resourceY_free")
      { _resource_levels["resourceY"] += $arg1 }
probe health.resourceY = never {}

$ cat tapset/health/resourceZ-rate.stp
probe begin { _resource_min["resourceZ"]=0 }
probe begin { _resource_max["resourceZ"]=10000 }
global _resourceZ_occur
probe kernel.mark("eventZ") { _resourceZ_occur <<< 1 }
probe health.resourceZ = timer.s(1) /* 1-second average */ {
      _resource_levels["resourceZ"] = @count(_resourceZ_occur)
      delete _resourceZ_occur
}
$ ls tapset/health/*.stp # figments of my imagination
resourceX-simple.stp
resourceY-markers.stp
resourceZ-rate.stp
network-buffers.stp
page-pool.stp
hw-interrupt-rate.stp
process-count.stp
...
$ cat resource-monitor.stp
probe $1 {}
probe timer.s(10) {
      # the shared array is _resource_levels; levels are integers, hence %d
      foreach(resource in _resource_levels)
            printf("%d,%s,%d\n",
                   gettimeofday_s(),
                   resource, _resource_levels[resource])
}
$ cat resource-alarm.stp
probe $1 {}
probe timer.s(10) {
      # the shared array is _resource_levels; levels are integers, hence %d
      foreach(resource in _resource_levels) {
            if (_resource_levels[resource] < _resource_min[resource])
                  printf("%d,%s,%d,LOW\n",
                         gettimeofday_s(),
                         resource, _resource_levels[resource])
            else if (_resource_levels[resource] > _resource_max[resource])
                  printf("%d,%s,%d,HIGH\n",
                         gettimeofday_s(),
                         resource, _resource_levels[resource])
      }
}
And here's how to use it:
# stap resource-monitor.stp 'health.resourceX'
1234000,resourceX,30
1234030,resourceX,40
1234060,resourceX,50
This could go straight into a graphical trace viewer such as the one
timoore is starting to work on.
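As a rough sketch of what a front-end might do with that output before
plotting it (Python only for illustration; the function name
parse_monitor_lines is made up here, and the only thing assumed about
the format is the time,resource,level layout shown above):

```python
from collections import defaultdict


def parse_monitor_lines(lines):
    """Group 'time,resource,level' lines into per-resource time series."""
    series = defaultdict(list)
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # not a monitor record; ignore it
        timestamp, resource, level = fields
        try:
            point = (int(timestamp), int(level))
        except ValueError:
            continue  # non-numeric time or level; ignore it
        series[resource].append(point)
    return dict(series)


if __name__ == "__main__":
    sample = [
        "1234000,resourceX,30",
        "1234030,resourceX,40",
        "1234060,resourceX,50",
    ]
    for resource, points in parse_monitor_lines(sample).items():
        print(resource, points)
```

Each resource then has its own list of (time, level) points, which is
about what a plotting tool would want to be handed.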
# stap resource-alarm.stp 'health.*'
1237030,resourceX,40,HIGH
1238050,resourceZ,50,LOW
This could go into syslog/etc. notifications.
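A minimal sketch of such a notifier, again in Python and purely
illustrative: the mapping of LOW/HIGH to syslog priorities and the
message wording are assumptions of this sketch, not something the
scripts above define. It uses the standard Unix-only syslog module.

```python
import syslog

# Assumed mapping from the alarm tag to a syslog priority; adjust to taste.
PRIORITY = {"LOW": syslog.LOG_WARNING, "HIGH": syslog.LOG_ERR}


def alarm_to_syslog_args(line):
    """Turn 'time,resource,level,TAG' into (priority, message), or None."""
    fields = line.strip().split(",")
    if len(fields) != 4 or fields[3] not in PRIORITY:
        return None
    timestamp, resource, level, tag = fields
    message = "health: %s level %s is %s (t=%s)" % (
        resource, level, tag, timestamp)
    return PRIORITY[tag], message


def forward(lines, emit=syslog.syslog):
    """Send each recognised alarm line via emit; return how many were sent."""
    sent = 0
    for line in lines:
        args = alarm_to_syslog_args(line)
        if args is not None:
            emit(*args)
            sent += 1
    return sent
```

Calling forward(sys.stdin) from a small wrapper would let the alarm
script's output be piped straight into it; the emit parameter is there
only so the delivery function can be swapped out.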
So what's neat here?
All this code can be running in the kernel in the background.
Different subsets of resources can be monitored / alarmed
concurrently. Resources not selected for monitoring would cause no
space/time penalties.
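One way a front-end could drive this, sketched in Python under the
assumption that each monitored subset simply gets its own stap
invocation (the helper names stap_command and start_sessions are made
up for this sketch; the command line is the one used in the examples
above):

```python
import subprocess


def stap_command(script, probe_pattern):
    """Build the argv for e.g.: stap resource-monitor.stp 'health.resourceX'"""
    return ["stap", script, probe_pattern]


def start_sessions(jobs):
    """Start one stap process per (script, probe_pattern) pair.

    Returns the Popen objects; each one's stdout carries that
    session's comma-separated records.
    """
    return [
        subprocess.Popen(stap_command(script, pattern),
                         stdout=subprocess.PIPE, text=True)
        for script, pattern in jobs
    ]


if __name__ == "__main__":
    jobs = [
        ("resource-monitor.stp", "health.resourceX"),
        ("resource-alarm.stp", "health.*"),
    ]
    for script, pattern in jobs:
        # Only show the commands here; start_sessions(jobs) would run them.
        print(" ".join(stap_command(script, pattern)))
```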
A new resource can be defined by installing one extra script file (the
"tapset/health/resourceX-simple.stp" file above). None of the user-facing
front-ends would need to be changed. A resource can span whatever
systemtap can probe - including value/rate/trend statistics derived
from probing e.g. userspace ("number of threads started per unit
time"). A kernel/other subsystem maintainer could encode his knowledge
of a variety of thresholds/constraints into script form.
If the resource-tracking code is based on tracepoints/markers instead
of kprobes, then there should be no performance concerns about the
expense of ongoing resource tracking. Systemtap would also work
without debuginfo for these. "health-monitoring" could be a simple
and intuitive enough idea for kernel subsystem people to feel
motivated to provide tracepoints for.
What do you think?
- FChE