This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


analysis of polling in systemtap

From: Martin Hunt <hunt at redhat dot com>
To: "systemtap at sources dot redhat dot com" <systemtap at sources dot redhat dot com>
Date: Fri, 16 Dec 2005 11:54:00 -0800
Subject: analysis of polling in systemtap
Organization: Red Hat Inc

For this report, when I say "polling", I mean a systemtap script that
has a timer firing at fixed intervals so it can print out updated data.
Commonly this is done like so:

probe timer.ms(1000)
{
 # clear screen
 # sort data
 # print top 'n' pieces of data
}

This timer probe presents implementation problems because on an MP
system, while the timer probe is printing data, kprobes may be firing
on other CPUs and trying to update the same data. We do things to
minimize the problem; however, it is unavoidable that data will
sometimes get dropped. Currently this is a fatal situation: an error
is printed and the script is terminated, e.g.
"ERROR: locking timeout over variable global_called"

These conflicts should not be fatal. Instead they should simply result
in a quiet drop of data, with perhaps a counter incremented for each
drop. There is a PR for this:
http://sourceware.org/bugzilla/show_bug.cgi?id=1379
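
A minimal sketch of the drop-instead-of-abort behavior, in Python rather
than runtime code, purely for illustration. The class LossyCounts, its
methods, and the drops field are all hypothetical names. The sketch uses
a plain try-lock, whereas the runtime apparently waits with a timeout
(per the error message above).

```python
import threading


class LossyCounts:
    """Counter map whose updates are dropped, not fatal, on lock contention."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}
        # Best-effort drop counter: it is updated without the lock, so
        # concurrent increments could in principle be lost.
        self.drops = 0

    def record(self, key):
        """Called from a 'probe'. Returns False if the update was dropped."""
        if not self._lock.acquire(blocking=False):
            self.drops += 1      # quiet drop instead of terminating
            return False
        try:
            self._counts[key] = self._counts.get(key, 0) + 1
            return True
        finally:
            self._lock.release()

    def take(self):
        """Called from the 'timer': hold the lock only long enough to swap."""
        with self._lock:
            counts, self._counts = self._counts, {}
        return counts


m = LossyCounts()
m.record("sys_read")
print(m.take(), "drops:", m.drops)
```

The timer side could then report the drop count alongside the data
rather than exiting.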

In most of my scripts I like to clear the screen before each update.
Unfortunately, if an error is hit, the error message is sometimes
cleared before the script exits, which is confusing because I can't
tell why the script exited. This is due to buffering: normal output is
buffered, while error and warning messages are sent over a separate
channel and displayed immediately, so a clear-screen still sitting in
the buffer can be flushed after the error and wipe it out. Obviously,
displaying errors immediately is not always a good idea, but at other
times it is nice to be able to separate them from normal program
output. Suggestions?

To minimize dropped data, we should be as efficient as possible and
minimize the time global locks are held on data structures. That means,
if you only care about the top 10 elements in a map, don't print the
whole map. 

For example, in a typical case:
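probe timer.ms(500)
{
        cls()                   # clear screen
        num_to_do = 10
        # the trailing '-' sorts the whole map by value, descending
        foreach ([n,f] in called-) {
                printf("%d called %s\t%d times\n", n, f, called[n,f])
                num_to_do--
                if (num_to_do <= 0)     # stop after the top 10
                        break
        }
        delete called           # start fresh for the next interval
}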

For 107 elements in the called map:
  sort took    46 usecs
  foreach took 18 usecs
  clear took    4 usecs
For larger arrays, the numbers get much worse, especially for sorts.

One way to improve these numbers is to find just the top or bottom 'n'
elements in the array. For small values of 'n', this can be done much
more efficiently than sorting the whole array. I created a PR for this:
http://sourceware.org/bugzilla/show_bug.cgi?id=2051
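
To illustrate why selecting the top 'n' is cheaper than a full sort,
here is a rough sketch in Python (not SystemTap, and not how the PR
would necessarily be implemented). The function name top_n and the
sample data are made up for the example.

```python
import heapq


def top_n(called, n):
    """Return the n (key, count) pairs with the largest counts, largest first."""
    # For n smaller than the map, nlargest scans the map once while keeping
    # only about n candidates, so the cost is roughly
    # O(len(called) * log n) instead of a full O(N log N) sort.
    return heapq.nlargest(n, called.items(), key=lambda item: item[1])


# Hypothetical data: (pid, function name) -> call count
called = {
    (101, "sys_read"): 40,
    (101, "sys_write"): 12,
    (202, "sys_open"): 3,
    (303, "sys_read"): 25,
}

for (n, f), count in top_n(called, 2):
    print("%d called %s\t%d times" % (n, f, count))
```

With n fixed at 10, the work grows about linearly with the size of the
map, which matters most for the larger arrays mentioned above.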

The syntax gets a bit confusing. I have never really liked having to use
a foreach to print an array. I think of them as objects and I want to
print them with a single line. For example,

print(called) - prints the whole map in a standard format.

printa(called, 10, "%1d called %2s\t%d times") - prints the first 10
elements in the 'called' array with a format string.  (%1d means print
key 1 as decimal, %2s means print key 2 as a string, etc.)

These are more efficient than using foreach.
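
To make the proposed format string concrete, here is a small Python
sketch of how one map entry might be expanded. It is only my reading of
the proposal: format_entry is a hypothetical helper, and it assumes that
a numbered spec such as %1d selects a key while a bare %d or %s selects
the value. Field widths and "%%" are not handled.

```python
import re

# "%" + optional key number + conversion ("d" decimal, "s" string)
_SPEC = re.compile(r"%(\d*)([ds])")


def format_entry(fmt, keys, value):
    """Expand a printa-style format for a single (keys, value) map entry."""

    def expand(match):
        index, conv = match.group(1), match.group(2)
        # Assumption: a numbered spec picks that key (1-based),
        # an unnumbered spec picks the value.
        arg = keys[int(index) - 1] if index else value
        return str(int(arg)) if conv == "d" else str(arg)

    return _SPEC.sub(expand, fmt)


print(format_entry("%1d called %2s\t%d times", (42, "sys_read"), 7))
```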

So the example timer probe might be written as:
probe timer.ms(500)
{
        cls()
        sortn(called, 10, SORT_VALUE)
        printa(called, 10, "%1d called %2s\t%d times")
        delete called
}

Comments?

Martin

Follow-Ups:
- Re: analysis of polling in systemtap
  - From: Mathieu Desnoyers
- Re: analysis of polling in systemtap
  - From: Frank Ch. Eigler
