This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Here is an implementation that generates a whitelist for safe-mode probes, based on the discussion in an earlier thread (http://sourceware.org/ml/systemtap/2006-q3/msg00574.html).
Its main idea is:
1) Fetch a group of probe points from probes.pending, probe them, and run a workload (e.g. runltp -t 60s) at the same time.
2) If the probe test ends without a crash, the probe points that were actually triggered are moved into probes.passed and the untriggered ones into probes.untriggered. If the probe test crashes the system, the test resumes automatically after the system reboots. The probe points that were triggered are again moved into probes.passed, but the untriggered ones are moved into probes.failed.
3) Repeat the above until probes.pending is empty. Normally, probes.pending is then reinitialized from probes.failed (or from probes.untriggered if probes.failed is empty) and the next iteration starts. If the maximum iteration count (e.g. 5) is reached, or if probes.pending, probes.failed and probes.untriggered are all empty, the whole test stops.
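The three steps above can be sketched as a Python loop. This is a minimal in-memory sketch under stated assumptions, not the attached implementation. The probes.* files are modeled as sets, and crash recovery plus reboot are collapsed into a `crashed` flag. `run_group` is a hypothetical hook that stands in for "probe this group while runltp runs".

```python
from typing import Callable, Iterator, List, Set, Tuple

# Hypothetical hook: probe one group while the workload runs and report
# (probe points that fired, whether the machine crashed).
RunGroup = Callable[[List[str]], Tuple[Set[str], bool]]


def groups(pending: List[str], size: int) -> Iterator[List[str]]:
    """Step 1: take probe points from probes.pending one group at a time."""
    for i in range(0, len(pending), size):
        yield pending[i:i + size]


def generate_whitelist(all_probes: List[str], run_group: RunGroup,
                       group_size: int = 100, max_iterations: int = 5):
    """Return the final (passed, failed, untriggered) buckets."""
    pending = list(all_probes)
    passed: Set[str] = set()
    failed: Set[str] = set()
    untriggered: Set[str] = set()
    iteration = 0
    while True:
        iteration += 1
        for group in groups(pending, group_size):
            fired, crashed = run_group(group)
            fired = fired & set(group)
            passed |= fired  # triggered probe points always go to "passed"
            # Step 2: unfired probe points are "failed" after a crash,
            # "untriggered" after a clean run.
            (failed if crashed else untriggered).update(set(group) - fired)
        pending = []
        # Step 3: stop at the iteration limit or when nothing is left to retry.
        if iteration >= max_iterations:
            break
        if failed:
            pending, failed = sorted(failed), set()
        elif untriggered:
            pending, untriggered = sorted(untriggered), set()
        else:
            break
    return passed, failed, untriggered
```

A probe point that fires in a crashing group still lands in probes.passed, as described above. Untriggered probe points are retried only after probes.failed has been drained.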
To be able to resume after a crash, the test registers itself as the last system service when it starts and unregisters itself when it finishes.
I also use a script on a remote server to restart the test machine automatically if it crashes.
Usage: runtest whitelist.exp
 * Please remove "/stp_genwhitelist_running" first if you want to restart the test from scratch.
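The sketch below shows one possible form of the start-or-resume decision around the marker file. It rests on several assumptions that the post does not state. First, the marker's presence on boot is taken to mean that a run was interrupted. Second, probe.out is assumed to list one triggered probe point per line. Third, the group in flight is assumed to be kept in a file called probes.current. That file name and all function names are hypothetical.

```python
import os
import tempfile
from typing import List


def read_lines(path: str) -> List[str]:
    """Read one probe point per line; a missing file counts as empty."""
    if not os.path.exists(path):
        return []
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def append_lines(path: str, lines: List[str]) -> None:
    """Append probe points to a bucket file, one per line."""
    with open(path, "a") as f:
        for line in lines:
            f.write(line + "\n")


def start_or_resume(marker: str, state_dir: str) -> str:
    """Return "fresh" for a new run or "resume" after a crash and reboot."""
    if not os.path.exists(marker):
        # No marker: begin from scratch and leave a marker behind so that a
        # reboot in the middle of the test can be detected.
        open(marker, "w").close()
        return "fresh"
    # Marker present: the previous boot died while a group was in flight.
    # Probe points recorded in probe.out go to probes.passed; the rest of the
    # in-flight group (hypothetical file probes.current) go to probes.failed.
    current_path = os.path.join(state_dir, "probes.current")
    current = read_lines(current_path)
    recorded = set(read_lines(os.path.join(state_dir, "probe.out")))
    append_lines(os.path.join(state_dir, "probes.passed"),
                 [p for p in current if p in recorded])
    append_lines(os.path.join(state_dir, "probes.failed"),
                 [p for p in current if p not in recorded])
    open(current_path, "w").close()  # the in-flight group is now accounted for
    return "resume"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        marker = os.path.join(d, "stp_genwhitelist_running")
        print(start_or_resume(marker, d))  # fresh
        # Simulate a crash: three probe points in flight, only one recorded.
        with open(os.path.join(d, "probes.current"), "w") as f:
            f.write("p1\np2\np3\n")
        with open(os.path.join(d, "probe.out"), "w") as f:
            f.write("p2\n")
        print(start_or_resume(marker, d))  # resume
        print(read_lines(os.path.join(d, "probes.failed")))  # ['p1', 'p3']
```

Under these assumptions, deleting the marker makes the next start take the "fresh" branch. That matches the usage note about restarting from scratch.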
Rough result on my 2.6.18/ppc64:
root:/home/root/testsuite/systemtap.stress>wc -l probes.*
13209 probes.all          * all input probe points
    2 probes.failed       * caused a crash and were not recorded in probe.out
 1146 probes.passed       * the whitelist: recorded at least once
12061 probes.untriggered  * probed without a crash but never recorded
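As a quick arithmetic check on the counts above, the three result files partition probes.all exactly. The passed fraction is also the source of the roughly 9% trigger rate discussed next.

```python
# Line counts reported by `wc -l probes.*` above.
counts = {
    "probes.all": 13209,
    "probes.failed": 2,
    "probes.passed": 1146,
    "probes.untriggered": 12061,
}

# Every input probe point ends up in exactly one result file.
classified = (counts["probes.failed"] + counts["probes.passed"]
              + counts["probes.untriggered"])
# Fraction of probe points the workload actually triggered.
triggered_ratio = counts["probes.passed"] / counts["probes.all"]

print(classified == counts["probes.all"])  # True
print(f"{triggered_ratio:.1%}")            # 8.7%
```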
Some problems:
1) How should the workload be chosen to trigger as many probe points as possible? Currently only about 9% of them (1146 of 13209) are triggered with "runltp -t 60s".
2) How should the group size be set for each iteration level? A smaller group size helps pinpoint the probe points that really crash the system, but it also makes the test run longer. After all, there are 13209 probe points for kernel.function("*") and 31132 for kernel.inline("*").
BTW, do we need to handle module("*").function("*") as well?
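One way to reason about question 2 is to count test runs per pass, because each group costs one workload run (plus a reboot if it crashes). The sketch below does that arithmetic using the probe point counts from the post. It also adds a hypothetical shrinking schedule that uses large groups first and smaller groups for later iterations. The schedule is an illustration only and is not something the post proposes.

```python
import math
from typing import List


def runs_per_pass(num_probes: int, group_size: int) -> int:
    """Workload runs needed to cover num_probes once (reboots not counted)."""
    return math.ceil(num_probes / group_size)


def shrinking_schedule(first_size: int, factor: int, iterations: int) -> List[int]:
    """Hypothetical per-iteration group sizes: start large, divide by
    `factor` each iteration, never going below one probe point per group."""
    sizes = []
    size = first_size
    for _ in range(iterations):
        sizes.append(size)
        size = max(1, size // factor)
    return sizes


if __name__ == "__main__":
    # kernel.function("*") and kernel.inline("*") counts from the post
    for n in (13209, 31132):
        for g in (1000, 100, 10):
            print(n, g, runs_per_pass(n, g))
    print(shrinking_schedule(1000, 10, 5))
```

For example, groups of 10 need 1321 runs for kernel.function("*") and 3114 runs for kernel.inline("*"). At 60 seconds of runltp per run, the first case alone takes roughly 22 hours per pass, not counting reboots. Later iterations only re-run probes.failed or probes.untriggered, so small groups may be affordable there while the first pass uses large ones.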
Attachment:
genwhitelist.1017.tgz
Description: Binary data