This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: whitelist for safe-mode probes (or just a better blacklist?)


Martin Hunt wrote:
> On Wed, 2006-09-20 at 11:14 -0400, Frank Ch. Eigler wrote:
>> Martin Hunt <hunt@redhat.com> writes:
>>
>>> [...] To guarantee a probe will not crash the kernel it is going to
>>> be necessary to generate a whitelist of probe points.
>>
>> Sure, except that this guarantee is only as good as the method used to
>> generate the whitelist.
>
> Of course.
>
>>> [...] How would this all work? The whitelist and blacklist would be
>>> files distributed with Systemtap. They would be updated
>>> automatically with a test script. [...]
>>
>> How do you imagine this test script working? Could it generate a list
>> roughly matching the "in-our-experience-so-far-safe" set in a
>> reasonable timeframe? (It would not be very helpful if it took months
>> to run, or resulted in a small list.)
>
> I imagine this would be a list, checked into CVS, of functions that have
> been tested and have never caused problems. The only reason to use a
> whitelist instead of a blacklist is that we should be paranoid and not
> assume, as we do now, that new functions added to the kernel are safe
> to probe.
>
> Writing a script to do this testing is not difficult, except for the
> problems with lockups, which require a way to remotely reboot a system.
> This requires us to assume the existence of special hardware or that the
> test system is running on a specific virtualization system. This needs
> to be done regardless of what we decide about the need for a whitelist.
> I hoped to provoke some discussion about this. We've talked about it,
> but has anyone actually written any test scripts to test all the kernel
> functions this way?
>
> Martin

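Martin's point about lockups suggests that a test controller should run each probe batch under a time limit and treat a timeout as a hang. The following is a minimal sketch under stated assumptions: the controller runs on a separate machine, `cmd` is whatever command loads the probes and drives the workload on the target (for example over ssh), and `power_cycle` is a hypothetical hook standing in for the special hardware or virtualization host mentioned above. `run_batch` is an illustrative name, not an existing tool.

```python
import subprocess
import sys


def run_batch(cmd, timeout_s, power_cycle):
    """Run one probe batch and classify the outcome as "pass", "fail" or "hang".

    A batch that does not finish within timeout_s seconds is treated as a
    suspected lockup: subprocess.run() kills the child process, and
    power_cycle() (a hypothetical remote-reboot hook) is called to recover
    the target machine.
    """
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        power_cycle()
        return "hang"
    return "pass" if result.returncode == 0 else "fail"


if __name__ == "__main__":
    # Stand-in for a real batch: a child process that exits immediately.
    print(run_batch([sys.executable, "-c", "pass"], 30, power_cycle=lambda: None))
```

Keeping the reboot mechanism behind a callback means the same loop could drive either a hardware power switch or a virtual machine host, whichever the test lab provides.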
If I understand correctly, Martin's goal here is to come up with a list of functions that we know don't break a given distribution/kernel. Being outside the list doesn't mean a function is unsafe; we just don't know, and we don't want to assume it is safe to probe. We can start with a simple approach in which we focus this whitelist on a few distro releases and on the major mainline releases of Linus's tree, such as 2.6.17, 2.6.18, 2.6.19, etc., with no -mm or other git trees and no release candidates.

It shouldn't be that difficult to use a DWARF library to generate a list of all exported functions in the kernel. I am focusing on exported functions first because their interfaces are more stable than those of internal functions, but this method can work on any function. If one of our tapsets probes a function that is not in that list, we should add it as well.

Once we have the function names, we generate a script that places probes on some percentage of the functions, say 10% at a time, in a sliding window. The script loads the generated module and runs a standard test such as LTP for 10 minutes. The probe handler should print the name of the function, increment a counter, and also print some global variables such as the PID, GID, etc.

After going through the whole list of functions, we should generate a script that puts probes on all the functions in the whitelist and runs a few standard tests like LTP, fstest, etc. for 30 minutes, to make sure that probing all of the functions together doesn't cause any instability.
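The batching step in this proposal can be sketched in Python. This is a minimal sketch under stated assumptions: the list of function names is assumed to exist already (for example from the DWARF pass described above), "10% at a time" is read as consecutive, non-overlapping batches, and the generated handler uses the SystemTap tapset functions `probefunc()`, `pid()` and `gid()` for the function name, PID and GID. The helper names `windows` and `probe_script` are hypothetical.

```python
def windows(functions, percent=10):
    """Split `functions` into consecutive batches of about `percent`% each."""
    # Integer ceiling division avoids float rounding (30 * 0.1 is slightly above 3).
    size = max(1, (len(functions) * percent + 99) // 100)
    return [functions[i:i + size] for i in range(0, len(functions), size)]


def probe_script(functions):
    """Return the text of a SystemTap script that probes every name in `functions`.

    The handler counts hits per function and prints the function name plus
    the PID and GID of the current task.
    """
    points = ",\n      ".join(f'kernel.function("{name}")' for name in functions)
    return (
        "global hits\n"
        "\n"
        f"probe {points}\n"
        "{\n"
        "  hits[probefunc()]++\n"
        '  printf("%s pid=%d gid=%d\\n", probefunc(), pid(), gid())\n'
        "}\n"
    )
```

Each batch's script would be written to a file and run with `stap` while the workload runs; batches that survive get their functions merged into the whitelist, and the final 30-minute run would use `probe_script(whitelist)` on the full list.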

Once we agree upon a format, we can run these tests as part of the weekly testing we already do, so we can catch problems early. Over a period of a few weeks we can come up with a decent list that we feel comfortable with. Once we have a big enough safe list, the translator should, by default, consult this blacklist and whitelist during wildcard expansion and expand only to function names from the whitelist. We should also provide a way to tell the translator "I am testing; don't restrict expansion to the whitelist," so that it does the real expansion of wildcards.
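The default wildcard expansion described here could behave roughly as follows. This is a hypothetical sketch, not the translator's actual code: it assumes fnmatch-style globbing, assumes the blacklist still applies in the "testing" mode, and uses a `restrict` flag to stand in for whatever option would request the real expansion. The function names in the usage note are placeholders.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern, kernel_functions, whitelist, blacklist, restrict=True):
    """Expand a function wildcard such as "vfs_*" into concrete probe points.

    restrict=True  -> only whitelisted, non-blacklisted matches (proposed default)
    restrict=False -> every non-blacklisted match (hypothetical testing override)
    """
    matches = [f for f in kernel_functions if fnmatchcase(f, pattern)]
    allowed = [f for f in matches if f not in blacklist]
    if restrict:
        # Paranoid default: a function nobody has tested yet is left out.
        allowed = [f for f in allowed if f in whitelist]
    return sorted(allowed)
```

For example, with kernel functions `vfs_read`, `vfs_write` and a newly added `vfs_new_helper`, a whitelist of `vfs_read` and `vfs_write`, and `vfs_write` blacklisted, `expand_wildcard("vfs_*", ...)` returns only `vfs_read`; with `restrict=False` it returns `vfs_new_helper` and `vfs_read`. This is the whitelist behavior Martin argues for: new functions are excluded until they have been tested.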

A side effect of this work could be that, once a few weeks of results have identified safe-to-probe routines, we could probably even go ahead and put some gcc magic macros in the kernel code itself that record, in an ELF section, which functions are deemed safe to probe. That way, over time, we may not have to ship a separate whitelist, but that is for the future (now I am daydreaming :-) ).

Anyone got tomatoes?

bye,
Vara Prasad

