This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
C tapsets take 2
- From: Vara Prasad <prasadav at us dot ibm dot com>
- To: SystemTAP <systemtap at sources dot redhat dot com>
- Date: Wed, 01 Jun 2005 23:32:51 -0700
- Subject: C tapsets take 2
C tapsets
To make it easier to call kernel functions that provide data useful for
analyzing problems, a "C" language interface is defined.
The systemtap infrastructure implements a default tapset called systap.
Apart from the other facilities it provides, this tapset maintains the
repository of all the tapset functions available in the system. Tapset
functions in a related area are grouped into a module. During module
initialization, a tapset registers the functions in the module, as well
as the data exported by those functions, with the systap module. The
interface to register tapset functions is as follows.
tapsetfunc_register(co_ordinates, tapsetfunc_addr, probed_func_args,
probed_func_variables, exported_data)
co_ordinates: This argument specifies the address at which this
particular tapset function can be used as the probe handler.
tapsetfunc_addr: This is the address of the tapset function that can be
called in the handler at the co_ordinates address.
probed_func_args: This argument specifies which arguments of the
probed function this tapset function needs access to in order to
export the data.
probed_func_variables: This argument specifies which active local
variables of the probed function this tapset function needs access to
in order to export the data.
exported_data: This argument specifies the data type of the variables
exported by this tapset function.
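As a rough illustration of the registration interface described above,
here is a minimal user-space sketch of such a repository. The post names
the five arguments but not their C types, so the string-based types, the
tapsetfunc_t typedef, the fixed-size table, the tapsetfunc_lookup helper
and the tapf_demo placeholder are all assumptions made for this sketch,
not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the post does not give a C type for a tapset function. */
typedef void (*tapsetfunc_t)(void);

/* One registered tapset function; fields mirror the five arguments. */
struct tapset_entry {
    const char  *co_ordinates;          /* probe point where it may run   */
    tapsetfunc_t tapsetfunc_addr;       /* function called by the handler */
    const char  *probed_func_args;      /* needed arguments, or NULL      */
    const char  *probed_func_variables; /* needed locals, or NULL         */
    const char  *exported_data;         /* types of the exported values   */
};

#define MAX_TAPSET_FUNCS 16             /* arbitrary limit for the sketch */

static struct tapset_entry registry[MAX_TAPSET_FUNCS];
static size_t registry_len;

/* Placeholder tapset function used only to have something to register. */
void tapf_demo(void)
{
}

/*
 * Record one tapset function. Only the pointers are stored, so the
 * strings must outlive the registry (string literals are fine).
 * Returns 0 on success, -1 on bad input or a full table.
 */
int tapsetfunc_register(const char *co_ordinates,
                        tapsetfunc_t tapsetfunc_addr,
                        const char *probed_func_args,
                        const char *probed_func_variables,
                        const char *exported_data)
{
    if (co_ordinates == NULL || tapsetfunc_addr == NULL)
        return -1;
    if (registry_len == MAX_TAPSET_FUNCS)
        return -1;
    registry[registry_len++] = (struct tapset_entry){
        co_ordinates, tapsetfunc_addr, probed_func_args,
        probed_func_variables, exported_data
    };
    return 0;
}

/* Find the entry registered for a probe point, or NULL if none exists. */
const struct tapset_entry *tapsetfunc_lookup(const char *co_ordinates)
{
    assert(co_ordinates != NULL);
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].co_ordinates, co_ordinates) == 0)
            return &registry[i];
    return NULL;
}
```

A module's init routine would call tapsetfunc_register once per tapset
function, and generated probe code could then use a lookup of this kind
to find which function to call at a given probe point.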
The list of tapset function APIs, along with the data exported by them,
is published. End users can use these published functions in their
scripts. Based on user-specified scripts, systemtap generates code that
can invoke one or more tapset functions.
The following is the "C" language interface for the tapset function itself.
tapset_func(regs, probed_func_args, probed_func_vars, return_vars)
regs: This argument contains the processor register set.
probed_func_args: This argument contains the pointers to the list of
arguments of the probed function. This list is based on the args list
specified in the tapsetfunc_register call.
probed_func_vars: This argument contains the pointers to the local
variables the tapset function needs. This list is based on the list
specified in the tapsetfunc_register call for this function.
return_vars: Pointers to allocated memory that receives the values
returned from the tapset function.
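To make the calling convention above more concrete, here is a small
user-space sketch of how generated handler code might invoke a tapset
function. The stub_regs structure, the void ** pointer lists, and the
names tapset_func_t, tapf_export_first_local and tapset_handler_sketch
are assumptions made for illustration; the post does not fix the C types.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct pt_regs (hypothetical, user space). */
struct stub_regs {
    unsigned long ip;
};

/* Assumed shape of tapset_func(regs, args, vars, return_vars). */
typedef void (*tapset_func_t)(struct stub_regs *regs,
                              void **probed_func_args,
                              void **probed_func_vars,
                              void **return_vars);

/* Example tapset function: exports the first probed local, a long. */
void tapf_export_first_local(struct stub_regs *regs,
                             void **probed_func_args,
                             void **probed_func_vars,
                             void **return_vars)
{
    (void)regs;              /* unused in this sketch */
    (void)probed_func_args;  /* this function needs no arguments */
    *(long *)return_vars[0] = *(const long *)probed_func_vars[0];
}

/*
 * What generated handler code might do at the probe point: allocate
 * the return variable, build the pointer lists, call the function.
 */
long tapset_handler_sketch(tapset_func_t fn, long probed_local)
{
    struct stub_regs regs = { 0 };
    long exported = 0;
    void *vars[] = { &probed_local };
    void *rets[] = { &exported };

    assert(fn != NULL);
    fn(&regs, NULL, vars, rets);
    return exported;
}
```

Calling tapset_handler_sketch(tapf_export_first_local, 4096) copies the
probed local into the handler-owned return variable and yields 4096.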
The following example illustrates the "C" language API.
Let us say a kernel expert would like to provide a tapset function for
the sys_read system call that runs when the file position is updated and
exports the new file offset and file name. That function would be
registered as follows:
tapsetfunc_register(
    "kernel.function(sys_read).linenum(9)", // location in the probed function
    tapf_sysread_done,   // tapset function that can be called in the probe handler
    NULL,                // args of the probed function needed in the tapset function
    "pos, file",         // local variables of the probed function
    "long long, char *"  // data type for return variables
);
The tapset function itself will be written as follows:
void tapf_sysread_done(struct pt_regs *regs,
        /* no probed function args (NULL at registration) */
        loff_t pos, struct file *file,      // probed function locals
        long long *ret_offset, char *name)  // return values
{
    *ret_offset = pos;
    strcpy(name, file->f_dentry->d_name.name);
}
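For readers who want to try the idea outside the kernel, here is a
hedged user-space sketch of the same tapset function. The stub_* types
stand in for the kernel structures, and the name_len parameter with a
bounded copy is an addition of this sketch (the proposal uses a plain
strcpy), so treat it as one possible variation rather than the intended
signature.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-ins for the kernel structures touched by the example. */
struct stub_qstr   { const char *name; };
struct stub_dentry { struct stub_qstr d_name; };
struct stub_file   { struct stub_dentry *f_dentry; };

/*
 * User-space variant of tapf_sysread_done: exports the new file offset
 * and a copy of the file name, truncated to fit the caller's buffer.
 */
void tapf_sysread_done_sketch(long long pos, const struct stub_file *file,
                              long long *ret_offset,
                              char *name, size_t name_len)
{
    assert(file != NULL && ret_offset != NULL);
    assert(name != NULL && name_len > 0);

    *ret_offset = pos;
    /* strncpy may leave the buffer unterminated, so terminate it here */
    strncpy(name, file->f_dentry->d_name.name, name_len - 1);
    name[name_len - 1] = '\0';
}
```

Passing the buffer size lets the handler keep a fixed-size buffer for the
exported name without risking an overrun on long file names.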
We could publish the above tapf_sysread_done as the return point of the
read system call in the syscall tapset.
The advertised API will say:
function name    arg0      arg1
read.return      offset    filename
End users can write a script like the following, and systemtap will
generate the code to access the above tapset function.
global read_pos filename pid
probe kernel.syscall("read").return {
    thread->read_pos = $pos;
    thread->filename = $filename;
    thread->pid = $PID;
    trace ("File name ", thread->filename, "Offset after read", $pos,
           "Process ID", $PID);
}
One could ask why, when systemtap is already working out all the needed
variables from the debug information, we should call a tapset kernel
function just to get the file name in this example. Indeed, in this
simple example systemtap can get the data without calling a kernel
function, but here is a more complicated example from the vmtapset work.
/* same signature as madvise_dontneed */
static void inst_madvise_dontneed(struct vm_area_struct *vma,
                                  unsigned long start, unsigned long end)
{
    unsigned long addr;
    struct page *p;
    int page_count = 0, page_count_sum = 0;

    printk("madvise_dontneed() called, entering probe...\n");
    printk(" vma=0x%p\n"
           " start=0x%lx\n"
           " end=0x%lx\n", vma, start, end);
    /* walk the range one page at a time, counting mapped pages */
    for (addr = start; addr < end; addr += PAGE_SIZE) {
        p = follow_page(vma->vm_mm, addr, 0);
        if (p != NULL) {
            page_count++;
            page_count_sum += atomic_read(&(p)->_count);
        }
    }
    /* avoid dividing by zero when the range has no mapped pages */
    printk("vma range contained %d pages with a count sum of %d (avg=%d)\n",
           page_count, page_count_sum,
           page_count ? page_count_sum / page_count : 0);
}
As you can see in the above example, the instrumentation function does
a lot more work, including walking through the list, locking, etc.,
which I feel is a lot easier for a kernel expert to write than for
systemtap to generate automatically.