This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



C tapsets take 2



C tapsets


To make it easier to access kernel functions that provide data useful for analyzing problems, a "C" language interface is defined.

The systemtap infrastructure implements a default tapset called systap. Apart from the other facilities it provides, this tapset maintains the repository of all the tapset functions available in the system. Tapset functions in a related area are grouped into a module. During module initialization, a tapset registers with the systap module both the functions in the module and the data exported by those functions. The interface to register tapset functions is as follows.

tapsetfunc_register(co_ordinates, tapsetfunc_addr, probed_func_args, probed_func_variables, exported_data)

co_ordinates: This argument specifies the location in the probed function at which this particular tapset function can be used as the probe handler.

tapsetfunc_addr: This is the address of the tapset function that can be called in the handler at the co_ordinates location.

probed_func_args: This argument specifies which arguments of the probed function this tapset function needs access to in order to export its data.

probed_func_variables: This argument specifies which active local variables of the probed function this tapset function needs access to in order to export its data.

exported_data: This argument specifies the data type of the variables exported by this tapset function.
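
As a rough illustration of the registration interface described above, here is a minimal user-space sketch in C. The message gives only the argument list, so everything else here is an assumption made for illustration: the string-based argument types, the tapsetfunc_t typedef, the fixed-size table, the return codes, and the tapsetfunc_lookup and demo_tapset_func helpers. It is not the actual systap implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler type; the real signature is described later. */
typedef void (*tapsetfunc_t)(void);

/* One repository entry, mirroring the five registration arguments. */
struct tapset_entry {
    const char *co_ordinates;          /* probe location where the function may run */
    tapsetfunc_t tapsetfunc_addr;      /* the tapset function itself */
    const char *probed_func_args;      /* needed argument names, or NULL */
    const char *probed_func_variables; /* needed local variable names, or NULL */
    const char *exported_data;         /* data types of the exported values */
};

#define MAX_TAPSET_FUNCS 16            /* arbitrary limit for this sketch */
static struct tapset_entry registry[MAX_TAPSET_FUNCS];
static size_t registry_len;

/* Record a tapset function. Returns 0 on success, -1 if the table is full. */
int tapsetfunc_register(const char *co_ordinates, tapsetfunc_t tapsetfunc_addr,
                        const char *probed_func_args,
                        const char *probed_func_variables,
                        const char *exported_data)
{
    assert(co_ordinates != NULL && tapsetfunc_addr != NULL);
    if (registry_len == MAX_TAPSET_FUNCS)
        return -1;
    registry[registry_len].co_ordinates = co_ordinates;
    registry[registry_len].tapsetfunc_addr = tapsetfunc_addr;
    registry[registry_len].probed_func_args = probed_func_args;
    registry[registry_len].probed_func_variables = probed_func_variables;
    registry[registry_len].exported_data = exported_data;
    registry_len++;
    return 0;
}

/* Hypothetical helper: find the entry registered for a probe location. */
const struct tapset_entry *tapsetfunc_lookup(const char *co_ordinates)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].co_ordinates, co_ordinates) == 0)
            return &registry[i];
    return NULL;
}

/* Placeholder tapset function used only to exercise the registry. */
void demo_tapset_func(void)
{
}
```

A code generator could then consult such a table to decide which tapset function to call for a given probe location and what data types to expect back.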

The list of tapset function APIs, along with the data they export, is published. End users can use these published functions in their scripts. Based on the user-specified scripts, systemtap generates code that can invoke one or more tapset functions.

The following is the "C" language interface for the tapset function itself.

tapset_func(regs, probed_func_args, probed_func_vars, return_vars)

regs: This argument contains the processor register set.

probed_func_args: This argument contains the pointers to the list of arguments of the probed function. This list is based on the args list specified in the tapsetfunc_register.

probed_func_vars: This argument contains the pointers to the local variables the tapset function needs. This list is based on the list specified in the tapsetfunc_register for this function.

return_vars: This argument contains pointers to the memory allocated for the values returned by the tapset function.
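
The calling convention above can be sketched in user-space C as follows. This is only one possible reading of the interface: the use of void * arrays for the pointer lists, the pt_regs_stub stand-in for the kernel's struct pt_regs, and the names tapset_func_t, tapf_copy_first_local, and call_tapset_func are all assumptions introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct pt_regs (assumption for this sketch). */
struct pt_regs_stub {
    unsigned long ip;
};

/* Generic tapset function shape: registers plus three lists of pointers. */
typedef void (*tapset_func_t)(struct pt_regs_stub *regs,
                              void **probed_func_args,
                              void **probed_func_vars,
                              void **return_vars);

/* Example tapset function: export the first requested local unchanged. */
void tapf_copy_first_local(struct pt_regs_stub *regs,
                           void **probed_func_args,
                           void **probed_func_vars,
                           void **return_vars)
{
    (void)regs;             /* unused in this example */
    (void)probed_func_args; /* registered as NULL, so nothing to read */
    assert(probed_func_vars != NULL && return_vars != NULL);
    *(long long *)return_vars[0] = *(const long long *)probed_func_vars[0];
}

/* What generated probe-handler code might do: build the pointer lists,
 * call the tapset function, and hand back the exported value. */
long long call_tapset_func(tapset_func_t fn, long long local_value)
{
    struct pt_regs_stub regs = { 0 };
    long long exported = 0;
    void *vars[] = { &local_value };
    void *rets[] = { &exported };

    fn(&regs, NULL, vars, rets);
    return exported;
}
```

Passing pointers rather than values lets one generic signature serve tapset functions with different numbers and types of inputs and outputs, at the cost of casts inside each function.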

The following example illustrates the "C" language API.

Let us say a kernel expert would like to provide a tapset function for the sys_read system call that runs when the file position is updated and exports the new file offset and the filename. That function would be registered as follows.

tapsetfunc_register("kernel.function(sys_read).linenum(9)", //location in the probed function
tapf_sysread_done, //tapset function that can be called in the probe handler
NULL, //args of the probed function needed in the tapset function
"pos, file", //local variables of the probed function
"long long, char *" //data type for return variables
);


The tapset function itself would be written as follows

void tapf_sysread_done(struct pt_regs *regs,
   void *args,  //probed function args (none requested, registered as NULL)
   loff_t pos, struct file *file,  //probed function locals
   long long *ret_offset, char *name) // return values
{
   *ret_offset = pos;  // export the updated file position
   strcpy(name, file->f_dentry->d_name.name);  // export the file name
}
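
The example copies the file name with strcpy and so relies on the caller having allocated enough space. A user-space sketch of a bounded variant is shown below. The extra name_len parameter is hypothetical and not part of the proposed interface, and the *_stub structures are stand-ins that keep only the fields the example touches so the code can be compiled outside the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for kernel structures, reduced to the fields used here. */
struct qstr_stub { const char *name; };
struct dentry_stub { struct qstr_stub d_name; };
struct file_stub { struct dentry_stub *f_dentry; };

/* Bounded variant of the example tapset function.
 * name_len is a hypothetical extra parameter giving the size of name. */
void tapf_sysread_done_bounded(long long pos, const struct file_stub *file,
                               long long *ret_offset,
                               char *name, size_t name_len)
{
    assert(file != NULL && ret_offset != NULL);
    assert(name != NULL && name_len > 0);
    *ret_offset = pos;
    /* Copy at most name_len - 1 bytes and always terminate the string. */
    strncpy(name, file->f_dentry->d_name.name, name_len - 1);
    name[name_len - 1] = '\0';
}

/* Demonstration helper: builds the stub objects, calls the tapset
 * function, and returns the exported offset. */
long long demo_sysread_done(const char *filename, long long pos,
                            char *name, size_t name_len)
{
    struct dentry_stub dentry = { { filename } };
    struct file_stub file = { &dentry };
    long long offset = -1;

    tapf_sysread_done_bounded(pos, &file, &offset, name, name_len);
    return offset;
}
```

With an 8-byte buffer, a name longer than 7 characters is truncated rather than overflowing the buffer.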


We could publish the above tapf_sysread_done as the return probe for the read system call in the syscall tapset.
The advertised API will say

function name    arg0      arg1
read.return      offset    filename


End users can write a script like the following, and systemtap will generate the code to access the above tapset function.


global read_pos filename pid


probe kernel.syscall("read").return{
thread->read_pos = $pos;
thread->filename = $filename;
thread->pid = $PID;
trace ("File name ", thread->filename, "Offset after read", $pos, "Process ID", $PID);
}


One might ask: since systemtap already figures out all the needed variables from the debug information, why call a kernel tapset function just to get the filename in this example? Indeed, in this simple example systemtap can get the data without calling a kernel function, but here is a more complicated example from the vmtapset work.


/* same signature as madvise_dontneed */
static void inst_madvise_dontneed(struct vm_area_struct * vma,
                                  unsigned long start, unsigned long end)
{
   unsigned long addr;
   struct page *p;
   int page_count = 0, page_count_sum = 0;

   printk("madvise_dontneed() called, entering probe...\n");
   printk(" vma=0x%p\n"
          " start=0x%lx\n"
          " end=0x%lx\n", vma, start, end);
   /* walk the range one page at a time and sum the page reference counts */
   for (addr = start; addr < end; addr += PAGE_SIZE) {
      p = follow_page(vma->vm_mm, addr, 0);
      if (p != NULL) {
         page_count++;
         page_count_sum += atomic_read(&(p)->_count);
      }
   }
   /* avoid dividing by zero when no page in the range is mapped */
   printk("vma range contained %d pages with a count sum of %d (avg=%d)\n",
          page_count, page_count_sum,
          page_count ? page_count_sum/page_count : 0);
}


As you can see in the above example, the instrumentation function does a lot more work, including walking through the list, locking, etc., which I feel is a lot easier for a kernel expert to write than for systemtap to generate automatically.



