This is the mail archive of the
systemtap@sources.redhat.com
mailing list for the systemtap project.
Re: architecture paper draft
- From: Michael Brim <mjbrim at cs dot wisc dot edu>
- To: "Chen, Brad" <brad dot chen at intel dot com>
- Cc: "Sridharan, K" <k dot sridharan at intel dot com>, William Cohen <wcohen at redhat dot com>, Barton Miller <bart at cs dot wisc dot edu>, "Frank Ch. Eigler" <fche at redhat dot com>, "Stephen C. Tweedie" <sct at redhat dot com>, systemtap at sources dot redhat dot com
- Date: Fri, 11 Feb 2005 14:07:08 -0600
- Subject: Re: architecture paper draft
- References: <75EC4D5486CAC247B84AAAA6F96AA558039D718A@orsmsx402.amr.corp.intel.com>
Chen, Brad wrote:
Maybe Bart and Michael can explain more about how they
collected these numbers. I don't believe there is a paper
to cite.
Brad is correct: there is no paper that contains this information, as we
just collected the numbers last week.
The method used to collect the numbers was fairly straightforward. We
constructed a kernel module function that looked like the following:
    unsigned long long start = 0, stop = 0, total = 0;
    unsigned exec_count = 1000;
    unsigned i = 0;

    for (; i < exec_count; i++) {
        rdtscll(start);
    #if 1 /* profile traps */
        __asm__ __volatile__ ("addb $0, %%al\n" :
                              /* no outputs */ :
                              /* no inputs */ :
                              "eax", "cc");
    #else /* profile branches */
        __asm__ __volatile__ ("addl $0x10000000, %%eax\n" :
                              /* no outputs */ :
                              /* no inputs */ :
                              "eax", "cc");
    #endif
        rdtscll(stop);
        total += stop - start;
    }
    printk(KERN_INFO "kerninst: benchmark_instrumentation - total cycles "
           "at inst point is %llu for %u executions\n", total, exec_count);
We instrumented the add instruction using kerninst with "null
instrumentation code", which contains a save/restore of all GPRs and
EFLAGS, the relocated add, and a direct jump to the instruction
following the add. Kerninst determines whether to use a branch or trap
from the size of the instruction: the addb is 2 bytes, so it gets a
trap, and the addl is 5 bytes, so it gets a branch. In the trap case,
kerninst supplies its own int3 handler (similar to the kprobes approach)
that hashes the trapping instruction address to locate the
instrumentation code patch in the kernel.
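To make the trap-path lookup concrete, here is a minimal sketch of a table keyed by the trapping instruction address. It is not kerninst's actual code: the names (patch_entry, patch_register, patch_lookup, patch_lookup_from_trap), the bucket count, and the hash function are all assumptions chosen for illustration. The one x86 detail it relies on is that the instruction pointer saved for an int3 trap points one byte past the int3.

```c
#include <stddef.h>
#include <stdint.h>

#define PATCH_BUCKETS 256 /* hypothetical table size; must be a power of two */

/* One instrumented point: where the int3 was planted and where its patch lives. */
struct patch_entry {
    uintptr_t probe_addr;     /* address of the instrumented instruction */
    void *patch_code;         /* start of the instrumentation code patch */
    struct patch_entry *next; /* chain for addresses hashing to the same bucket */
};

static struct patch_entry *patch_table[PATCH_BUCKETS];

/* Illustrative hash only: fold some higher address bits into the low bits. */
static size_t hash_addr(uintptr_t addr)
{
    return (size_t)((addr >> 1) ^ (addr >> 9)) & (PATCH_BUCKETS - 1);
}

/* Insert an entry at the head of its bucket chain. */
static void patch_register(struct patch_entry *e)
{
    size_t b = hash_addr(e->probe_addr);

    e->next = patch_table[b];
    patch_table[b] = e;
}

/* Return the patch for an instrumented address, or NULL if there is none. */
static void *patch_lookup(uintptr_t probe_addr)
{
    struct patch_entry *e = patch_table[hash_addr(probe_addr)];

    for (; e != NULL; e = e->next)
        if (e->probe_addr == probe_addr)
            return e->patch_code;
    return NULL;
}

/* The saved instruction pointer is one byte past the int3, so step back by one. */
static void *patch_lookup_from_trap(uintptr_t saved_ip)
{
    return patch_lookup(saved_ip - 1);
}
```

A NULL result would mean the trap did not come from one of the registered instrumentation points, in which case a real handler would presumably pass the trap on to whoever else handles int3.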
The function was then invoked via an ioctl once a second over a period
of 20 seconds. We grabbed the output from the system log, threw out the
2 highest and 2 lowest totals (each total covering 1000 executions), and
calculated the average cycles per execution from the remaining totals.
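For anyone wanting to repeat the post-processing, the averaging step can be sketched as below. This is an assumed reconstruction, not the script we actually used: the function name trimmed_cycles_per_exec and its parameters are hypothetical, and the per-invocation totals are expected to have been parsed out of the system log already.

```c
#include <stdlib.h>

/* qsort comparator for unsigned long long values, ascending. */
static int cmp_ull(const void *a, const void *b)
{
    unsigned long long x = *(const unsigned long long *)a;
    unsigned long long y = *(const unsigned long long *)b;

    return (x > y) - (x < y);
}

/*
 * Average cycles per execution after discarding the `trim` highest and
 * `trim` lowest totals. Each total covers `exec_count` executions.
 * Sorts `totals` in place; returns -1.0 if too few samples remain.
 */
static double trimmed_cycles_per_exec(unsigned long long *totals, size_t n,
                                      size_t trim, unsigned exec_count)
{
    unsigned long long sum = 0;
    size_t i;

    if (exec_count == 0 || n <= 2 * trim)
        return -1.0;

    qsort(totals, n, sizeof(*totals), cmp_ull);
    for (i = trim; i < n - trim; i++)
        sum += totals[i];

    return (double)sum / (double)(n - 2 * trim) / (double)exec_count;
}
```

With the procedure described above, the call would use trim = 2 and exec_count = 1000, with n being however many totals showed up in the log.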
--
Michael J. Brim
Graduate Research Asst.
UW Computer Science Dept.
Rm. 7355, (608)262-6227