This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: stptracer-20060828 has released.


On Thu, 2006-08-31 at 16:30 +0800, Li Guanglei wrote:

> The first thing I want to figure out about STPTracer is how it
> performs compared with LKET. STPTracer uses an interface named gBTI
> which could only print fix number integers into a pre-reserved relayfs
> buffer, while LKET uses the _stp_printf with binary print support.
> handlers, i.e.,

>        probe syscall.getsid {}
> cpu 0: loops=5000000, average=1523 ns
> 
> <3> probe using lket_getsid.stp:
> cpu 0: loops=5000000, average=3079 ns
> 
> <4> probe using lkst_getsid.stp:
> cpu 0: loops=5000000, average=2341 ns
> 

LKET overhead = 3079 - 1523 = 1556 ns
LKST overhead = 818 ns
difference is 738 ns

> Here you can see that gBTI is around 23% faster than _stp_printf(). 

I'd say LKET is almost twice as fast on your machine, factoring out
probe overhead.

Looking at the source code, I can't say anything about stp_printf() vs
gbti.  Certainly stp_printf cannot be taking 738 ns to simply print 4
integers.

> In
> fact a long time ago I used oprofile to find out where _stp_printf()
> cost cpu time, and it showed that a lot time is spent on _stp_vsnprintf().

Yeah, stp_printf() is just a wrapper to call stp_vsnprintf(). 

When I proposed and implemented stp_printf, I did a lot of benchmarks.
So I reran them with the current code vs a very simple, fast gbti-type
interface without the headers and timestamping.

function stp_binary_write(n:long, arg1:long, arg2:long, arg3:long, arg4:long) %{
        _stp_binary_write (THIS->n, THIS->arg1, THIS->arg2, THIS->arg3, THIS->arg4);
%}
static void _stp_binary_write (int num, ...)
{
        va_list vargs;
        int i;
        int64_t *args;

        if (unlikely(num > MAXBINARYARGS))
                num = MAXBINARYARGS;

        args = _stp_reserve_bytes(num * sizeof(int64_t));

        if (args != NULL) {
                va_start(vargs, num);
                for (i = 0; i < num; i++) {
                        args[i] = va_arg(vargs, int64_t);
                }
                va_end(vargs);
        }
}

So here are the results of my 2.6GHz P4.

SystemTap BENCH2        Tue Sep  5 10:52:37 EDT 2006
kernel: 2.6.17-1.2174_FC5smp i686
Fedora Core release 5 (Bordeaux)
dragon:  10:52:37 up 12 min,  3 users,  load average: 0.71, 0.54, 0.38
processors: 2 (1 physical) Intel(R) Pentium(R) 4 CPU 2.60GHz
MemTotal:      2074472 kB       MemFree:       1303368 kB
----------------------------------------------------------------
For comparison, function call overhead is 471 nsecs.
Times below are nanoseconds per probe and include kprobe overhead.
----------------------------------------------------------------
+--- S = Script, R = Runtime
|+-- * = Relayfs                Threads
|| NAME                             1       2
S : empty probe                   601     506
S : binary printf 4 integers (%8b)       1073     899
S : binary printf 4 integers (%4b)        997     767
S : binary printf 4 integers (%b)         953     745
S : _stp_binary_write 4 integers          988     779

So you can see a slight advantage for _stp_binary_write when you are writing
64-bit values. When the sizes are smaller, stp_printf has a slight advantage.

Martin







Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]