This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
OProfile examination of where SystemTap spends time
- From: William Cohen <wcohen at redhat dot com>
- To: SystemTAP <systemtap at sources dot redhat dot com>
- Date: Mon, 06 Feb 2006 11:34:23 -0500
- Subject: OProfile examination of where SystemTap spends time
Last Thursday I did a quick experiment on a AMD64 machine running
FC5t2 to see where time is spent when running the testsuite. I have a
script oprofdata-collect.sh that setups up oprofile for the
experiment. Assuming that the processor is running 2GHz, the sampling
to take 2,000 samples per second on the AMD64; the power now throttles
down to 800MHz when the processor is idle.
Below is the overall view of samples for everything with 1% or more of
the samples for a "runtest --tool=systemtap":
$ opreport session:data.5839f --long-filenames -t 1
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 1000000
CPU_CLK_UNHALT...|
samples| %|
------------------
175290 54.9835 /usr/libexec/gcc/x86_64-redhat-linux/4.1.0/cc1
CPU_CLK_UNHALT...|
samples| %|
------------------
156557 89.3131 /usr/libexec/gcc/x86_64-redhat-linux/4.1.0/cc1
18701 10.6686 /lib64/libc-2.3.90.so
32 0.0183 /lib64/ld-2.3.90.so
70077 21.9811 /usr/lib/debug/lib/modules/2.6.15-1.1895_FC5/vmlinux
29667 9.3057 /home/wcohen/stap_testing_200602022040/obj/stap
CPU_CLK_UNHALT...|
samples| %|
------------------
14487 48.8320 /home/wcohen/stap_testing_200602022040/obj/stap
7523 25.3581 /usr/lib64/libstdc++.so.6.0.7
4577 15.4279
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so
2721 9.1718 /lib64/libc-2.3.90.so
203 0.6843 /lib64/ld-2.3.90.so
127 0.4281
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libelf-0.119.so
28 0.0944
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/systemtap/libebl_x86_64-0.119.so
1 0.0034 /lib64/libdl-2.3.90.so
24681 7.7417 /usr/bin/as
CPU_CLK_UNHALT...|
samples| %|
------------------
17890 72.4849 /usr/bin/as
5665 22.9529 /lib64/libc-2.3.90.so
1110 4.4974 /usr/lib64/libbfd-2.16.91.0.5.so
16 0.0648 /lib64/ld-2.3.90.so
5269 1.6527
/usr/src/kernels/2.6.15-1.1895_FC5-x86_64/scripts/mod/modpost
CPU_CLK_UNHALT...|
samples| %|
------------------
2719 51.6037
/usr/src/kernels/2.6.15-1.1895_FC5-x86_64/scripts/mod/modpost
2531 48.0357 /lib64/libc-2.3.90.so
19 0.3606 /lib64/ld-2.3.90.so
A significant amount of time is spent in the traditional compilation
process converting the C code into exectuable, 55% in cc1 (C to
assembly language conversion) and 7.7% in as (assembly to object
code). The kernel is taking about 22%. The systemtap translator is
taking 9.3%.
The systemtap translator doesn't have a huge hotspot as seen
below. The highest thing on the list is the string compare at
5.6%. The elfutils routines to figure out where to place probe points
are next.
$ opreport session:data.5839f --long-filenames -l
/home/wcohen/stap_testing_200602022040/obj/stap -t 2
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 1000000
samples % image name symbol name
1660 5.5954 /usr/lib64/libstdc++.so.6.0.7
std::string::compare(char const*) const
1209 4.0752
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so
lookup
1101 3.7112
/home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so
__libdw_find_attr
947 3.1921 /home/wcohen/stap_testing_200602022040/obj/stap
lexer::input_peek(unsigned int)
837 2.8213 /lib64/libc-2.3.90.so _int_malloc
836 2.8179 /home/wcohen/stap_testing_200602022040/obj/stap
lexer::scan()
791 2.6663 /home/wcohen/stap_testing_200602022040/obj/stap
__gnu_cxx::__normal_iterator<int const*, std::vector<int,
std::allocator<int> > >::__normal_iterator(int const* const&)
728 2.4539 /home/wcohen/stap_testing_200602022040/obj/stap
__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int>
> >::__normal_iterator(int* const&)
617 2.0798 /home/wcohen/stap_testing_200602022040/obj/stap
std::vector<int, std::allocator<int>
>::erase(__gnu_cxx::__normal_iterator<int*, std::vector<int,
std::allocator<int> > >)
609 2.0528 /lib64/libc-2.3.90.so memcpy
597 2.0123 /home/wcohen/stap_testing_200602022040/obj/stap
lexer::input_get()
594 2.0022 /home/wcohen/stap_testing_200602022040/obj/stap
std::vector<int, std::allocator<int> >::size() const
The kernel being used has debugging turned on. One can see that the
check_poison_obj at the top of the list with 17.8%. It looks like stap
is exercising the paging mechanism with the number of samples in
clear_page, page_fault, and __handle_mm_fault.
$ opreport session:data.5839f --long-filenames -t 1 -l
/usr/lib/debug/lib/modules/2.6.15-1.1895_FC5/vmlinux
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a
unit mask of 0x00 (No unit mask) count 1000000
samples % symbol name
12468 17.7919 check_poison_obj
12458 17.7776 clear_page
2520 3.5960 page_fault
2469 3.5233 __handle_mm_fault
2397 3.4205 number
2260 3.2250 _raw_spin_lock
2091 2.9839 copy_user_generic
1227 1.7509 _raw_spin_unlock
1139 1.6254 memset
1011 1.4427 do_page_fault
977 1.3942 get_page_from_freelist
862 1.2301 strncpy
789 1.1259 vsnprintf