This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

OProfile examination of where SystemTap spends time


Last Thursday I did a quick experiment on a AMD64 machine running
FC5t2 to see where time is spent when running the testsuite. I have a
script oprofdata-collect.sh that setups up oprofile for the
experiment. Assuming that the processor is running 2GHz, the sampling
to take 2,000 samples per second on the AMD64; the power now throttles
down to 800MHz when the processor is idle.

Below is the overall view of samples for everything with 1% or more of
the samples for a "runtest --tool=systemtap":


$ opreport session:data.5839f --long-filenames -t 1
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 1000000
CPU_CLK_UNHALT...|
samples| %|
------------------
175290 54.9835 /usr/libexec/gcc/x86_64-redhat-linux/4.1.0/cc1
CPU_CLK_UNHALT...|
samples| %|
------------------
156557 89.3131 /usr/libexec/gcc/x86_64-redhat-linux/4.1.0/cc1
18701 10.6686 /lib64/libc-2.3.90.so
32 0.0183 /lib64/ld-2.3.90.so
70077 21.9811 /usr/lib/debug/lib/modules/2.6.15-1.1895_FC5/vmlinux
29667 9.3057 /home/wcohen/stap_testing_200602022040/obj/stap
CPU_CLK_UNHALT...|
samples| %|
------------------
14487 48.8320 /home/wcohen/stap_testing_200602022040/obj/stap
7523 25.3581 /usr/lib64/libstdc++.so.6.0.7
4577 15.4279 /home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so
2721 9.1718 /lib64/libc-2.3.90.so
203 0.6843 /lib64/ld-2.3.90.so
127 0.4281 /home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libelf-0.119.so
28 0.0944 /home/wcohen/stap_testing_200602022040/obj/lib-elfutils/systemtap/libebl_x86_64-0.119.so
1 0.0034 /lib64/libdl-2.3.90.so
24681 7.7417 /usr/bin/as
CPU_CLK_UNHALT...|
samples| %|
------------------
17890 72.4849 /usr/bin/as
5665 22.9529 /lib64/libc-2.3.90.so
1110 4.4974 /usr/lib64/libbfd-2.16.91.0.5.so
16 0.0648 /lib64/ld-2.3.90.so
5269 1.6527 /usr/src/kernels/2.6.15-1.1895_FC5-x86_64/scripts/mod/modpost
CPU_CLK_UNHALT...|
samples| %|
------------------
2719 51.6037 /usr/src/kernels/2.6.15-1.1895_FC5-x86_64/scripts/mod/modpost
2531 48.0357 /lib64/libc-2.3.90.so
19 0.3606 /lib64/ld-2.3.90.so



A significant amount of time is spent in the traditional compilation process converting the C code into exectuable, 55% in cc1 (C to assembly language conversion) and 7.7% in as (assembly to object code). The kernel is taking about 22%. The systemtap translator is taking 9.3%.


The systemtap translator doesn't have a huge hotspot as seen below. The highest thing on the list is the string compare at 5.6%. The elfutils routines to figure out where to place probe points are next.

$ opreport session:data.5839f --long-filenames -l /home/wcohen/stap_testing_200602022040/obj/stap -t 2
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 1000000
samples % image name symbol name
1660 5.5954 /usr/lib64/libstdc++.so.6.0.7 std::string::compare(char const*) const
1209 4.0752 /home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so lookup
1101 3.7112 /home/wcohen/stap_testing_200602022040/obj/lib-elfutils/libdw-0.119.so __libdw_find_attr
947 3.1921 /home/wcohen/stap_testing_200602022040/obj/stap lexer::input_peek(unsigned int)
837 2.8213 /lib64/libc-2.3.90.so _int_malloc
836 2.8179 /home/wcohen/stap_testing_200602022040/obj/stap lexer::scan()
791 2.6663 /home/wcohen/stap_testing_200602022040/obj/stap __gnu_cxx::__normal_iterator<int const*, std::vector<int, std::allocator<int> > >::__normal_iterator(int const* const&)
728 2.4539 /home/wcohen/stap_testing_200602022040/obj/stap __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::__normal_iterator(int* const&)
617 2.0798 /home/wcohen/stap_testing_200602022040/obj/stap std::vector<int, std::allocator<int> >::erase(__gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >)
609 2.0528 /lib64/libc-2.3.90.so memcpy
597 2.0123 /home/wcohen/stap_testing_200602022040/obj/stap lexer::input_get()
594 2.0022 /home/wcohen/stap_testing_200602022040/obj/stap std::vector<int, std::allocator<int> >::size() const



The kernel being used has debugging turned on. One can see that the check_poison_obj at the top of the list with 17.8%. It looks like stap is exercising the paging mechanism with the number of samples in clear_page, page_fault, and __handle_mm_fault.

$ opreport session:data.5839f --long-filenames -t 1 -l /usr/lib/debug/lib/modules/2.6.15-1.1895_FC5/vmlinux
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 1000000
samples % symbol name
12468 17.7919 check_poison_obj
12458 17.7776 clear_page
2520 3.5960 page_fault
2469 3.5233 __handle_mm_fault
2397 3.4205 number
2260 3.2250 _raw_spin_lock
2091 2.9839 copy_user_generic
1227 1.7509 _raw_spin_unlock
1139 1.6254 memset
1011 1.4427 do_page_fault
977 1.3942 get_page_from_freelist
862 1.2301 strncpy
789 1.1259 vsnprintf



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]