This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Where stap spends its time producing instrumentation for testsuite/systemtap.syscall/sys.stp


Hi Frank,

Thanks for the improvements. I reran the same experiment from
Wednesday on sys.stp with a fresh cvs checkout of systemtap. There are
improvements in the pass 2 and pass 4 times. Pass 2 dropped form 39.7
seconds to 29.7 seconds user time. Pass 4 times dropped from 34.1
seconds to 26.4 seconds. Definitely, a trend in ht ecorrect direction.
The __libdw_find_attr and __libdw_form_val_len functions still
dominate the runtimes. The elfutils-dwfl caching is going to improve
that situation?

-Will



$ sudo opcontrol --start; /home/wcohen/systemtap_write/install/bin/stap -p4 -v /home/wcohen/systemtap_write/src/testsuite/systemtap.syscall/sys.stp -m xyz -k; sudo opcontrol --shutdown
Using 2.6+ OProfile kernel interface.
Reading module info.
Using log file /var/lib/oprofile/samples/oprofiled.log
Daemon started.
Profiler running.
Warning: using '-m' disables cache support.
Pass 1: parsed user script and 37 library script(s) in 360usr/40sys/423real ms.
Pass 2: analyzed script: 768 probe(s), 108 function(s), 14 embed(s), 4 global(s) in 29740usr/1590sys/38374real ms.
Pass 3: translated to C into "/tmp/stapJKUmBF/xyz.c" in 150usr/110sys/256real ms.


Pass 4: compiled C into "xyz.ko" in 26350usr/1710sys/30719real ms.
Keeping temporary directory "/tmp/stapJKUmBF"
Stopping profiling.
Killing daemon.

The size of generated code also dropped significantly from 120K lines
to 91K lines of code. The resulting module is roughly half the size
for the text region.

$ wc /tmp/stapJKUmBF/xyz.c
  91259  227560 2586615 /tmp/stapJKUmBF/xyz.c
$ size /tmp/stapJKUmBF/xyz.ko
   text    data     bss     dec     hex filename
 529327  169408    4816  703551   abc3f /tmp/stapJKUmBF/xyz.ko


The number of samples for stap dropped quite a bit. Most of these appear to be areduction in the stap code itself. For the component breakdown of stap, the stap code was about 32% of the total samples for stap. It is now around 11% fo the samples. The compiler cc1 does less work also.


$ opreport -t1
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 1000000
Counted RETIRED_INSTRUCTIONS events (Retired instructions (includes exceptions, interrupts, re-syncs)) with a unit mask of 0x00 (No unit mask) count 1000000
CPU_CLK_UNHALT...|RETIRED_INSTRU...|
samples| %| samples| %|
------------------------------------
56465 48.0868 64383 53.5976 stap
CPU_CLK_UNHALT...|RETIRED_INSTRU...|
samples| %| samples| %|
------------------------------------
32131 56.9043 37883 58.8401 libdw.so.1
11497 20.3613 12154 18.8777 libc-2.6.90.so
6431 11.3894 5939 9.2245 stap
5233 9.2677 6436 9.9964 libstdc++.so.6.0.8
1129 1.9995 1961 3.0458 libelf.so.1
32 0.0567 6 0.0093 ld-2.6.90.so
9 0.0159 4 0.0062 libpthread-2.6.90.so
2 0.0035 0 0 libebl_x86_64.so
1 0.0018 0 0 libdl-2.6.90.so
48520 41.3207 47267 39.3488 cc1
CPU_CLK_UNHALT...|RETIRED_INSTRU...|
samples| %| samples| %|
------------------------------------
45011 92.7679 43950 92.9824 cc1
3502 7.2176 3313 7.0091 libc-2.6.90.so
7 0.0144 4 0.0085 ld-2.6.90.so
9456 8.0529 6286 5.2330 vmlinux
1869 1.5917 1635 1.3611 as
CPU_CLK_UNHALT...|RETIRED_INSTRU...|
samples| %| samples| %|
------------------------------------
1325 70.8935 1062 64.9541 as
501 26.8058 539 32.9664 libc-2.6.90.so
34 1.8192 31 1.8960 libbfd-2.17.50.0.18-1.so
8 0.4280 3 0.1835 ld-2.6.90.so
1 0.0535 0 0 libopcodes-2.17.50.0.18-1.so



The two hotspot functions still exist. Due to the reduction in other parts of of stap now have a larger portion of the samples.



$ opreport -l -t1 /home/wcohen/systemtap_write/install/bin/stap
CPU: AMD64 processors, speed 800 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 1000000
Counted RETIRED_INSTRUCTIONS events (Retired instructions (includes exceptions, interrupts, re-syncs)) with a unit mask of 0x00 (No unit mask) count 1000000
samples % samples % image name symbol name
13316 23.5828 13444 20.8813 libdw.so.1 __libdw_find_attr
12501 22.1394 17166 26.6623 libdw.so.1 __libdw_form_val_len
4770 8.4477 3806 5.9115 libc-2.6.90.so strlen
1773 3.1400 2930 4.5509 libdw.so.1 dwfl_module_getsym
1581 2.8000 1722 2.6746 stap query_dwarf_func(Dwarf_Die*, void*)
1413 2.5024 1350 2.0968 libdw.so.1 dwarf_attr
1361 2.4103 2306 3.5817 libstdc++.so.6.0.8 std::string::assign(std::string const&)
1154 2.0437 247 0.3836 stap std::_Rb_tree<std::string, std::pair<std::string const, std::vector<Dwarf_Die, std::allocator<Dwarf_Die> >*>, std::_Select1st<std::pair<std::string const, std::vector<Dwarf_Die, std::allocator<Dwarf_Die> >*> >, std::less<std::string>, std::allocator<std::pair<std::string const, std::vector<Dwarf_Die, std::allocator<Dwarf_Die> >*> > >::lower_bound(std::string const&)
1143 2.0243 1640 2.5473 libc-2.6.90.so memcpy
1106 1.9587 1957 3.0396 libelf.so.1 gelf_getsymshndx
1039 1.8401 977 1.5175 libc-2.6.90.so _int_free
1038 1.8383 1201 1.8654 stap query_cu(Dwarf_Die*, void*)
1027 1.8188 1160 1.8017 libc-2.6.90.so _int_malloc
974 1.7250 1117 1.7349 libc-2.6.90.so internal_fnmatch
733 1.2981 1118 1.7365 libc-2.6.90.so strcmp
718 1.2716 166 0.2578 stap symresolution_info::find_function(std::string const&, unsigned int)
683 1.2096 1195 1.8561 stap lookup_symbol_address(Dwfl_Module*, char const*)
644 1.1405 532 0.8263 libc-2.6.90.so free
644 1.1405 581 0.9024 libdw.so.1 dwarf_func_inline
612 1.0839 1145 1.7784 libc-2.6.90.so malloc
598 1.0591 370 0.5747 libdw.so.1 lookup



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]