Bug 1931 - translator doing too much dwarf searching for kernel.syscall.*
Summary: translator doing too much dwarf searching for kernel.syscall.*
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: translator (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Graydon Hoare
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-28 01:35 UTC by Frank Ch. Eigler
Modified: 2005-12-14 02:28 UTC (History)
0 users

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Frank Ch. Eigler 2005-11-28 01:35:46 UTC
A profiled build of systemtap, running "stap -p2 -e 'probe kernel.syscall.* {}'"
gives some hints as to why that translation stage takes on the order of a minute
on a 3GHz machine.  It looks like query_dwarf_func/etc. are called a ridiculous
number of times, considering that those syscall aliases expand to a mere 200-odd
kernel.function()s.  Here are some gprof highlights.

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 27.30      4.54     4.54                            
query_dwarf_func(Dwarf_Die*, void*)
 13.65      6.81     2.27  6274460     0.00     0.00 
dwflpp::focus_on_function(Dwarf_Die*)
 13.35      9.03     2.22 12548920     0.00     0.00  dwflpp::func_is_inline()
  7.82     10.33     1.30  6524382     0.00     0.00  dwflpp::default_name(char
const*, char const*)  4.63     11.10     0.77  3421068     0.00     0.00 
dwflpp::function_name_matches(std::basic_string<char, std::char_traits<char>,
std::allocator<char> >)
  2.89     11.58     0.48   231852     0.00     0.02  query_cu(Dwarf_Die*, void*)

....

index % time    self  children    called     name
                                                 <spontaneous>
[1]     66.8    4.54    6.57                 query_dwarf_func(Dwarf_Die*, void*) [1]
                2.27    1.25 6274460/6274460    
dwflpp::focus_on_function(Dwarf_Die*) [5]
                2.22    0.00 12548920/12548920     dwflpp::func_is_inline() [6]
                0.77    0.00 3421068/3421068    
dwflpp::function_name_matches(std::basic_string<char, std::char_traits<char>,
std::allocator<char> >) [13]
                0.01    0.04     275/275         std::map<unsigned long long,
func_info, std::less<unsigned long long>, std::allocator<std::pair<unsigned long
long const, func_info> > >::operator[](unsigned long long const&) [90]
                0.01    0.00     275/275         dwflpp::function_file(char
const**) [278]
                0.00    0.00     275/275        
dwflpp::function_entrypc(unsigned long long*) [1068]
                0.00    0.00     275/550         func_info::func_info() [978]
                0.00    0.00     275/275        
dwflpp::function_die(Dwarf_Die*) [1066]
                0.00    0.00     275/275         dwflpp::function_line(int*) [1067]
                0.00    0.00     275/275         func_info::operator=(func_info
const&) [1074]
                0.00    0.00     275/1100        func_info::~func_info() [855]
-----------------------------------------------
                                                 <spontaneous>
[2]     25.6    0.02    4.24                 query_module(Dwfl_Module*, void**,
char const*, unsigned long long, void*) [2]
                0.18    4.01     278/278         dwflpp::iterate_over_cus(int
(*)(Dwarf_Die*, void*), void*) [3]
                0.04    0.00   18070/18070      
dwflpp::focus_on_module(Dwfl_Module*) [108]
                0.01    0.00   18348/18348      
dwflpp::module_name_matches(std::basic_string<char, std::char_traits<char>,
std::allocator<char> >) [322]
-----------------------------------------------
                0.18    4.01     278/278         query_module(Dwfl_Module*,
void**, char const*, unsigned long long, void*) [2]
[3]     25.2    0.18    4.01     278         dwflpp::iterate_over_cus(int
(*)(Dwarf_Die*, void*), void*) [3]
                0.48    3.52  231852/231852      query_cu(Dwarf_Die*, void*) [4]
                0.01    0.00     278/278         dwflpp::get_module_dwarf(bool)
[272]
Comment 1 Graydon Hoare 2005-12-07 23:25:39 UTC
This arises due to the fact that we match each function pattern in a probe
header independently against the dwarf files. Thus for the 200-odd
kernel.function("foo") patterns which result from the glob and alias expansion,
we do 200-odd passes over the dwarf data. 

A plausible solution would be to change the pattern-match code to take a set of
globs, rather than a single glob, and to amalgamate all (or some
suitably-similar subset of) the dwarf probe-points in a probe header into a
single pass over the dwarf data.
Comment 2 Frank Ch. Eigler 2005-12-07 23:53:32 UTC
A somewhat similar problem used to occur with respect to prologue analysis: all
CUs were searched for line records, even if no functions were actually selected
for them.  A one-line conditional (tapsets.cxx:2181 in v1.95) made a huge
difference in eliminating wasted effort.

Might a similar oversight be responsible for the excessive searching now? 
Perhaps some results may be cached from earlier searches?
Comment 3 Graydon Hoare 2005-12-10 05:55:43 UTC
I have committed a change which makes the dwflpp structures persist between
pattern elaboration calls, adds an indexed CU and function DIE cache to the
dwflpp, and reimplements the CU and function DIE-iteration loops via these
indices (multimap::lower_bound on pattern-stems with fnmatch() filtering on the
remaining set). kernel.syscall.* reduced from ~1.5m to ~2s, on my workstation.
Let me know if you find further problems.
Comment 4 Frank Ch. Eigler 2005-12-12 15:28:02 UTC
Wow, nicely done.
Comment 5 Frank Ch. Eigler 2005-12-12 16:46:58 UTC
Not a big deal, but the new code appears to work differently with respect to
semok/twenty.stp.  On kernels that have debuginfo-less modules loaded (e.g.
vmware), this test case used to pass, I suppose skipping over them.  Now, it
results in an overall failure.  It would be nice to restore the old behavior if
that was not too hard.  (It would work like an implicit blacklist over
debuginfo-less modules.)
Comment 6 Graydon Hoare 2005-12-14 02:28:12 UTC
really? I see failures both ways. what sort of messages do you get when it
"succeeds"?