This is the mail archive of the binutils@sources.redhat.com mailing list for the binutils project.



Re: gprof is very slow (>100x worse than "before")


Ian Lance Taylor wrote:
> 
> Kevin Nomura <nomura@netapp.com> writes:
> 
> > I coded a workaround which was to precompute a mapping from address
> > to source line, and hook get_src_info to binary-search this mapping.
> > Then "gprof --line" takes less than 5 minutes.  This is still doing
> > a lookup for each byte of the binary as above, so it seems that
> > the generic BFD routine is far too expensive for the brute-force
> > loop.  Certainly core_create_line_syms can be smarter too.  But
> > I don't know the rules of the game with BFD -- what symbol info
> > is exported, what new interfaces are reasonable to add -- these
> > rules might steer the fix in a certain direction.
> 
> gprof for ELF/stabs was sped up quite a bit by some improvements to
> _bfd_stab_section_find_nearest_line.  Perhaps similar improvements
> could be applied to NAME(aout,find_nearest_line).  Perhaps that is
> what you have already done.
> 
> Ian


The speedups in ELF help some, but unfortunately the times are still huge.
The original binary I used was a.out, but I built a similar binary
with an x86/ELF compiler and ran the linux-hosted gprof against it.
With default options (function granularity) it took 214 seconds CPU
(versus 4 seconds CPU for the baseline a.out gprof).  With "--line",
I cancelled it after 5800 seconds:

[wit]$ time ngprof-elf --line maytag.5 gmon.out.010614x.16060 > junktime2
Command terminated by signal 2
5804.59user 22.03system 2:00:32elapsed 80%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (128major+11165minor)pagefaults 0swaps

This ELF binary is around 9.5MB of text.  The a.out binary had 761,000
"interesting" symbols out of around 3-4 million total symbols.  I don't
know the numbers for ELF, but one might assume they are the same order
of magnitude.  The CPU times are inflated by a factor of 2-3 because this
gprof is built without -O2; the host system is a 733MHz x86 linux box.

The algorithm is basically O(n^2) in the binary size so the --line
option might be tolerable for small programs, but it is hardly usable 
for large ones.  Any suggestions on what existing or new BFD interfaces
can be used by gprof to build and cache an address-to-source-line mapping?
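
A minimal sketch of what such a cached mapping could look like, assuming
the line records can be collected once into a flat array.  The names
struct line_entry, line_table_sort and line_table_lookup are hypothetical;
they are not existing BFD or gprof interfaces, and how the table would be
filled from BFD is exactly the open question above.  The sketch only shows
the sort-once, binary-search-per-address part:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: one entry per address where the source line changes. */
struct line_entry {
    unsigned long addr;   /* first address covered by this entry */
    const char *file;     /* source file name */
    unsigned int line;    /* source line number */
};

/* qsort comparator: order entries by ascending address. */
static int cmp_entry(const void *a, const void *b)
{
    const struct line_entry *x = a;
    const struct line_entry *y = b;
    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Sort the table once, O(n log n), before any lookups are made. */
void line_table_sort(struct line_entry *tab, size_t n)
{
    assert(tab != NULL || n == 0);
    if (n > 1)
        qsort(tab, n, sizeof *tab, cmp_entry);
}

/* Return the last entry whose addr is <= pc, or NULL if pc lies below
   the first entry.  Each lookup is O(log n) on a sorted table. */
const struct line_entry *
line_table_lookup(const struct line_entry *tab, size_t n, unsigned long pc)
{
    size_t lo = 0, hi = n;

    /* Invariant: entries before lo have addr <= pc,
       entries from hi onward have addr > pc. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

With a table like this, a loop over every address of the text section
costs O(n log n) in total rather than repeating a full search per address.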

