This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.



[PATCH 0/3] Performance tweaks for libdw


I did some investigation into libdw performance hotspots and came up
with a few tweaks that, in total, trim nearly 1/3 of the running time.
I'm running on top of Mark's recent empty-location patch, and I
primarily used "tests/varlocs -k >/dev/null" as a moderately
long-running benchmark.
I'm using gcc-4.8.2-1.fc20.x86_64 and kernel-3.11.9-300.fc20.x86_64,
running on an i7-2600.

The perf profile initially looked like this:

Samples: 343K of event 'cycles', Event count (approx.): 318932712301
 33.69%  varlocs  libdw.so            [.] __libdw_find_attr
 16.47%  varlocs  libdw.so            [.] lookup.isra.0
 15.63%  varlocs  libdw.so            [.] __libdw_form_val_len
 13.11%  varlocs  libdw.so            [.] dwarf_siblingof
  4.84%  varlocs  libdw.so            [.] dwarf_tag
  4.62%  varlocs  libdw.so            [.] walk_children.4364
  2.35%  varlocs  libdw.so            [.] __libdw_findabbrev
  2.32%  varlocs  libdw.so            [.] Dwarf_Abbrev_Hash_find
  1.26%  varlocs  libdw.so            [.] dwarf_child

Patch 1 addresses __libdw_form_val_len with an inlined fast path for
forms with constant length.  Patch 2 is a rework of get_uleb128 and
get_sleb128, which are significant in __libdw_find_attr and elsewhere.
Patch 3 addresses the hash lookup, which is called often to find DIE
abbreviations.
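A minimal sketch of what an inlined fast path for constant-length forms can look like, for illustration only; it is not the code in patch 1.  The names form_val_len, form_val_len_slow and leb128_len are hypothetical, only a subset of DWARF 4 forms is covered, and the caller is assumed to pass the CU's offset size and address size.  Fixed-size forms are answered by a switch small enough to inline into the caller, and everything else drops to an out-of-line slow path.

```c
/* Hypothetical sketch, not the elfutils implementation.  */
#include <stddef.h>
#include <string.h>

/* A subset of DWARF 4 form codes, for illustration.  */
enum
{
  DW_FORM_addr = 0x01, DW_FORM_data2 = 0x05, DW_FORM_data4 = 0x06,
  DW_FORM_data8 = 0x07, DW_FORM_string = 0x08, DW_FORM_block1 = 0x0a,
  DW_FORM_data1 = 0x0b, DW_FORM_flag = 0x0c, DW_FORM_sdata = 0x0d,
  DW_FORM_strp = 0x0e, DW_FORM_udata = 0x0f, DW_FORM_ref1 = 0x11,
  DW_FORM_ref2 = 0x12, DW_FORM_ref4 = 0x13, DW_FORM_ref8 = 0x14,
  DW_FORM_ref_udata = 0x15, DW_FORM_sec_offset = 0x17,
  DW_FORM_flag_present = 0x19, DW_FORM_ref_sig8 = 0x20
};

/* Bytes taken by the LEB128 number at P, or (size_t) -1 if truncated.  */
static size_t
leb128_len (const unsigned char *p, const unsigned char *end)
{
  size_t n = 0;
  while (p + n < end)
    if ((p[n++] & 0x80) == 0)
      return n;
  return (size_t) -1;
}

/* Out-of-line slow path: forms whose size depends on the CU or the data.  */
static size_t
form_val_len_slow (unsigned int form, const unsigned char *p,
                   const unsigned char *end,
                   size_t offset_size, size_t addr_size)
{
  switch (form)
    {
    case DW_FORM_addr:
      return addr_size;
    case DW_FORM_strp:
    case DW_FORM_sec_offset:
      return offset_size;
    case DW_FORM_udata:
    case DW_FORM_sdata:
    case DW_FORM_ref_udata:
      return leb128_len (p, end);
    case DW_FORM_string:
      {
        const unsigned char *nul = memchr (p, '\0', (size_t) (end - p));
        return nul != NULL ? (size_t) (nul - p) + 1 : (size_t) -1;
      }
    case DW_FORM_block1:
      /* One length byte followed by that many bytes of data.  */
      return p < end ? 1 + (size_t) p[0] : (size_t) -1;
    default:
      return (size_t) -1;  /* Not handled in this sketch.  */
    }
}

/* Inlined fast path: constant-size forms are answered by the switch
   alone; anything else goes to the slow path.  */
static inline size_t
form_val_len (unsigned int form, const unsigned char *p,
              const unsigned char *end,
              size_t offset_size, size_t addr_size)
{
  switch (form)
    {
    case DW_FORM_flag_present:
      return 0;
    case DW_FORM_data1: case DW_FORM_ref1: case DW_FORM_flag:
      return 1;
    case DW_FORM_data2: case DW_FORM_ref2:
      return 2;
    case DW_FORM_data4: case DW_FORM_ref4:
      return 4;
    case DW_FORM_data8: case DW_FORM_ref8: case DW_FORM_ref_sig8:
      return 8;
    default:
      return form_val_len_slow (form, p, end, offset_size, addr_size);
    }
}
```

The design point is that the common fixed-size forms never leave the caller, so the function-call overhead is paid only for forms that actually need to inspect the data.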

The perf profile now looks like this:

Samples: 229K of event 'cycles', Event count (approx.): 213925592727
 44.63%  varlocs  libdw.so            [.] __libdw_find_attr
 22.28%  varlocs  libdw.so            [.] dwarf_siblingof
  7.64%  varlocs  libdw.so            [.] walk_children.4388
  7.07%  varlocs  libdw.so            [.] dwarf_tag
  5.18%  varlocs  libdw.so            [.] __libdw_findabbrev
  2.88%  varlocs  libdw.so            [.] __libdw_form_val_compute_len
  2.11%  varlocs  libdw.so            [.] dwarf_child
  1.44%  varlocs  libdw.so            [.] __libdw_formref
  1.12%  varlocs  libdw.so            [.] scope_visitor

The remaining busy work is simply walking through attributes from DIE
to DIE.  I believe optimizing this further will be hard without keeping
track of DIE lengths somewhere, which is a lot to cache.  Putting the
length in Dwarf_Die itself is not feasible, because Dwarf_Die objects
are short-lived and frequently recreated.
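To make that remaining cost concrete, here is a rough sketch of the walk, loosely modeled on the job __libdw_find_attr has to do but not taken from it.  An abbreviation's attribute list is a sequence of ULEB128 (name, form) pairs ending in (0, 0), and each value in the DIE has to be skipped by its form length before the next one can be read.  The names find_attr, value_len and read_uleb128 are hypothetical, only a handful of forms are handled, and the input is assumed well-formed (there is no bounds checking).  The one-byte shortcut in read_uleb128 is just one common way to speed up LEB128 decoding; the message does not say what patch 2 changes in detail.

```c
/* Hypothetical sketch, not the elfutils implementation.  */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few DWARF 4 form codes; only these are handled below.  */
enum
{
  DW_FORM_data2 = 0x05, DW_FORM_data4 = 0x06, DW_FORM_string = 0x08,
  DW_FORM_data1 = 0x0b, DW_FORM_udata = 0x0f, DW_FORM_flag_present = 0x19
};

/* Decode an unsigned LEB128 value at *PP and advance *PP past it.
   Assumes well-formed input; there is no bounds checking.  */
static inline uint64_t
read_uleb128 (const unsigned char **pp)
{
  const unsigned char *p = *pp;
  /* Shortcut: a one-byte value (high bit clear) needs no loop.  */
  if ((*p & 0x80) == 0)
    {
      *pp = p + 1;
      return *p;
    }
  uint64_t result = 0;
  unsigned int shift = 0;
  unsigned char byte;
  do
    {
      byte = *p++;
      if (shift < 64)
        result |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  *pp = p;
  return result;
}

/* Size in bytes of a value with the given form, or (size_t) -1.  */
static size_t
value_len (uint64_t form, const unsigned char *p)
{
  switch (form)
    {
    case DW_FORM_flag_present: return 0;
    case DW_FORM_data1: return 1;
    case DW_FORM_data2: return 2;
    case DW_FORM_data4: return 4;
    case DW_FORM_udata:
      {
        size_t n = 1;
        while (p[n - 1] & 0x80)  /* Count bytes up to the last one.  */
          n++;
        return n;
      }
    case DW_FORM_string: return strlen ((const char *) p) + 1;
    default: return (size_t) -1;
    }
}

/* Walk ABBREV_ATTRS, the (name, form) pairs of one abbreviation, in step
   with DIE_VALS, the DIE's attribute values.  Returns a pointer to the
   value of attribute SEARCH and stores its form in *FORMP.  If SEARCH is
   0 or not present, returns the first byte past the DIE and stores 0.
   Returns NULL on a form this sketch does not handle.  */
static const unsigned char *
find_attr (const unsigned char *abbrev_attrs, const unsigned char *die_vals,
           uint64_t search, uint64_t *formp)
{
  const unsigned char *a = abbrev_attrs;
  const unsigned char *v = die_vals;
  *formp = 0;
  for (;;)
    {
      uint64_t name = read_uleb128 (&a);
      uint64_t form = read_uleb128 (&a);
      if (name == 0 && form == 0)
        return v;  /* End of the list: V is just past the DIE.  */
      if (search != 0 && name == search)
        {
          *formp = form;
          return v;
        }
      size_t len = value_len (form, v);
      if (len == (size_t) -1)
        return NULL;
      v += len;  /* Skip this value to reach the next one.  */
    }
}
```

Every attribute lookup, and every attempt to find where a DIE ends, repeats this loop from the first attribute, which is why a cached per-DIE length would help and why the cost is hard to remove without one.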

Here's some summary information for how these patches change varlocs -k:

             libdw  varlocs  varlocs
              text     time   maxres
    Base:   243072   84.42s  242360k
    P1:     243296   74.91s  242356k
    P2:     243184   70.61s  242360k
    P3:     243600   56.75s  243588k
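As a quick check of the "nearly 1/3" figure, the small helper below only does arithmetic on numbers already shown: the varlocs times from the table and the approximate cycle counts from the two perf profiles.  The function and variable names are illustrative.

```c
/* Illustrative arithmetic on the figures quoted in this message.  */
#include <stdio.h>

/* varlocs -k wall-clock times from the table, in seconds.  */
static const double base_time = 84.42;
static const double patched_time[] = { 74.91, 70.61, 56.75 };
static const char *const patch_label[] = { "P1", "P2", "P3" };

/* Approximate 'cycles' event counts from the two perf profiles.  */
static const double cycles_before = 318932712301.0;
static const double cycles_after = 213925592727.0;

/* Percentage by which NEW_VAL is smaller than OLD_VAL.  */
static double
pct_reduction (double old_val, double new_val)
{
  return 100.0 * (old_val - new_val) / old_val;
}

static void
print_reductions (void)
{
  for (int i = 0; i < 3; i++)
    printf ("%s: %.1f%% less time than Base\n", patch_label[i],
            pct_reduction (base_time, patched_time[i]));
  printf ("cycles: %.1f%% fewer\n",
          pct_reduction (cycles_before, cycles_after));
}
```

Running print_reductions gives 11.3%, 16.4% and 32.8% less time than Base for the P1, P2 and P3 rows, and the perf cycle count drops by about 32.9%, so the two measurements agree.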

My timings are not statistically rigorous, but the results still look
like a clear win across the board.  Other benchmarks I've tried, such as
tests/allfcts and "stap -l syscall.*", show similar improvement.

Feedback is always appreciated.

Thanks,
Josh
