This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



more libunwind startup-overhead tuning


>>>>> On Sat, 20 Dec 2003 23:32:19 -0800, David Mosberger <davidm@linux.hpl.hp.com> said:

  David> The dynamic relocation count is now down from 747 to 142 (50
  David> of them are NONE relocs).

  David> I'm sure there is more tuning that can be done to minimize
  David> load-time overhead, but i'll look into those after finishing
  David> the DWARF unwinder.

I figured out a way to split the local-only unwinder into a separate
library without creating API/ABI incompatibilities (except
for rather esoteric corner cases, which won't affect GCC, GDB, or
other major libunwind users).  With a separate local-only
libunwind.so, the dynamic relocation count shrinks to 72 (32 of which
are NONE relocs).

If I use LD_DEBUG=statistics, I get the following dynamic reloc counts
("final number of relocations"):

 no-op program without libunwind:                     90
 no-op program with libunwind v0.96:                 112
 no-op program with separate, local-only libunwind:   93

To measure actual execution-time impact, I created a no-op program
"empty" whose main() function returns immediately.  Then I created a
statically-linked "forker" program which spawns "empty" 10000 times.
I used LD_PRELOAD to add a dependency on libunwind as desired.  The
results are below (numbers are execution time in seconds, as reported
by "time"):

                                                      real   user   system

 no-op program without libunwind:                     7.347  2.401  4.940
 no-op program with libunwind v0.96:                  8.253  2.858  5.345
 no-op program with separate, local-only libunwind:   7.878  2.627  5.250
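
A minimal sketch of what such a forker could look like.  This is not
the program used for the measurements above; the function name
spawn_many() and the use of execlp() are assumptions made for
illustration:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `path` `count` times, one child at a time.
   Returns the number of children that exited with status 0. */
int spawn_many(const char *path, int count)
{
    int ok = 0;

    for (int i = 0; i < count; i++) {
        pid_t pid = fork();

        if (pid < 0)
            break;                      /* fork failed; stop early */
        if (pid == 0) {
            /* Child: the environment (including LD_PRELOAD) is inherited. */
            execlp(path, path, (char *) NULL);
            _exit(127);                 /* only reached if exec failed */
        }

        int status;
        if (waitpid(pid, &status, 0) == pid
            && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

A main() that calls spawn_many("./empty", 10000) and is built with
"gcc -static" would match the setup described above: because the
forker itself is statically linked, LD_PRELOAD only affects the
dynamically linked children.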

So, with the local-only version of libunwind, the worst-case overhead
of always linking dynamically against libunwind seems to be about 7%
of real time (7.878s vs. 7.347s).  Remember: this is a worst case that
applies only to shared objects that do not link against anything other
than ld.so and libc.so.  In my opinion, this is a reasonably small
overhead (if you really want minimal startup times for such tiny
programs, static linking will give much better results anyhow).
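
The percentage can be reproduced from the "real" column of the timing
table.  A small sketch of the arithmetic (the function name is made up
for illustration):

```c
/* Relative overhead, in percent, of `with` over `base`. */
double overhead_percent(double base, double with)
{
    return (with - base) / base * 100.0;
}
```

With the numbers above, overhead_percent(7.347, 7.878) comes out at
roughly 7.2 for the local-only library, and overhead_percent(7.347,
8.253) at roughly 12.3 for libunwind v0.96.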

For completeness, I attached the profile for the "no libunwind" and
the "local-only libunwind" cases below.  The caveat for the profiles
is that they cover all 10,000 invocations of "empty" and that the
call-counts were obtained via sampling, so they're not 100% accurate.
Even so, you can see that the call counts are sensible.  For example,
in the no-libunwind-case, _dl_relocate_object() gets called about 3
times per "empty" invocation (main program, ld.so, libc, I think) and
about 4 times for the libunwind-case.

I think the only way to essentially eliminate the overhead altogether
would be to use a scheme analogous to the one used in libpthread.
That is, provide stub-versions of _Unwind_*() which, when invoked,
will dlopen() libunwind.so and re-direct the calls to the appropriate
entry-points in libunwind.so.  However, to avoid a dependency against
-ldl (which would defeat the entire purpose of the stubs), libgcc
would have to use __libc_dlopen_mode(), which is probably undesirable.
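
A rough sketch of the stub idea, under several assumptions: it uses
the public dlopen()/dlsym() interface rather than
__libc_dlopen_mode(), the stub is given a different name
(stub_Unwind_GetIP, hypothetical) so it does not clash with the real
symbol, the prototype is simplified compared to <unwind.h>, and
libunwind.so is assumed to export an entry point named _Unwind_GetIP.
It is also not thread-safe as written:

```c
#include <dlfcn.h>
#include <stddef.h>

/* Open `library` on first use (cached in *handle) and look up `symbol`.
   Returns NULL if either step fails. */
static void *resolve_lazily(void **handle, const char *library,
                            const char *symbol)
{
    if (*handle == NULL)
        *handle = dlopen(library, RTLD_LAZY);
    if (*handle == NULL)
        return NULL;
    return dlsym(*handle, symbol);
}

/* Simplified stand-in for the real prototype in <unwind.h>. */
typedef unsigned long (*get_ip_fn)(void *context);

/* Hypothetical stub: loads libunwind.so on the first call only and
   forwards to the real entry point; returns 0 if it cannot be found. */
unsigned long stub_Unwind_GetIP(void *context)
{
    static void *handle;
    static get_ip_fn real;

    if (real == NULL)
        real = (get_ip_fn) resolve_lazily(&handle, "libunwind.so",
                                          "_Unwind_GetIP");
    return real != NULL ? real(context) : 0;
}
```

The point of the design is that a program which never unwinds never
pays for loading or relocating libunwind.so; the cost moves to the
first call of a stub.  On older glibc versions this sketch needs -ldl
at link time, which is exactly the dependency the paragraph above
wants to avoid.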

Comments/feedback welcome.


	--david

Profile without libunwind.so:

 Each histogram sample counts as 533.125u seconds
% time      self     cumul     calls self/call  tot/call name
 35.38      7.94      7.94      322k     24.7u     25.7u _dl_relocate_object
 16.46      3.69     11.64     66.3M     55.7n     81.6n _dl_make_fptr
 10.89      2.45     14.08     8.99M      272n      467n do_lookup_versioned
  7.57      1.70     15.78     41.2M     41.2n     41.2n make_fdesc
  4.80      1.08     16.86     42.5M     25.4n     25.4n ld-2.3.2.so:strcmp
  4.25      0.95     17.81     21.8M     43.8n     43.8n ld-2.3.2.so:__umoddi3
  3.20      0.72     18.53     9.89M     72.8n     72.8n ld-2.3.2.so:_dl_elf_hash
  2.55      0.57     19.11      105k     5.44u     97.6u ld-2.3.2.so:_dl_start
  2.06      0.46     19.57     8.99M     51.5n      592n _dl_lookup_versioned_symbol
  1.25      0.28     19.85      870k      322n      322n do_lookup
  1.19      0.27     20.12      195k     1.37u     2.16u _dl_map_object_from_fd
  0.87      0.20     20.31      103k     1.90u     91.6u dl_main

Profile when pre-loading separate, local-only libunwind.so:

% time      self     cumul     calls self/call  tot/call name
 32.61      8.21      8.21      445k     18.5u     19.4u _dl_relocate_object
 14.91      3.76     11.97     67.6M     55.6n     81.6n _dl_make_fptr
 13.01      3.28     15.25     9.39M      349n      596n do_lookup_versioned
  6.86      1.73     16.97     42.1M     41.0n     41.0n make_fdesc
  5.72      1.44     18.41     32.9M     43.7n     43.7n ld-2.3.2.so:__umoddi3
  5.06      1.27     19.69     56.5M     22.5n     22.5n ld-2.3.2.so:strcmp
  3.24      0.81     20.50     10.4M     78.7n     78.7n ld-2.3.2.so:_dl_elf_hash
  2.30      0.58     21.08     99.5k     5.83u      111u ld-2.3.2.so:_dl_start
  2.00      0.50     21.59     9.43M     53.4n      725n _dl_lookup_versioned_symbol
  1.43      0.36     21.95      296k     1.22u     2.05u _dl_map_object_from_fd
  1.41      0.35     22.30      879k      404n      404n do_lookup
  0.87      0.22     22.52     88.0k     2.47u      116u dl_main

