This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug dynamic-link/16709] New: Dynamic linking with large number of DSOs degrades into linear lookup


https://sourceware.org/bugzilla/show_bug.cgi?id=16709

            Bug ID: 16709
           Summary: Dynamic linking with large number of DSOs degrades
                    into linear lookup
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: dynamic-link
          Assignee: unassigned at sourceware dot org
          Reporter: ppluzhnikov at google dot com

Prelude.
--------

Consider a large codebase with lots of subdirectories and unittests:

  dir1/dir1a/dir1aa
  dir1/dir1a/dir1ab
  ...
  dir1/dir1b/dir1ba
  ...
  dirN/dirNa/dirNaa

There could be multiple libraries and unit-test executables in dir1aa,
dir1ab, etc.

There could also be unit-test executables in dir1, which depend on multiple
libraries in dir1a/...

In addition, there could be dependencies on dirN/... libraries from dir1aa
(or elsewhere).

E.g. dir1 could be implementin MapReduce, dir2 could be implementing GFS,
dir3 could be implementing Bigtable, and MapReduce unit-tests could be
using GFS and BigTable as data source or destination.

When building all of these unit-tests from source, it is desirable to
build each individual library as a DSO, so that all tests depending on a
particular library can re-use the same DSO, rather than statically link
in corresponding archive library. (The unit-test binaries are then much
smaller and are much faster to link.)

This process creates unit-test binaries that have 5000+ DSOs in their
direct NEEDED dependency list.

That works, but consider what this do symbol resolution.

Problem:
--------

For a test that only has 112 direct dependencies:

LD_DEBUG=symbols,bindings base/heap-checker_unittest

... first problem: linear search of all libraries for weak unresolved
    symbol coming from main executable:

     15898:     symbol=__gmon_start__;  lookup in
file=base/heap-checker_unittest [0]
     15898:     symbol=__gmon_start__;  lookup in
file=libtesting/base/internal/libinternal.so [0]
     15898:     symbol=__gmon_start__;  lookup in
file=libutil/platforminfo/libplatforminfo.so [0]
     15898:     symbol=__gmon_start__;  lookup in
file=libutil/platforminfo/libcpuinfo.so [0]
     15898:     symbol=__gmon_start__;  lookup in
file=libplatforms/direct/io/libpci.so [0]
... 100 more of these ...
     15898:     symbol=__gmon_start__;  lookup in
file=/usr/grte/v3/lib64/libdl.so.2 [0]
     15898:     symbol=__gmon_start__;  lookup in
file=/usr/grte/v3/lib64/libpthread.so.0 [0]
     15898:     symbol=__gmon_start__;  lookup in
file=/usr/grte/v3/lib64/librt.so.1 [0]
     15898:     symbol=__gmon_start__;  lookup in
file=/usr/grte/v3/lib64/libm.so.6 [0]
     15898:     symbol=__gmon_start__;  lookup in
file=/usr/grte/v3/lib64/libc.so.6 [0]
     15898:     symbol=__gmon_start__;  lookup in
file=/usr/grte/v3/lib64/ld-linux-x86-64.so.2 [0]

this entire lookup chain is then repeated 112 more times because every
DSO has a weak unresolved dependency on __gmon_start__, which is never
defined anywhere.

This is a quadratic behavior on the number of DSOs linked into the process.

... second problem: libc symbols are defined at the tail end of the chain:

     15898:     symbol=pthread_create;  lookup in file=heap-checker_unittest
[0]
     15898:     symbol=pthread_create;  lookup in
file=libtesting/base/internal/libinternal.so [0]
     15898:     symbol=pthread_create;  lookup in
file=libutil/platforminfo/libplatforminfo.so [0]
... 100 more of these ...
     15898:     symbol=pthread_create;  lookup in
file=/usr/grte/v3/lib64/libdl.so.2 [0]
     15898:     symbol=pthread_create;  lookup in
file=/usr/grte/v3/lib64/libpthread.so.0 [0]
     15898:     binding file libthread/libthread.so [0] to
/usr/grte/v3/lib64/libpthread.so.0 [0]: normal symbol `pthread_create'
[GLIBC_2.2.5]

... this lookup chain repeats for every DSO that calls pthread_create.
    For this particular testcase, only twice, but generally this is quadratic
    again if every DSO called pthread_create.


Possible solution:
------------------

For over 5 years now, Google has been running with a local patch, which
allows glibc to bypass these linear scans, by setting a bloom filter
on symbols.

This patch was first mentioned here:
  https://www.sourceware.org/ml/libc-alpha/2006-01/msg00018.html
and again more recently:
  https://www.sourceware.org/ml/libc-help/2013-02/msg00040.html

Here is how much the patch improves things (this unit-test has 6400+ DSOs:

without the patch:

  274.31user 6.31system 4:47.55elapsed 97%CPU (0avgtext+0avgdata
4163712maxresident)k

  LD_DEBUG=statistics lookup_stats_test --version
     22506:
     22506:     runtime linker statistics:
     22506:       total startup time in dynamic loader: 604460360804 clock
cycles
     22506:                 time needed for relocation: 560364579938 clock
cycles (92.7%)
     22506:                      number of relocations: 857458
     22506:           number of relocations from cache: 372137
     22506:             number of relative relocations: 220299
     22506:                time needed to load objects: 43959679372 clock
cycles (7.2%)

with the patch:

  13.18user 6.11system 0:25.74elapsed 74%CPU (0avgtext+0avgdata
5128912maxresident)k

     20724:     runtime linker statistics:
     20724:       total startup time in dynamic loader: 49552390029 clock
cycles
     20724:                 time needed for relocation: 2753565219 clock cycles
(5.5%)
     20724:                      number of relocations: 857458
     20724:           number of relocations from cache: 372137
     20724:             number of relative relocations: 220299
     20724:                time needed to load objects: 46738240677 clock
cycles (94.3%)


Note the drastic reduction in wall time and relocation time.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]