This is the mail archive of the libc-hacker@sources.redhat.com mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.



more optimizations


I just checked in a rather large patch which removes a lot of
relocations.  This is before the patch:

  libc.so: 1609 relocations, 1317 relative, 502 PLT entries

This is afterwards:

  libc.so: 1609 relocations, 1484 relative, 413 PLT entries

Executing a null program with ld.so statistics enabled shows the
following.  Before:

18957:  runtime linker statistics:
18957:    total startup time in dynamic loader: 1215464 clock cycles
18957:              time needed for relocation: 546656 clock cycles (44.9%)
18957:                   number of relocations: 200
18957:        number of relocations from cache: 114
18957:             time needed to load objects: 379052 clock cycles (31.1%)
18957:
18957:  runtime linker statistics:
18957:             final number of relocations: 206
18957:  final number of relocations from cache: 114

Now:

18989:  runtime linker statistics:
18989:    total startup time in dynamic loader: 1178256 clock cycles
18989:              time needed for relocation: 467668 clock cycles (39.6%)
18989:                   number of relocations: 140
18989:        number of relocations from cache: 7
18989:             time needed to load objects: 379492 clock cycles (32.2%)
18989:
18989:  runtime linker statistics:
18989:             final number of relocations: 146
18989:  final number of relocations from cache: 7



This means:

 - 89 functions defined in libc were also called internally by their
   exported names, resulting in PLT entries.  Avoiding this not only
   gets rid of the JUMP_SLOT relocations (transforming them into
   relative relocations), it also allows the compiler to generate
   better code.

 - 177 non-JUMP_SLOT relocations were converted to relative relocations
   (partly overlapping with the PLT optimization)
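
As a rough illustration of the first point, here is a minimal sketch of
one common way to avoid the PLT for calls inside a shared object: give
the exported function a second, hidden name and use that name
internally.  This is an assumption about the general technique, not a
copy of what the patch does.  All names (exported_add,
exported_add_internal, twice_sum) are hypothetical, and the attributes
assume GCC or Clang on an ELF target.

```c
/* Exported function: code outside the library calls it by this name. */
int exported_add(int a, int b)
{
    return a + b;
}

/* Hypothetical hidden alias for the same code.  Because the alias has
   hidden visibility it cannot be interposed, so calls through it can be
   bound when the library is linked and need no PLT entry.  */
extern __typeof(exported_add) exported_add_internal
    __attribute__((alias("exported_add"), visibility("hidden")));

/* Internal caller: uses the hidden alias instead of the exported name. */
int twice_sum(int a, int b)
{
    return 2 * exported_add_internal(a, b);
}
```

When such code is built with -fPIC into a shared object, the call in
twice_sum should not need a JUMP_SLOT relocation, while outside callers
of exported_add still go through the normal symbol lookup.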


The performance improvements in ld.so are measurable.  Timing the null
program shows the following (these are the cycles reported by ld.so).
Before:

minimum: total=1369336, relocs=536704, load=325680
average: total=1402050, relocs=544385, load=335134

Now:

minimum: total=1259892, relocs=440732, load=314832
average: total=1292682, relocs=451006, load=326384


I.e., ld.so spends about 100000 cycles less on relocations.  This is
directly visible in startup time improvements.  Before:

minimum: 0.001447713 sec
average: 0.001488027 sec

Now:

minimum: 0.001389521 sec
average: 0.001425523 sec


If you do the math you'll see that my machine runs at 1.7GHz.  The
time improvements are not that impressive in absolute terms, but it's
a fast machine and there is more to come.  The percentage gains are
impressive (about 8% overall, 12% if you exclude the time the kernel
spends loading some files).  And we are not through yet.
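
One way to do that math, sketched under the assumption that the clock
rate is estimated as cycles saved divided by seconds saved, using the
average figures quoted above.  The function names are hypothetical.
Only the clock rate and the overall percentage are computed here; the
12% figure depends on kernel time that is not broken out in the numbers
above.

```c
#include <stdio.h>

/* Estimated clock rate in Hz: cycles saved divided by seconds saved. */
double clock_rate_hz(double cycles_before, double cycles_after,
                     double sec_before, double sec_after)
{
    return (cycles_before - cycles_after) / (sec_before - sec_after);
}

/* Relative improvement in percent of the "before" value. */
double percent_gain(double before, double after)
{
    return 100.0 * (before - after) / before;
}

/* Print both figures for the average measurements quoted in the mail. */
void print_summary(void)
{
    double hz = clock_rate_hz(1402050.0, 1292682.0,
                              0.001488027, 0.001425523);
    double gain = percent_gain(1402050.0, 1292682.0);

    printf("estimated clock rate: %.2f GHz\n", hz / 1e9);
    printf("ld.so cycle reduction: %.1f%%\n", gain);
}
```

With the average values this comes out at roughly 1.75GHz and a bit
under 8% fewer cycles in ld.so.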

So far I've concentrated on libio and RPC, both fairly closed sets of
code and the files are not used individually.  There are still 413
PLTs in use.  Only a few (e.g., for the thread functions) are really
needed.  A few others, like malloc, will be kept for interposition.
All the rest can go.  This means speedup and size reduction (which by
the way was about 2k so far).

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

