LD_DEBUG=statistics shows this:
9506:
9506: runtime linker statistics:
9506: total startup time in dynamic loader: 19960074 cycles
9506: time needed for relocation: 19105814 cycles (95.7%)
9506: number of relocations: 87
9506: number of relocations from cache: 3
9506: number of relative relocations: 1226
9506: time needed to load objects: 701382 cycles (3.5%)
9506:
9506: runtime linker statistics:
9506: final number of relocations: 781589
9506: final number of relocations from cache: 3
The test is a main program which contains 1,500 function calls. The
functions are defined in a single DSO, and each function calls 520 other
functions, for a total of 781,500 relocations from the test.
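
As a quick check on that figure, here is a small sketch of the arithmetic.
It assumes every call targets a distinct function, so that each call needs
its own relocation; that is one reading which reproduces the number, not
something taken from the test sources. The names are made up for
illustration.

```python
# Sketch only: assumes each call targets a distinct function and therefore
# needs its own relocation. Names are hypothetical, not from the test program.
calls_in_main = 1_500       # function calls made by the main program
calls_per_function = 520    # calls made by each function in the DSO


def expected_relocations(main_calls: int, per_function: int) -> int:
    """Relocations for the calls from main plus those made by the callees."""
    return main_calls + main_calls * per_function


total_relocations = expected_relocations(calls_in_main, calls_per_function)
print(total_relocations)
```

The result is in line with the "final number of relocations" reported by
LD_DEBUG=statistics, which also includes the handful of relocations any
process needs at startup.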
On my laptop (Intel(R) Core(TM) i7-4810MQ CPU @ 2.80GHz), I get this
(ten runs, real time measured in seconds):
> t.test(prev_laptop, after_laptop)
Welch Two Sample t-test
data: prev_laptop and after_laptop
t = -14.932, df = 18, p-value = 1.392e-11
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.05749145 -0.04330855
sample estimates:
mean of x mean of y
   0.2345    0.2849
So the difference is definitely not in the noise. The penalty appears to
be around 65 ns per relocation (a mean difference of about 0.05 seconds
spread over 781,500 relocations).
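
The per-relocation figure follows directly from the t-test output. A
minimal sketch of the calculation, assuming the whole difference in run
time is attributable to the 781,500 relocations from the test (the
variable and function names are mine):

```python
# Sketch only: attributes the entire run-time difference to the test's
# relocations. Input values are copied from the t-test output.
mean_before_s = 0.2345      # "mean of x" (prev_laptop), seconds
mean_after_s = 0.2849       # "mean of y" (after_laptop), seconds
ci_low_s = 0.04330855       # smaller end of the 95% interval, as a magnitude
ci_high_s = 0.05749145      # larger end of the 95% interval, as a magnitude
relocations = 781_500


def per_relocation_ns(seconds: float, count: int) -> float:
    """Spread a time difference in seconds over count relocations, in ns."""
    return seconds / count * 1e9


penalty_ns = per_relocation_ns(mean_after_s - mean_before_s, relocations)
penalty_low_ns = per_relocation_ns(ci_low_s, relocations)
penalty_high_ns = per_relocation_ns(ci_high_s, relocations)

print(f"penalty: {penalty_ns:.1f} ns per relocation")
print(f"95% interval: {penalty_low_ns:.1f} to {penalty_high_ns:.1f} ns")
```

The point estimate lands just under 65 ns, and the confidence interval
translates to somewhere in the mid-50s to mid-70s of nanoseconds per
relocation.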