This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Re: Prelinking of shared libraries
On Saturday 05 May 2001 06:29, Martin v. Loewis wrote:
> > You forget about the ones in the .data section (due to vtables) In
> > libkdeui.so.3.0.0 (Latest KDE CVS) I have 15547 relocations of the type
> > R_386_32 and 12759 of those refer to qt symbols. Most of those seem to
> > come out of the .data section.
>
> [...]
>
> > No. What currently happens is that the .text and .plt of a lib are shared
> > between processes but that each process has its own .got and .data.
>
> So it appears that you claim that "most" overhead comes from the loss
> of sharing, and the need to page-in (or copy-on-write) the data
> sections, and the got.
Let me make some points clear first:
1) How well the linker performs depends very much on what you use it for, so
it's impossible to say something as generic as "most overhead comes from xyz";
it depends entirely on the situation. I will identify a few situations and
then show what kind of problems occur in each. (See below)
2) There are two issues, speed and memory usage. Some problems relate to
speed, some to memory and some to both.
The approach that I described (basically the Nelson paper) solves everything
because it means that you don't need to link/relocate your libraries at all
anymore, so any problem associated with relocation disappears as well. I guess
that you would now like to know whether that is indeed needed, or whether a
less far-reaching solution would suffice as well.
> If this is indeed your claim, can you give some proof to support it?
> E.g. how many pages need to be copied? In kdeui.so.2, .got has just 3
> pages of memory (.data has 5 pages) ...
I'll try to give an overview of the various aspects:
1) Exception handling.
Exception handling causes a lot of relocations of type R_386_RELATIVE. These
relocations can be done relatively fast but they do cause memory usage to
increase. For an average KDE application this amounted to about 800Kb when I
checked this about a year ago. I don't have recent data on this, since KDE
compiles without exceptions nowadays. I don't know if all the overhead is
caused by the relocations or whether exception handling does some
initialisation during runtime as well which allocates/touches memory. In the
paper that I wrote, exception handling wasn't taken into account.
2) vtables
As far as I know, every entry in a vtable requires an R_386_32 relocation.
That is slow because of the symbol lookup that is associated with it. They
also cause memory usage to increase since the page gets touched.
In Table 5 of my paper I showed as an example that every class derived from
QWidget introduces 109 relocations. 105 of them are R_386_32 relocations and
4 of them R_386_GLOB_DAT.
Looking at the relocation entries in libqt, libkdecore and libkdeui, it
_seems_ that all R_386_32 relocations are due to vtables. Glancing over them
shows nothing but virtual functions. (How do I check that they are indeed
part of vtables? Does it matter?)
3) Other data structures
The index part of lookup tables ends up in the .data section. These are all
R_386_RELATIVE relocations (R_386_REL for short from here on), so the issue is
mostly the memory that they need.
4) The .got section.
This should mostly (only?) contain relocations of type R_386_JUMP_SLOT which
can be done lazily, so only the memory aspect is of importance.
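To make the per-type counts used below reproducible, here is a minimal sketch that tallies relocation types from the text that `readelf -r` prints for a 32-bit object. The sample text and its symbol names are made up for illustration; they are not taken from any of the libraries discussed here.

```python
import re
from collections import Counter

# One relocation entry as printed by `readelf -r` for a 32-bit object:
# offset (8 hex digits), info (8 hex digits), then the relocation type name.
RELOC_LINE = re.compile(r"^([0-9a-fA-F]{8})\s+([0-9a-fA-F]{8})\s+(R_\w+)")


def count_relocation_types(readelf_output):
    """Return a Counter mapping relocation type name -> number of entries."""
    counts = Counter()
    for line in readelf_output.splitlines():
        match = RELOC_LINE.match(line.strip())
        if match:
            counts[match.group(3)] += 1
    return counts


# Hypothetical sample, invented for illustration only.
SAMPLE = """\
Relocation section '.rel.dyn' at offset 0x1a2c contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
00012340  00000008 R_386_RELATIVE
00012344  00000501 R_386_32          00000000   some_virtual_function
00012348  00000606 R_386_GLOB_DAT    00000000   some_object

Relocation section '.rel.plt' at offset 0x1a44 contains 1 entry:
 Offset     Info    Type            Sym.Value  Sym. Name
00013000  00000707 R_386_JUMP_SLOT   00000000   some_function
"""

if __name__ == "__main__":
    for name, n in sorted(count_relocation_types(SAMPLE).items()):
        print(name, n)
```

For a real library the input could come from something like `subprocess.run(["readelf", "-r", path], capture_output=True, text=True).stdout`. The exact column layout may differ between binutils versions, so the regular expression should be treated as an assumption to verify.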
OK, I will now look at "kedit" as a whole and at libqt, libkdecore and
libkdeui separately. I have also added libXft because it has a surprisingly
large .bss section. "kedit" links to a total of 28 libraries:
/ext/kde-head/lib/kde2/kedit.so
/ext/kde-head/lib/libkspell.so.3
/ext/kde-head/lib/libkfile.so.3
/ext/kde-head/lib/libksycoca.so.3
/ext/kde-head/lib/libkio.so.3
/ext/kde-head/lib/libkdeui.so.3
/ext/kde-head/lib/libkdesu.so.1
/ext/kde-head/lib/libkdecore.so.3
/ext/kde-head/lib/libkdefakes.so.3
/lib/libdl.so.2
/ext/kde-head/lib/libDCOP.so.1
/ext/cvs/qt-copy/lib/libqt.so.2
/usr/lib/libpng.so.2
/usr/lib/libjpeg.so.62
/usr/X11R6/lib/libXext.so.6
/usr/X11R6/lib/libX11.so.6
/usr/X11R6/lib/libSM.so.6
/usr/X11R6/lib/libICE.so.6
/lib/libutil.so.1
/lib/libz.so.1
/usr/local/lib/libfam.so.0
/usr/lib/libstdc++-libc6.2-2.so.3
/lib/libm.so.6
/lib/libc.so.6
/usr/lib/libstdc++-libc6.1-2.so.3
/usr/X11R6/lib/libXft.so.1
/lib/ld-linux.so.2
/usr/X11R6/lib/libXrender.so.1
I will now use the sum over all these libs and refer to that as "kedit".
              .got   .data    .bss   R_386_REL   R_386_32   R_386_JUMP_SLOT
kedit        129Kb   308Kb   160Kb       21021      43311             25866
libqt         44Kb   131Kb    19Kb        5124      17090              8515
libkdecore    15Kb    17Kb     7Kb         813       1719              3331
libkdeui      24Kb    76Kb     4Kb         684      15547              4687
libXft         1Kb     5Kb    72Kb        2074         46               189
Assuming that each R_386_REL, R_386_32 and R_386_JUMP_SLOT affects 4 bytes of
data that translates into:
              .got   .data    .bss   R_386_REL   R_386_32   R_386_JUMP_SLOT
kedit        129Kb   308Kb   160Kb        82Kb      169Kb             101Kb
libqt         44Kb   131Kb    19Kb        20Kb       67Kb              33Kb
libkdecore    15Kb    17Kb     7Kb         3Kb        7Kb              13Kb
libkdeui      24Kb    76Kb     4Kb         3Kb       61Kb              18Kb
libXft         1Kb     5Kb    72Kb         8Kb        0Kb               1Kb
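The conversion from relocation counts to Kb can be reproduced with a short sketch. It uses the counts from the first table and the stated assumption of 4 bytes per relocation; rounding to the nearest Kb is my assumption about how the figures were obtained.

```python
# Relocation counts copied from the first table.
COUNTS = {
    "kedit":      {"R_386_REL": 21021, "R_386_32": 43311, "R_386_JUMP_SLOT": 25866},
    "libqt":      {"R_386_REL": 5124,  "R_386_32": 17090, "R_386_JUMP_SLOT": 8515},
    "libkdecore": {"R_386_REL": 813,   "R_386_32": 1719,  "R_386_JUMP_SLOT": 3331},
    "libkdeui":   {"R_386_REL": 684,   "R_386_32": 15547, "R_386_JUMP_SLOT": 4687},
    "libXft":     {"R_386_REL": 2074,  "R_386_32": 46,    "R_386_JUMP_SLOT": 189},
}

BYTES_PER_RELOCATION = 4  # assumption from the mail: each relocation patches one 32-bit word


def touched_kb(count):
    """Kb of data patched by `count` relocations, rounded to the nearest Kb."""
    return round(count * BYTES_PER_RELOCATION / 1024)


if __name__ == "__main__":
    for lib, row in COUNTS.items():
        print(lib, {kind: touched_kb(n) for kind, n in row.items()})
```

Note that this measures bytes patched, not pages dirtied: a single 4-byte relocation dirties a whole page, so the real cost depends on how the relocations are spread over the section.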
So an application like kedit has a total .data section of 308Kb; 82Kb of
that is touched by R_386_REL relocations and 169Kb of that is touched by
R_386_32 relocations, which leaves 57Kb unaccounted for. (This assumes that
all R_386_REL and R_386_32 relocations are bound to .data. Is that so? How can
I check?)
The total size of the .got sections amounts to 129Kb, of which 101Kb is
touched by R_386_JUMP_SLOT relocations, leaving 28Kb unaccounted for. (Does
.got contain something else besides jump slots?)
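One possible way to check which section a relocation lands in is to compare its offset against the section address ranges (as listed by, for example, `readelf -S`). The following is only a sketch: the section layout and the offsets are hypothetical values chosen for illustration.

```python
from collections import Counter


def section_of(offset, sections):
    """Return the name of the section whose address range contains `offset`, or None."""
    for name, start, size in sections:
        if start <= offset < start + size:
            return name
    return None


def relocations_per_section(offsets, sections):
    """Count how many relocation offsets fall into each section."""
    counts = Counter()
    for offset in offsets:
        counts[section_of(offset, sections) or "(other)"] += 1
    return counts


# Hypothetical layout: (name, start address, size in bytes).
SECTIONS = [
    (".data", 0x12000, 0x800),
    (".got",  0x12800, 0x100),
    (".bss",  0x12900, 0x400),
]

# Hypothetical relocation offsets.
OFFSETS = [0x12010, 0x12014, 0x12804, 0x11FF0]

if __name__ == "__main__":
    for name, n in relocations_per_section(OFFSETS, SECTIONS).items():
        print(name, n)
```

Run per relocation type, this would show whether the R_386_REL and R_386_32 entries really all fall into .data, and what else lands in .got.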
All of the above is without exception handling, which apparently would add an
extra bunch of R_386_32 relocations to all this, in the eh sections.
I must say that I still miss some memory because starting kedit leaves me
with 220 dirty pages, or 880Kb, but the .got, .data and .bss together only
account for 597Kb. VmData reports 176Kb, is .bss included in that?
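The size of the gap can be worked out as follows. This sketch assumes 4Kb pages, which is what "220 dirty pages, or 880Kb" implies.

```python
PAGE_KB = 4  # assumption: 4Kb pages, implied by 220 pages = 880Kb

dirty_pages = 220
sections_kb = {".got": 129, ".data": 308, ".bss": 160}  # "kedit" row of the table

dirty_kb = dirty_pages * PAGE_KB           # memory actually dirtied
accounted_kb = sum(sections_kb.values())   # .got + .data + .bss
missing_kb = dirty_kb - accounted_kb       # dirty memory not explained by these sections

print(dirty_kb, accounted_kb, missing_kb)  # 880 597 283
```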
Something else that I noted: R_386_REL+R_386_32 amounts to 64332 relocations,
but according to LD_DEBUG=statistics I only get 50346 relocations when I
start kedit (with lazy binding). Any idea?
> > Besides I doubt whether you have enough control over the layout of
> > the .data section to pull that off.
>
> Not sure what "that" is here. If significant speed improvements can be
> achieved by systematically re-arranging the elements of the .data
> section, I think gcc and/or ld could be taught to execute such
> control.
Yes, I thought that a vtable might contain both R_386_REL and R_386_32
relocations, and you can't of course split the vtable, but it seems that it
consists of R_386_32 entries almost entirely (+ 4 R_386_GLOB_DAT
relocations; I'm not sure if they are part of the vtable itself or if they
are used to point to the vtable/type info structs).
But rearranging wouldn't do much for the speed; it would mostly improve the
page sharing. (But then again, having fewer dirty pages improves speed as
well, of course.)
Cheers,
Waldo
--
bastian@kde.org | SuSE Labs KDE Developer | bastian@suse.com