This is the mail archive of the mailing list for the binutils project.
Re: Reducing code size of Position Independent Executables (PIE) by shrinking the size of dynamic relocations section
- From: "Rahul Chaudhry via binutils" <binutils at sourceware dot org>
- To: Cary Coutant <ccoutant at gmail dot com>
- Cc: Roland McGrath <roland at hack dot frob dot com>, Sriraman Tallam <tmsriram at google dot com>, Florian Weimer <fw at deneb dot enyo dot de>, Rahul Chaudhry via gnu-gabi <gnu-gabi at sourceware dot org>, Suprateeka R Hegde <hegdesmailbox at gmail dot com>, Florian Weimer <fweimer at redhat dot com>, David Edelsohn <dje dot gcc at gmail dot com>, Rafael Avila de Espindola <rafael dot espindola at gmail dot com>, Binutils Development <binutils at sourceware dot org>, Alan Modra <amodra at gmail dot com>, Xinliang David Li <davidxl at google dot com>, Sterling Augustine <saugustine at google dot com>, Paul Pluzhnikov <ppluzhnikov at google dot com>, Ian Lance Taylor <iant at google dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, Luis Lozano <llozano at google dot com>, Peter Collingbourne <pcc at google dot com>, Rui Ueyama <ruiu at google dot com>, llvm-dev at lists dot llvm dot org
- Date: Fri, 15 Dec 2017 12:23:09 -0800
- Reply-to: Rahul Chaudhry <rahulchaudhry at google dot com>
On Thu, Dec 14, 2017 at 12:11 AM, Cary Coutant <ccoutant at gmail dot com> wrote:
>> While adding a 'stride' field is definitely an improvement over simple
>> delta+count encoding, it doesn't compare well against the bitmap based
>> encoding.
>>
>> I took a look inside the encoding for the Vim binary. There are some instances
>> in the bitmap based encoding like
>> [0x3855555555555555 0x3855555555555555 0x3855555555555555 ...]
>> that encode sequences of relocations applying to alternate words. The stride
>> based encoding works very well on these and turns it into much more compact
>> [0x0ff010ff 0x0ff010ff 0x0ff010ff ...]
>> using stride==0x10 and count==0xff.
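
To make the "alternate words" pattern concrete, here is a minimal Python sketch. It is not either of the proposed encodings: it assumes 8-byte words and one bitmap bit per word, least significant bit first, and the helper names expand_run and pack_bitmap are made up for illustration. The exact field layout of the quoted words is not restated in this message, so the top byte of 0x3855555555555555 is simply masked off.

```python
WORD = 8  # assumed word size in bytes (64-bit target)

def expand_run(start, stride, count):
    """Offsets covered by one stride run: start, start+stride, ..."""
    return [start + i * stride for i in range(count)]

def pack_bitmap(offsets, base, nbits):
    """Set bit i when the word at base + i*WORD needs a relocation."""
    bitmap = 0
    for off in offsets:
        index, rem = divmod(off - base, WORD)
        if rem == 0 and 0 <= index < nbits:
            bitmap |= 1 << index
    return bitmap

# Relocations on alternate words: stride 0x10, count 0xff, starting at 0.
offsets = expand_run(0, 0x10, 0xFF)

# Only the first 56 words fit in the 56 low bits inspected here.
low56 = pack_bitmap(offsets, base=0, nbits=56)
print(hex(low56))  # 0x55555555555555
```

Under these assumptions, one bitmap word covers 28 of the relocations, while a single (stride, count) pair covers all 255, which is why the stride form looks so compact for this particular pattern.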
> Have you looked much at where the RELATIVE relocations are coming from?
> I've looked at a PIE build of gold, and they're almost all for
> vtables, which mostly have consecutive entries with 8-byte strides.
> There are a few for the GOT, a few for static constructors (in
> .init_array), and a few for other initialized data, but vtables seem
> to account for the vast majority. (Gold has almost 19,000 RELATIVE
> dynamic relocs, and only about 500 non-RELATIVE dynamic relocs.)
> Where do the 16-byte strides come from? Vim is plain C, right? I'm
> guessing its RELATIVE relocation count is fairly low compared to big
> C++ apps. I'm also guessing that the pattern comes from some large
> structure or structures in the source code where initialized pointers
> alternate with non-pointer values. I'm also curious about Roland's
I took a look inside vim for the source of the ..5555.. pattern (relative
relocations applying to alternate words). One of the sources is the
"builtin_termcaps" symbol, which is an array of "struct builtin_term":
So the pattern makes sense. An encoding using strides will work really well
here with stride == 0x10.
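
The struct definition is not reproduced above, so the following Python sketch rests on an assumption: each array element is a 4-byte integer followed by an 8-byte pointer, laid out with natural alignment on a 64-bit target. The field names "code" and "string" and the helper layout() are hypothetical. Under that assumption the pointer lands at offset 8 of a 16-byte element, which would give the 0x10 stride.

```python
def layout(fields):
    """Return ({name: offset}, total size) using natural alignment."""
    offsets, pos, max_align = {}, 0, 1
    for name, size in fields:
        align = size  # assumption: a scalar's alignment equals its size
        pos = (pos + align - 1) // align * align  # round up to alignment
        offsets[name] = pos
        pos += size
        max_align = max(max_align, align)
    # Pad the struct so array elements stay aligned.
    total = (pos + max_align - 1) // max_align * max_align
    return offsets, total

# Hypothetical element: 4-byte int, then 8-byte pointer.
offsets, size = layout([("code", 4), ("string", 8)])

# Offsets of the relocated pointer in the first four array elements.
relocs = [i * size + offsets["string"] for i in range(4)]
deltas = [b - a for a, b in zip(relocs, relocs[1:])]
print(offsets, size, deltas)
```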
There is another repeating pattern I noticed in vim: ..9999.. One of the
sources behind this pattern is the "cmdnames" symbol, which is an array of
structs with the following fields:

    char_u    *cmd_name;       /* name of the command */
    ex_func_T  cmd_func;       /* function for this command */
    long_u     cmd_argt;       /* flags declared above */
    int        cmd_addr_type;  /* flag for address type */

In this struct, the first two fields are pointers, and the next two are
scalars. This explains the ..9999.. pattern for relative relocations. This is
an example where a stride based encoding does not work well, simply because
there is no single stride. The deltas are 8,24,8,24,8,24,...
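
A rough Python sketch of why this hurts a stride encoding. The greedy grouping in stride_runs() is a hypothetical stand-in, not the encoder from the proposal. It assumes 8-byte words and 32-byte elements whose first two words are pointers.

```python
def stride_runs(offsets):
    """Greedily group sorted offsets into (start, stride, count) runs."""
    runs, i = [], 0
    while i < len(offsets):
        start = offsets[i]
        if i + 1 == len(offsets):
            runs.append((start, 0, 1))  # lone trailing offset
            break
        stride = offsets[i + 1] - start
        count = 2
        # Extend the run while the delta stays equal to the stride.
        while (i + count < len(offsets)
               and offsets[i + count] - offsets[i + count - 1] == stride):
            count += 1
        runs.append((start, stride, count))
        i += count
    return runs

# One pointer per 16-byte element: a single run covers everything.
alternate = [8 + 16 * i for i in range(8)]

# Two pointers at the start of each 32-byte element: deltas 8,24,8,24,...
paired = [32 * i + off for i in range(4) for off in (0, 8)]

print(stride_runs(alternate))  # [(8, 16, 8)]
print(stride_runs(paired))     # one run per element

# Bitmap view, one bit per word: two adjacent pointer words out of every
# four, placed so each pair straddles a nibble boundary (words 3,4, 7,8, ...).
bitmap = sum(1 << w for w in range(56) if w % 4 in (0, 3))
print(hex(bitmap))
```

With this grouping, the eight relocations in the alternating layout collapse into one run, while the eight in the paired layout need four runs. The bitmap costs the same either way.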
I think these two examples demonstrate the main weakness of using a simple
stride based encoding: it is too sensitive to how the data structures are laid
out in the program source.