This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH][v3] Add dynamic linker support for $EXEC_ORIGIN.


On Sun, Mar 30, 2014 at 5:54 PM, Carlos O'Donell <carlos@redhat.com> wrote:

>> In some cases, such as symlink farms backed by cloud storage filesystems
>> where the resolved path encodes effectively-arbitrary storage data, it
>> is much more useful to have an rpath token that points to the
>> non-expanded, as-called version of the executable path.  This patch adds
>> the $EXEC_ORIGIN token for this purpose, resolving to the executable
>> path as passed to execve().
>
> Why is this what you want? What is wrong with the path encoding effectively
> arbitrary storage data? I still don't see what problem you're trying to
> solve by using $EXEC_ORIGIN. Can you walk me through an example execution
> sequence that shows how you would use this?

As described here:
http://google-engtools.blogspot.com/2011/09/build-in-cloud-distributing-build-steps.html,
we use content-addressable storage, so if you build foo.so, and I
build a bit-identical bar.so, then your build tree contains a "foo.so"
symlink to a "cloud container", and my build tree contains a "bar.so"
symlink to that exact same container.

The actual container is named after the contents of the file, and we
use md5sum. So the actual container will be called something like

/cas/d41d/d41d8cd98f00b204e9800998ecf8427e

Now let's say that both you and I build "a.out", which gets us both
symlinks to /cas/adf6/adf6b91b87913b11001e764d69d23203.

I hope it's clear that this tremendously helps with storage
requirements when multiple (1000s of) people are likely to build many
instances of the exact same binary.

As described in the blog post, this also tremendously helps with build
speed: if you are about to link a large executable using the exact
same command line and input object files that I just used to build the
same executable, then your link does not need to actually be performed.
You benefit from the work the build system has done on my behalf, and
you get a cached copy of the output (we get a better than 90% cache hit
rate). The same caching applies to compile steps as well: if you are
about to compile foo.o from the same set of sources as I just did,
using the same compile command line, then the build system just gives
you a symlink to the existing /cas/... container that would have
resulted IF it had actually performed the compile (we force GCC to
produce bit-identical object files given the same sources and the same
command line).

I hope the above description is clear and non-controversial :-)

Now consider what happens when your a.out needs to find a shared
library (that you also just built) foo.so at runtime.

Assume that the layout of the build tree is:

  ./bin/a.out        (symlink to) /cas/d41d/d41d8cd98f00b204e9800998ecf8427e
  ./solib/foo.so     (symlink to) /cas/adf6/adf6b91b87913b11001e764d69d23203

How can the binary find foo.so?

A couple of ways:

1. It could hard-code the full
"/cas/adf6/adf6b91b87913b11001e764d69d23203" path as DT_NEEDED
2. It could use a relative "./solib" DT_RPATH and DT_NEEDED of "foo.so"
3. It could use "$ORIGIN/../solib" DT_RPATH and DT_NEEDED of "foo.so"
4. It could use "$EXEC_ORIGIN/../solib" DT_RPATH and DT_NEEDED of "foo.so"
5. Some other solution we didn't think of.

Option (4) is the new mechanism being proposed here.

So what's wrong with (1) through (3)?

(1) Doesn't work when you copy the build tree out of the cloud and
onto your local machine, which is desirable when e.g. you want to
debug locally. It is also undesirable because any rebuild that changes
the contents of "foo.so" gives it a new content hash and hence a new
path, forcing "a.out" to be relinked. Without a hard-coded path,
rebuilding "foo.so" without changing any of its symbols (i.e. without
changing its interface) would not require relinking "a.out",
increasing the cache hit rate even further.

(2) Doesn't work when you invoke the binary from any directory other
than the top of the build tree, which is again quite undesirable.
Having a relative RPATH is obviously a bad idea.

(3) Doesn't work because ld.so performs a readlink() on
"./bin/a.out", so we end up searching "/cas/d41d/../solib" (that is,
"/cas/solib"), which of course doesn't exist.

(4) Works!

(5) Not invented yet, suggestions welcome.



-- 
Paul Pluzhnikov