This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Re: Interface to resolve SONAMES, ld.so.cache format


On Sun, Apr 27, 2014 at 5:12 AM, Álvaro Acción Montes
<alvaroaccionmontes@gmail.com> wrote:
> Hello,

Hello Alvaro!

> This is a question I asked in the #glibc channel, but someone kindly pointed
> me to ask it here. I'll try to be as clear as I can with my explanation,
> provided I'm not really that fluent in either English, or explaining myself.

This is the right place to ask questions.

> I'm working on an academic project, that has to do with monitoring
> executables in terms of address space usage, etc. Similar to what Valgrind
> does, but I was not the one who proposed it :)
>
> For a series of reasons, I need to know what objects are going to be mapped
> into the process address space, similar to what ldd does, but it needs to be
> programmatically determined (not a requirement, but it would cost me
> points). I've of course had a look at ldd's source, but it uses environment
> variables to tell the loader to print "debug" information. I was wondering
> if there was a well defined interface to obtain that information without
> having to rely on the method ldd uses, but it seems the simplest answer is
> the correct one, and there is no such method.

The answer depends on what you want to know.

(1) Know all mappings, but have no semantic information about them.

If you want to account for all mappings you need to provide your
own implementation of mmap, log those calls, and call the real mmap
under the hood. You would know all of the mappings but have no idea
what they were for.

(2) Know all shared mappings with semantic information about them.

There is a probe-based debugger interface to the runtime dynamic
loader described in elf/rtld-debugger-interface.txt. That interface
provides a debug agent with all of the information about the modules
as they are being loaded. However you have to be "watching" from the
very start of the application to catch all of the events for all
loaded objects.

(3) Know some shared mappings (missing alternate namespaces) using _r_debug.

You can know some of the mappings by using the classic _r_debug
rendezvous structure in the dynamic loader to walk the list of loaded
shared objects. You can cast _r_debug to the full structure used by
the loader either if you don't care about portability to other
versions of glibc, or if you support the different structure layouts
as glibc changes (using the right one for the right version).

> So my current approach is as follows. I get a list of required dependencies
> by iterating the .dynamic section of the executable and getting all the
> entries that are tagged as DT_NEEDED. The next step would be to find the
> shared objects with SONAMEs matching the previous list. At this point, I
> was left wondering if there was a way to obtain them other than parsing
> ld.so.cache, but I had no luck. Correct me if I'm wrong, but it seems that
> the linker is a completely isolated entity, and the only functionality
> exported is via the dlopen() function family.

You do not need to parse ld.so.cache. You *do* need to parse
ld.so.conf in order to determine where the dynamic linker will search
for shared libraries. You also need to read each ELF file and look for
DT_SONAME to determine the soname of that shared library. You then
need to follow the normal ELF rules and keep recursively finding all
the DSOs that would be needed to form the final application image.
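
As a rough illustration of the ld.so.conf step above, here is a minimal
Python sketch. It is not glibc's own parser: it assumes one directory per
line, '#' comments, and "include" lines with glob patterns, and it ignores
any other directives. The function names read_ld_so_conf and find_in_dirs
are made up for this example. Matching by file name is also an
approximation of matching by DT_SONAME; it relies on the usual ldconfig
behaviour of creating symlinks named after each soname.

```python
import glob
import os


def read_ld_so_conf(path="/etc/ld.so.conf", _seen=None):
    """Return the directories listed in an ld.so.conf-style file.

    Simplified sketch: handles '#' comments, blank lines and
    'include <glob>' lines. Relative include patterns are resolved
    against the directory of the including file.
    """
    seen = set() if _seen is None else _seen
    real = os.path.realpath(path)
    if real in seen:  # guard against include loops
        return []
    seen.add(real)

    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            lines = fh.readlines()
    except OSError:
        return []

    dirs = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split(None, 1)
        if words[0] == "include" and len(words) == 2:
            pattern = words[1]
            if not os.path.isabs(pattern):
                pattern = os.path.join(os.path.dirname(path), pattern)
            for included in sorted(glob.glob(pattern)):
                dirs.extend(read_ld_so_conf(included, seen))
        else:
            dirs.append(line)
    return dirs


def find_in_dirs(name, dirs):
    """Return the first existing file called `name` in `dirs`, or None."""
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # The trusted default directories are an assumption for this example;
    # they differ between architectures and distributions.
    search = read_ld_so_conf() + ["/lib", "/usr/lib"]
    print(find_in_dirs("libc.so.6", search))
```

A real resolver would also have to honour LD_LIBRARY_PATH and any
DT_RPATH/DT_RUNPATH entries of the object that requests the library, and
then repeat the lookup for every DT_NEEDED entry of each library it finds.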

> Now, I need to parse ld.so.cache, but it seems I'm not able to figure out
> its format. There are 2 versions, and there is a note that says that for
> Glibc2.2 there is a new format added in a compatible way, but I'm inclined
> to think that a normal file would rely only on one of those, even if the
> second is kept for compatibility's sake.

The format of the cache on a given machine is going to be constant. We
haven't changed the format in years. You should not have to worry
about the old pre-glibc-2.2 format.

To understand the format you must read and understand all of
elf/dl-cache.c. There is no public documentation for this, but I would
be very grateful if you want to document your findings on the
community wiki (https://sourceware.org/glibc/wiki/).

To edit the wiki you have to (a) register and then (b) get someone to
vouch for you and add you to EditorGroup. Just ask on #glibc and
someone should help you. This process prevents 100% of spam because
you have to talk to a real human.

> So now my questions are (answering the outermost would render the remaining
> ones irrelevant):
>
> - Is there a clean way to get the functionality of ldd programmatically,
> i.e. without having to call ldd with pipes?

It depends on what you mean by clean. I described 3 possible ways. The
easiest solution is using the dynamic loader in trace mode since it's
the most accurate reflection of reality and doesn't duplicate any
code.
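
For what it's worth, trace mode can be driven without calling the ldd
script itself. The sketch below is only an illustration: it sets the
LD_TRACE_LOADED_OBJECTS environment variable (the mechanism ldd uses) and
parses the lines the loader prints. The function names and the sample
addresses in the comments are made up for this example, and the line
format is assumed to be the usual "name => path (address)" form.

```python
import os
import re
import subprocess

_ADDR = re.compile(r"\s*\((0x[0-9a-fA-F]+)\)$")


def parse_trace_output(text):
    """Parse loader trace output into (name, path, load_address) tuples.

    path is None when the loader prints no path or "not found";
    load_address is None when no address is printed.
    """
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        addr = None
        match = _ADDR.search(line)
        if match:
            addr = int(match.group(1), 16)
            line = line[:match.start()]
        if "=>" in line:
            name, path = (part.strip() for part in line.split("=>", 1))
            if path in ("", "not found"):
                path = None
        else:
            # e.g. the vDSO or the dynamic loader itself
            name = line
            path = name if name.startswith("/") else None
        result.append((name, path, addr))
    return result


def trace_loaded_objects(executable):
    """Run `executable` with the loader in trace mode and parse the output.

    Caution: if the file is not a dynamically linked program that honours
    LD_TRACE_LOADED_OBJECTS, it is simply executed. Do not use this on
    untrusted binaries.
    """
    env = dict(os.environ, LD_TRACE_LOADED_OBJECTS="1")
    proc = subprocess.run([executable], env=env, capture_output=True,
                          text=True, check=False)
    return parse_trace_output(proc.stdout)


if __name__ == "__main__":
    # /bin/ls is just an example target.
    for name, path, addr in trace_loaded_objects("/bin/ls"):
        print(name, path, hex(addr) if addr is not None else None)
```

Keeping the parser separate from the process launch means it can be
checked against saved output and reused if you decide to call ldd after
all.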

We have been trying to collect use cases for a new tooling interface
library that would allow introspection into the running
process/threads etc.
https://sourceware.org/glibc/wiki/Tools%20Interface%20NG

Your use case might be useful, please add it :-)

> - Is there a clean way to resolve SONAMES with the corresponding shared
> object in the system?

That depends on what you mean by clean. You have to (a) follow the
standard lookup rules, (b) follow the lookup paths in ld.so.conf, and
(c) follow the runtime lookup rules if the application happens to
preload a library via LD_PRELOAD or open a library via dlopen.
Otherwise a static analysis of DT_SONAME lookups yields only the
static results you would expect. You need to do this lookup yourself
by parsing ELF files.
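
To give an idea of what "parsing ELF files" involves for this purpose,
here is a minimal Python sketch that pulls DT_SONAME and the DT_NEEDED
entries out of a file's dynamic section. It is an illustration under some
assumptions: the file is a well-formed ELF32 or ELF64 object, the string
table lives inside a PT_LOAD segment, and no error recovery is attempted.
The function name read_dynamic_names is made up for this example; the
constants are the standard values from elf.h.

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB, DT_SONAME = 0, 1, 5, 14


def read_dynamic_names(path):
    """Return (soname_or_None, [needed names]) for the ELF file at path."""
    with open(path, "rb") as fh:
        data = fh.read()
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file: %s" % path)
    is64 = data[4] == 2                      # EI_CLASS
    end = "<" if data[5] == 1 else ">"       # EI_DATA
    if is64:
        ehdr, phdr, dyn = "HHIQQQIHHHHHH", "IIQQQQQQ", "qQ"
    else:
        ehdr, phdr, dyn = "HHIIIIIHHHHHH", "IIIIIIII", "iI"

    fields = struct.unpack_from(end + ehdr, data, 16)
    e_phoff, e_phentsize, e_phnum = fields[4], fields[8], fields[9]

    # Collect the loadable segments and locate the dynamic segment.
    loads, dynamic = [], None
    for i in range(e_phnum):
        p = struct.unpack_from(end + phdr, data, e_phoff + i * e_phentsize)
        if is64:
            p_type, p_offset, p_vaddr, p_filesz = p[0], p[2], p[3], p[5]
        else:
            p_type, p_offset, p_vaddr, p_filesz = p[0], p[1], p[2], p[4]
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return None, []                      # no dynamic section

    # Walk the dynamic entries until DT_NULL.
    needed_offs, soname_off, strtab_vaddr = [], None, None
    dyn_size = struct.calcsize(end + dyn)
    off, size = dynamic
    for pos in range(off, off + size - dyn_size + 1, dyn_size):
        tag, val = struct.unpack_from(end + dyn, data, pos)
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            needed_offs.append(val)
        elif tag == DT_SONAME:
            soname_off = val
        elif tag == DT_STRTAB:
            strtab_vaddr = val
    if strtab_vaddr is None:
        return None, []

    # DT_STRTAB is a virtual address; map it back to a file offset.
    strtab_off = None
    for vaddr, offset, filesz in loads:
        if vaddr <= strtab_vaddr < vaddr + filesz:
            strtab_off = strtab_vaddr - vaddr + offset
            break
    if strtab_off is None:
        raise ValueError("DT_STRTAB is not inside any PT_LOAD segment")

    def string_at(rel):
        start = strtab_off + rel
        return data[start:data.index(b"\0", start)].decode("utf-8", "replace")

    soname = string_at(soname_off) if soname_off is not None else None
    return soname, [string_at(o) for o in needed_offs]
```

Calling read_dynamic_names on an executable gives the names to resolve,
and calling it on each candidate library lets you compare its DT_SONAME
with the requested name and then recurse into its own DT_NEEDED list.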

> - How can I parse ld.so.cache? More precisely, is it safe? Is it worth
> doing? What's the format it uses in glibc-2.19 (that's not a real issue, it
> can be 2.XX)? There's a macro in glibc's source code that uses a binary
> search IIRC, but I'm not limited to that (in the sense I can use some less
> efficient alternatives if I'm able to figure out the format in order to
> progress faster*)

You have to write your own code to parse ld.so.cache or copy code from
elf/*, in which case that code's license and copyright terms apply to
your project. The code for parsing ld.so.cache is mostly in
elf/dl-cache.c. I would just parse the ld.so.conf file and then parse
the ELF files without relying on the cache, but that's your design
choice. Simplest of all is to process the output of ld.so in trace
mode.

It is safe to parse ld.so.cache as long as you follow the rules for
locking the configuration and cache files so you don't see partial
updates if another process or upgrade is running that modifies the
conf or cache files.

> Thank you very much for your time, and I'm really sorry for the lengthy
> email.

No worries.

> * For an undergraduate assignment I think this is more than enough,
> considering this mail makes for a tenth of the whole thing.

Given that this is an undergraduate assignment I'd just parse the
output of ld.so in trace mode, e.g. via ldd.

Cheers,
Carlos.

