This is the mail archive of the elfutils-devel@sourceware.org mailing list for the elfutils project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Performance of libdw


Hi,

I use readelf from elfutils to extract ABI information from the debug 
info of an ELF object in the ABI Dumper tool [1]. The -N option of 
readelf has fixed a problem with performance. But most of the symbol 
names have missed from the debug dump. So I got symbol names by their 
addresses effectively from the cached .dynsym and .symtab tables of an 
object.

Thank you.

[1] https://github.com/lvc/abi-dumper

Mark Wielaard wrote:
> Hi Andrey,
>
> (Thanks for upstream-tracker BTW. It has been really useful.)
>
> On Fri, Jul 12, 2013 at 06:49:59PM +0400, Andrey Ponomarenko wrote:
>> I've found an O(N^2) loop in the libdw library (N - number of
>> symbols in the library). The dwfl_module_getsym function is called N
>> times, but it calls resolve_symbol function each time and it loops
>> through the number of symbols again.
>>
>> So `readelf --debug-dump=info shared_object.so` command is very slow
>> on large input objects. As an example, it takes several minutes to
>> analyse libOpenImageIO.so.1.1.11 (27mb), but readelf from binutils
>> takes just several seconds to do it.
> If the goal is just to have a faster eu-readelf -w than binutils
> readelf -w then just run it with -N and it will not do any address
> to symbol lookups and it should be faster.
>
> But you are right that -N/--numeric-addresses isn't the default.
> And by default eu-readelf will do address symbol lookups (because
> that makes the output much richer than plain binutils readelf gives)
> and those are indeed pretty slow atm.
>
>> The gelf_getsymshndx, gelf_getshdr and elf_getscn functions from
>> libelf are called 2 billion times and it takes about 60% of total
>> execution time. Another 26% of execution time are taken by the
>> search_table function that is called by the dwfl_module_addrsym and
>> loops through the number of symbols too. The dwfl_module_addrsym is
>> called for each symbol. So it also takes O(N^2).
>>
>> How can dwfl_module_getsym and dwfl_module_addrsym functions may be
>> optimised to not to loop through all symbols in the library in order
>> to find information for just one of them? This can make the library
>> to be 10 times faster.
> Those functions work with the "raw" .dynsyn or .symtab (in the ELF,
> separate .debug file or in the .gnu_debugdata). Those tables aren't
> sorted by address (just by whether they are local or global, we
> prefer globals so we search those first in the hope not to have to
> do a full N search) so we have to search them linearly.
>
> They are really designed for doing a quick lookup for a single address.
> If you really need all the addresses/symbols then iterating through
> them all with dwfl_module_getsymtab() and dwfl_module_getsym (), then
> you can create a search table for them all.
>
> So readelf.c could be considered to use dwfl_module_addrsym "wrongly"
> since in general it will call it for lots of different addresses. We
> should probably build a search table. And maybe even do that not just
> in readelf.c but add it to libdwfl for use with dwfl_module_addrsym
> directly?
>
> Cheers,
>
> Mark

-- 
Andrey Ponomarenko, ROSA Lab.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]