This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: IFUNC resolver scheduling (bugs 21041, 20019)

From: Carlos O'Donell <carlos at redhat dot com>
To: Florian Weimer <fweimer at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Fri, 27 Jan 2017 20:48:20 -0500
Subject: Re: IFUNC resolver scheduling (bugs 21041, 20019)
Authentication-results: sourceware.org; auth=none
References: <c83c4b2c-f1bc-f022-ec48-4eb5752aa904@redhat.com> <9255e1d2-6d1e-4999-a295-e02871b69eea@redhat.com> <03904b78-c708-5397-fc41-7ad47b866c5f@redhat.com>

On 01/26/2017 09:22 AM, Florian Weimer wrote:
> On 01/26/2017 02:37 AM, Carlos O'Donell wrote:
>> On 01/25/2017 08:57 AM, Florian Weimer wrote:
>>> I tried to fix most of the issues involving IFUNC resolvers and their
>>> access to unrelocated symbols.
>>
>> Thank you very much for tackling this somewhat complicated topic.
>>
>> Like the issue of constructors and destructions, IFUNC is not a trivially
>> black and white issue, and some trade-offs need to be made.
>>
>>> My first attempt (on branch fw/bug21041) records all relocations
>>> against IFUNC resolvers during elf_machine_rela processing, and adds
>>> another pass to apply them.
>>
>> I like this design. It's relatively straight forward to understand,
>> audit and maintain.
>>
>>> The general approach is to execute IFUNC resolvers
>>> deepest-dependency-first (based on the object in which the IFUNC
>>> resolver resides, not the relocation which references it).  Most of
>>> the time, this ensures that IFUNC resolvers do not encounter
>>> unrelocated symbols because IFUNC resolvers from their dependencies
>>> have already executed, patching in the missing relocation.  Symbol
>>> interposition can still bypass that, though.
>>
>> Is there a problem with symbol interposition?
> 
> Symbol interposition can make available to IFUNC resolver function
> definitions which reside in objects which have not been fully
> relocated yet. This is what happens in the initial test case for bug
> 21041 (libpthread interposes longjmp). Delaying IFUNC relocations
> should fix this particular test case because the non-IFUNC
> relocations for libpthread are processed before the IFUNC resolver in
> libpthread runs. But is not a general solution to this kind of
> problem.

This is not specifically a problem of symbol interposition.

The libpthread.so IFUNC resolver violates the requirement that the resolver
be in the same translation unit as the implementation. It returns an
implementation in libc.so. This is not allowed. Ignoring that...

In bug 21041 the test case DT_NEEDED sequence is:

main 1> libfoo.so	1> libbar.so
			1> libc.so		1> ld.so
						1> linux-vdso.so
     2> libpthread.so 	2> linux-vdso.so
			2> libc.so		2> ld.so
						2> linux-vdso.so
			2> ld.so
     3> libc.so		3> ld.so
			3> linux-vdso.so
     4> ld.so
     5> linux-vdso.so

The naive DT_NEEDED ordering is incorrect because there are relocation-based
dependencies which require the ordering to be adjusted.

Specifically libbar.so has a R_X86_64_JUMP_SLOT reloc against the longjmp
symbol, and the definition of that symbol, even interposed, creates a dependency
edge here at 'X':

main 1> libfoo.so	1> libbar.so		X> libpthread.so	X> linux-vdso.so
									X> libc.so		X> ld.so
									X> linux-vdso.so		
			1> libc.so		1> ld.so
						1> linux-vdso.so
     #> libpthread.so 	#> linux-vdso.so
			#> libc.so		#> ld.so
						#> linux-vdso.so
			2> ld.so
     3> libc.so		3> ld.so
			3> linux-vdso.so
     4> ld.so
     5> linux-vdso.so

Unfortunately glibc has no machinery to do correct topological sorts based
on DT_NEEDED _and_ symbol/relocation dependencies.

We have only l_reldeps in the struct link_map which is used post-priori by
_dl_lookup_symbol_x to record the dependency such that dlclose() cannot
remove objects which have such dependencies created by relocations.

Recursing is dangerous. We really need a global view of the topological
sort, and then act on the results (perhaps after optimizations).

>>> The branch is currently slightly buggy because it reverses the order
>>> of relocation processing within the same object, and it does not
>>> delay copy relocations, so those could copy unrelocated data.
>>> (Support for copy relocations whose source address is determined by
>>> an IFUNC resolver is missing as well, but this is a separate matter
>>> and quite easy to fix.)
>>
>> I don't think there is any real bug in reversing the order of the
>> IRELATIVE relocation handling.  What matters more is that other
>> earlier listed relocations are completely handled in order that the
>> resolver function may run correctly (where possible, and if not possible
>> then at least deterministic).
> 
> Right.  It might still be beneficial to process these relocations in increasing order.

OK.

>> Isn't delaying copy relocations and support for copy relocations whose
>> source address is determined by an IFUNC resolver the same thing?
> 
> No, I don't think so.
> 
> The latter is simply a missing switch case in the handler for delayed
> IFUNC processing. The former is a relocation ordering issue.

OK, I think I see what you mean.

>>> Support for DT_TEXTREL is currently missing, too.  This needs another
>>> preparatory patch like the one to delayed RELRO activation.  (For
>>> dlopen, the RELRO activation delay does a bit too much work because
>>> it mprotects everything again, not just the newly allocated
>>> objects.)
>>
>> DT_TEXTREL shouldn't be too different, just filling the right pieces
>> given the framework you already have in place?
> 
> Yes, it's similar to RELRO processing.

OK.

>> I think Szabolcs makes a good point here, in that we should consider:
>>
>> * What is the cost of a second walk?
> 
> I fear it is substantial. I'm attaching some crude scripts which I
> used to gather statistics.
> 
> For example, I get this for /bin/bash:
> 
> Section '.rela.dyn' contains 2512 relocations:
>   19 IFUNC relocations from index 2294 to 2496, interval size 203 (92.3%)
>   7 COPY relocations from index 2505 to 2511, interval size 7 (0.3%)
> 
> In general, we typically have 10% of relocations which we have redo
> per object. But if we look at only those relocations which require
> symbol lookup, the percentage is much higher (92.3% in the bash
> example).
> 
> The main problem with the two-pass approach is that there is no place
> where it can cache symbol lookup results. If we assume that
> relocation performance is dominated by symbol lookups (which is
> reasonable IMHO), we are looking at a very substantial performance
> hit for the two-pass approach. Some of the performance hit could be
> avoided if we installed a magic function pointer as a temporary
> relocation result, and skip relocations which are not equal to that
> magic value on subsequent passes.

OK, this looks like it's an issue because the IFUNCs are not localized
in an way (as you suggest via some combreloc).

> The next problem is that it's not actual two-pass. We'd need a
> separate pass for each IFUNC provider/consumer pair, I think.
> Otherwise, we would not perform IFUNC relocations in depth-first
> order anymore.

I think you are right. At that point I think it would be possible to
stop doing the hackish-two-pass and add reocation dependency information
as a first class citizen to the dynamic loader and use it in conjunction
with DT_NEEDED to make the DAG of relocation processing ordering.
 
>> * What is the cost of only recording only that IRELATIVE relocs exist
>>   (recording the first one) and then starting the walk from there?
> 
> IRELATIVE relocations are very rare because you need symbol
> visibility management and IFUNCs in the same object. libc.so.6 has 8
> of them. Some of them are likely accidents and unintentional. I'm not
> sure if they are a good target for optimization.

OK.
 
>>> The combreloc feature sorts relocations of the same type together.
>>> The static linker could sort relocations involving IFUNCs together,
>>> and we could recognize this in the dynamic linker and record spans of
>>> relocations instead individual relocations.  This would reduce the
>>> number of delayed relocation records we'd have to allocate.
>>
>> This is exactly what POWER does for OPD-related relocs (Alan Modra
>> explained this to me once).
>>
>> I think this would be a reasonable optimization, but it's just that,
>> an optimization. Since you can always record the first record, and
>> start the walk there for the given object?
> 
> It's a very important optimization because it ensures we only re-do
> the symbol lookups we absolutely have to do. Even if we have the
> optimized symbol ordering, a newly added IFUNC could easily cause a
> 25% performance hit for relocation processing because the symbol
> range is much larger again.

Yes, too true.

-- 
Cheers,
Carlos.

References:
- IFUNC resolver scheduling (bugs 21041, 20019)
  - From: Florian Weimer
- Re: IFUNC resolver scheduling (bugs 21041, 20019)
  - From: Carlos O'Donell
- Re: IFUNC resolver scheduling (bugs 21041, 20019)
  - From: Florian Weimer

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]