This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS


On Jun  3, 2015, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:

> On Wed, Jun 03, 2015 at 03:44:58AM -0300, Alexandre Oliva wrote:
>> We used to store static TLS addrs in the DTV at module load time, but
>> this required one thread to modify another thread's DTV.  Now that we
>> defer the DTV update to the first use in the thread, we should do so
>> without taking the rtld lock if the module is already assigned to static
>> TLS.  Taking the lock introduces deadlocks where there weren't any
>> before.
>> 
>> This patch fixes the deadlock caused by tls_get_addr's unnecessarily
>> taking the rtld lock to initialize the DTV entry for tls_dtor_list
>> within __call_dtors_list, which deadlocks with a dlclose of a module
>> whose finalizer joins with that thread.  The patch does not, however,
>> attempt to fix other potential sources of similar deadlocks, such as
>> the explicit rtld locks taken by call_dtors_list, when the dtor list
>> is not empty; lazy relocation of the reference to tls_dtor_list, when
>> TLS Descriptors are in use; when tls dtors call functions through the
>> PLT and lazy relocation needs to be performed, or any such called
>> functions interact with the dynamic loader in ways that require its
>> lock to be taken.
>> 
>> Ok to install?

> It's not good enough

... for what?

It sure isn't good enough to enable anyone to do whatever they like in a
module finalizer, including blocking waiting for other threads that
interact, explicitly or implicitly, with the dynamic loader before the
finalizer is unblocked.  Most of these issues are preexisting, and
explicitly not addressed by the patch.

It is good enough, however, to fix the regression, namely, that
___tls_get_addr for an IE variable used to be able to complete even if
the rtld lock was taken by another thread, while now it's slower than
needed and it may deadlock.  The patch is intended to fix this
performance and deadlock regression, not any of the preexisting
problems.

> The simple patch to the test case below will cause the test
> case to deadlock.

I very much doubt it would have completed before the DTV race fix.
___tls_get_addr would find the l_tls_offset for tst-join7mod.so to be
undecided, and would attempt to take the lock to make it forced dynamic.
Deadlock.

It's not like accessing TLS variables is special WRT interacting
implicitly with the dynamic loader.  It's just the tip of the iceberg.
Any lazily-bound function called for the first time will do so as well.
libdl functions will also interact with the dynamic loader, obviously,
and there are various cases in which localization libraries deeply
hidden in library internals will dlopen modules to translate an error
message or somesuch.  There are so many things that can go wrong if a
module initializer or finalizer is called with the rtld lock held that
I'm convinced the solution is not to go case after case after case
trying to somehow make it lock-free, but rather to release the rtld lock
before handing control over to user-supplied code (init and fini).  This
would require, before releasing the lock, placing the modules in
intermediate states and making a list of things without the lock and
after taking the lock again, and then, after we're done with the
lockless list and take the lock back, checking that the things to do are
still appropriate.  We might even require additional corner-case states,
such as a module that was to be finalized and released, but that got
dlopened again by some other concurrent finalizer.  Tricky, but doable.

The alternative, IMHO, is not to go after each case that takes the rtld
lock and somehow avoid that, but rather to document what can and cannot
be done in module finalizers.  We might want to rule out synchronization
with other threads, or just document that they are called while holding
a lock that various infrastructure bits have to take at unanticipated
times, so any such synchronization is prone to deadlocks.

> Andreas' reproducer can be fixed by simply setting
> the TLS variables in cxa_thread_atexit as initial exec

It's not clear to me that we want to make libc.so variables IE.  If we
do, there are a number of others that could benefit from this treatment.

> 1. All of the lock taking and NODELETE flag clearing in
>    cxa_thread_atexit.

As I wrote:

>> The patch does not, however,
>> attempt to fix other potential sources of similar deadlocks, such as
>> the explicit rtld locks taken by call_dtors_list

> 2. The lock taking in tls_get_addr_tail.

Thanks, I failed to list this one as a remaining non-regression problem
we might have to address.

>    That has to go and we need
>    to figure out another way to wait for another dlopen to complete.

I don't think so, but I won't try to stop you ;-)

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]