| Summary: | pthread_join deadlock in library destructor | ||
|---|---|---|---|
| Product: | glibc | Reporter: | Andreas Schwab <schwab> |
| Component: | dynamic-link | Assignee: | Not yet assigned to anyone <unassigned> |
| Status: | RESOLVED FIXED | ||
| Severity: | normal | CC: | aoliva, aoliva, carlos, david.abdurachmanov, fweimer, meave390, siddhesh |
| Priority: | P2 | Flags: | fweimer:
security-
|
| Version: | 2.21 | ||
| Target Milestone: | 2.22 | ||
| Host: | Target: | ||
| Build: | Last reconfirmed: | ||
| Project(s) to access: | ssh public key: | ||
| Bug Depends on: | |||
| Bug Blocks: | 17090, 17620, 17621, 17628 | ||
| Attachments: | Patch that fixes the regression | ||
Created attachment 8340 [details]
Patch that fixes the regression
Thanks for the report . Here's the patch I'm testing to fix the regression.
(In reply to Alexandre Oliva from comment #1) > Created attachment 8340 [details] > Patch that fixes the regression > > Thanks for the report . Here's the patch I'm testing to fix the regression. We are still looking to fix this in 2.22, but Alex's patch may not be the solution that is decided upon. Initially as a conservative fix we may just not uload the DSO that have destructors used by thread_local. Then in 2.23 we will put in a broader fix that resolves and documents the concurrency requirements for the internal glibc interfaces. Don't even consider the patch, it's wrong. We can't assume that, just because l_tls_offset is set, the TLS block has been set up for all threads; a concurrent dlopen may have already set l_tls_offset for a previously-loaded but TLS-undecided module, but not yet completed initializing the TLS blocks for all threads. Taking the lock guarantees we wait for that initialization to complete. In order to make this work, we'd have to have dlopen to set l_tls_offset to some value indicating initialization is underway and must be waited for, overwriting it with the final value that obviates the lock only after the initialization is done. I think we're better off leaving this as is, and focusing on ways to run module init and fini without holding the rtld lock. This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via e400f3ccd36fe91d432cc7d45b4ccc799dece763 (commit)
from 48f5f7a63c5971a698ac2b97e4c383e04cbbe110 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e400f3ccd36fe91d432cc7d45b4ccc799dece763
commit e400f3ccd36fe91d432cc7d45b4ccc799dece763
Author: Siddhesh Poyarekar <siddhesh@redhat.com>
Date: Fri Jul 24 19:13:38 2015 +0530
Use IE model for static variables in libc.so, libpthread.so and rtld
The recently introduced TLS variables in the thread-local destructor
implementation (__cxa_thread_atexit_impl) used the default GD access
model, resulting in a call to __tls_get_addr. This causes a deadlock
with recent changes to the way TLS is initialized because DTV
allocations are delayed and hence despite knowing the offset to the
variable inside its TLS block, the thread has to take the global rtld
lock to safely update the TLS offset.
This causes deadlocks when a thread is instantiated and joined inside
a destructor of a dlopen'd DSO. The correct long term fix is to
somehow not take the lock, but that will need a lot deeper change set
to alter the way in which the big rtld lock is used.
Instead, this patch just eliminates the call to __tls_get_addr for the
thread-local variables inside libc.so, libpthread.so and rtld by
building all of their units with -mtls-model=initial-exec.
There were concerns that the static storage for TLS is limited and
hence we should not be using it. Additionally, dynamically loaded
modules may result in libc.so looking for this static storage pretty
late in static binaries. Both concerns are valid when using TLSDESC
since that is where one may attempt to allocate a TLS block from
static storage for even those variables that are not IE. They're not
very strong arguments for the traditional TLS model though, since it
assumes that the static storage would be used sparingly and definitely
not by default. Hence, for now this would only theoretically affect
ARM architectures.
The impact is hence limited to statically linked binaries that dlopen
modules that in turn load libc.so, all that on arm hardware. It seems
like a small enough impact to justify fixing the larger problem that
currently affects everything everywhere.
This still does not solve the original problem completely. That is,
it is still possible to deadlock on the big rtld lock with a small
tweak to the test case attached to this patch. That problem is
however not a regression in 2.22 and hence could be tackled as a
separate project. The test case is picked up as is from Alex's patch.
This change has been tested to verify that it does not cause any
issues on x86_64.
ChangeLog:
[BZ #18457]
* nptl/Makefile (tests): New test case tst-join7.
(modules-names): New test case module tst-join7mod.
* nptl/tst-join7.c: New file.
* nptl/tst-join7mod.c: New file.
* Makeconfig (tls-model): Pass -ftls-model=initial-exec for
all translation units in libc.so, libpthread.so and rtld.
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 10 +++++++
Makeconfig | 6 +++-
NEWS | 12 ++++----
nptl/Makefile | 10 +++++-
nptl/tst-join7.c | 46 ++++++++++++++++++++++++++++++++
nptl/{tst-detach1.c => tst-join7mod.c} | 46 ++++++++++++++++++--------------
6 files changed, 101 insertions(+), 29 deletions(-)
create mode 100644 nptl/tst-join7.c
copy nptl/{tst-detach1.c => tst-join7mod.c} (52%)
Fixed in master. Have to need you just follow the online this link here https://fundecade.com/sliding-block-puzzle and looking the online here update of latest Sliding Block Puzzle free online game most of the players have exited for this. |
A dlopen'd library that is unloaded and calls pthread_join in the destructor causes a deadlock while waiting for the child to finish. $ cat pthread-join.c #include <dlfcn.h> int main (void) { void *f = dlopen ("pthread-join-test.so", RTLD_NOW | RTLD_GLOBAL); if (f) dlclose (f); } $ cat pthread-join-test.c #include <stdio.h> #include <pthread.h> static pthread_t th; static int running = 1; static void * test_run (void *p) { while (running) fprintf (stderr, "XXX test_run\n"); fprintf (stderr, "XXX test_run FINISHED\n"); return NULL; } static void __attribute__ ((constructor)) do_init (void) { pthread_create (&th, NULL, test_run, NULL); } static void __attribute__ ((destructor)) do_end (void) { running = 0; fprintf (stderr, "thread_join...\n"); pthread_join (th, NULL); fprintf (stderr, "thread_join DONE\n"); } $ gcc -fPIC -shared pthread-join-test.c -lpthread -o pthread-join-test.so $ gcc pthread-join.c -lpthread -ldl -Wl,-path=. -o pthread-join $ ./pthread-join thread_join... XXX test_run FINISHED This was broken by f8aeae3 ("Fix DTV race, assert, DTV_SURPLUS Static TLS limit, and nptl_db garbage").