This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS

From: Torvald Riegel <triegel at redhat dot com>
To: Alexandre Oliva <aoliva at redhat dot com>
Cc: libc-alpha at sourceware dot org
Date: Wed, 03 Jun 2015 12:19:48 +0200
Subject: Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
Authentication-results: sourceware.org; auth=none
References: <orvbf5ffyt dot fsf at livre dot home>

On Wed, 2015-06-03 at 03:44 -0300, Alexandre Oliva wrote:
> We used to store static TLS addrs in the DTV at module load time, but
> this required one thread to modify another thread's DTV.  Now that we
> defer the DTV update to the first use in the thread, we should do so
> without taking the rtld lock if the module is already assigned to static
> TLS.  Taking the lock introduces deadlocks where there weren't any
> before.
> 
> This patch fixes the deadlock caused by tls_get_addr's unnecessarily
> taking the rtld lock to initialize the DTV entry for tls_dtor_list
> within __call_dtors_list, which deadlocks with a dlclose of a module
> whose finalizer joins with that thread.  The patch does not, however,
> attempt to fix other potential sources of similar deadlocks, such as
> the explicit rtld locks taken by call_dtors_list, when the dtor list
> is not empty; lazy relocation of the reference to tls_dtor_list, when
> TLS Descriptors are in use; when tls dtors call functions through the
> PLT and lazy relocation needs to be performed, or any such called
> functions interact with the dynamic loader in ways that require its
> lock to be taken.
> 
> Ok to install?

It would be good to have the synchronization for TLS, in particular how
we implement it and which lock protects accesses to which data, be
documented somewhere.  I didn't see a pointer to such documentation,
does it exist?  If not, and given that you seem to be familiar with it,
it would be good if you could add it.

> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 20c7e33..726f1ea 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -755,30 +755,44 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map)
>        the_map = listp->slotinfo[idx].map;
>      }
>  
> -  /* Make sure that, if a dlopen running in parallel forces the
> -     variable into static storage, we'll wait until the address in the
> -     static TLS block is set up, and use that.  If we're undecided
> -     yet, make sure we make the decision holding the lock as well.  */
> -  if (__glibc_unlikely (the_map->l_tls_offset
> -			!= FORCED_DYNAMIC_TLS_OFFSET))
> +  /* If the TLS block for the map is already assigned to dynamic or to
> +     static TLS, avoid the lock.  Be careful to use the same value for
> +     both tests; if we reloaded it, the second test might mistake
> +     forced dynamic for an offset.  Now, if the decision hasn't been
> +     made, take the rtld lock, so that an ongoing dlopen gets a chance
> +     to complete, and then retest; if the decision is still pending,
> +     force the module to dynamic TLS.  */
> +  ptrdiff_t offset = atomic_load_relaxed (&the_map->l_tls_offset);

This should document why relaxed MO is sufficient, or briefly refer to
existing documentation that does so.  For example, if we do not rely on
anything that the thread providing the value did before setting the
value.

> +  if (__glibc_unlikely (offset != FORCED_DYNAMIC_TLS_OFFSET))
>      {
> +      if (__glibc_unlikely (offset != NO_TLS_OFFSET))
> +	goto static_tls;
>        __rtld_lock_lock_recursive (GL(dl_load_lock));

Why do we need the lock at all?  Are we protecting just access to
l_tls_offset or to other memory locations as well?  If it's just
l_tls_offset, we could use a CAS and avoid the lock altogether.

> -      if (__glibc_likely (the_map->l_tls_offset == NO_TLS_OFFSET))
> +      offset = the_map->l_tls_offset;

l_tls_offset is effectively concurrently accessed data, so this must use
atomic accesses.  Otherwise, you're telling the compiler that there's no
concurrent access (because you promise to give it a program free of data
races), and allow it to reload or speculatively write a value to this
memory location, for example.  Likewise for the store below.

> +      if (__glibc_likely (offset == NO_TLS_OFFSET))
>  	{
>  	  the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET;

Must be an atomic access.

>  	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
>  	}
> -      else if (__glibc_likely (the_map->l_tls_offset
> -			       != FORCED_DYNAMIC_TLS_OFFSET))
> +      else if (__glibc_likely (offset != FORCED_DYNAMIC_TLS_OFFSET))
>  	{
> +	  /* The decision is made, and it is final.  We use the value
> +	     we've already loaded, but we could even load the offset
> +	     after releasing the lock, since it won't change.  Should
> +	     the module be released while another thread references
> +	     one of its TLS variables, that's undefined behavior.  */
> +	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
> +
> +	static_tls:
> +	  ;
> +
>  #if TLS_TCB_AT_TP
> -	  void *p = (char *) THREAD_SELF - the_map->l_tls_offset;
> +	  void *p = (char *) THREAD_SELF - offset;
>  #elif TLS_DTV_AT_TP
> -	  void *p = (char *) THREAD_SELF + the_map->l_tls_offset + TLS_PRE_TCB_SIZE;
> +	  void *p = (char *) THREAD_SELF + offset + TLS_PRE_TCB_SIZE;
>  #else
>  # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
>  #endif
> -	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
>  
>  	  dtv[GET_ADDR_MODULE].pointer.is_static = true;
>  	  dtv[GET_ADDR_MODULE].pointer.val = p;
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 3dd2944..4ffd13c 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -242,7 +242,7 @@ tests = tst-typesizes \
>  	tst-basic7 \
>  	tst-kill1 tst-kill2 tst-kill3 tst-kill4 tst-kill5 tst-kill6 \
>  	tst-raise1 \
> -	tst-join1 tst-join2 tst-join3 tst-join4 tst-join5 tst-join6 \
> +	tst-join1 tst-join2 tst-join3 tst-join4 tst-join5 tst-join6 tst-join7 \
>  	tst-detach1 \
>  	tst-eintr1 tst-eintr2 tst-eintr3 tst-eintr4 tst-eintr5 \
>  	tst-tsd1 tst-tsd2 tst-tsd3 tst-tsd4 tst-tsd5 tst-tsd6 \
> @@ -320,7 +320,8 @@ endif
>  modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \
>  		tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \
>  		tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \
> -		tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod
> +		tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod \
> +		tst-join7mod
>  extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o
>  test-extras += $(modules-names) tst-cleanup4aux
>  test-modules = $(addprefix $(objpfx),$(addsuffix .so,$(modules-names)))
> @@ -525,6 +526,11 @@ $(objpfx)tst-tls6.out: tst-tls6.sh $(objpfx)tst-tls5 \
>  	$(evaluate-test)
>  endif
>  
> +$(objpfx)tst-join7: $(libdl) $(shared-thread-library)
> +$(objpfx)tst-join7.out: $(objpfx)tst-join7mod.so
> +$(objpfx)tst-join7mod.so: $(shared-thread-library)
> +LDFLAGS-tst-join7mod.so = -Wl,-soname,tst-join7mod.so
> +
>  $(objpfx)tst-dlsym1: $(libdl) $(shared-thread-library)
>  
>  $(objpfx)tst-fini1: $(shared-thread-library) $(objpfx)tst-fini1mod.so
> diff --git a/nptl/tst-join7.c b/nptl/tst-join7.c
> new file mode 100644
> index 0000000..bf6fc76
> --- /dev/null
> +++ b/nptl/tst-join7.c
> @@ -0,0 +1,12 @@
> +#include <dlfcn.h>
> +
> +int
> +do_test (void)
> +{
> +  void *f = dlopen ("tst-join7mod.so", RTLD_NOW | RTLD_GLOBAL);
> +  if (f) dlclose (f); else return 1;
> +  return 0;
> +}
> +
> +#define TEST_FUNCTION do_test ()
> +#include "../test-skeleton.c"
> diff --git a/nptl/tst-join7mod.c b/nptl/tst-join7mod.c
> new file mode 100644
> index 0000000..a8c7bc0
> --- /dev/null
> +++ b/nptl/tst-join7mod.c
> @@ -0,0 +1,29 @@
> +#include <stdio.h>
> +#include <pthread.h>
> +
> +static pthread_t th;
> +static int running = 1;
> +
> +static void *
> +test_run (void *p)
> +{
> +  while (running)

That's a data race with the modification do_end, or isn't it?  Use
atomics, a semaphore (with trywait), or trylock.

> +    fprintf (stderr, "XXX test_run\n");
> +  fprintf (stderr, "XXX test_run FINISHED\n");
> +  return NULL;
> +}
> +
> +static void __attribute__ ((constructor))
> +do_init (void)
> +{
> +  pthread_create (&th, NULL, test_run, NULL);
> +}
> +
> +static void __attribute__ ((destructor))
> +do_end (void)
> +{
> +  running = 0;
> +  fprintf (stderr, "thread_join...\n");
> +  pthread_join (th, NULL);
> +  fprintf (stderr, "thread_join DONE\n");
> +}
> 
>

Follow-Ups:
- Re: [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Alexandre Oliva

References:
- [PR18457] Don't require rtld lock to compute DTV addr for static TLS
  - From: Alexandre Oliva

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]