This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Async signal safe TLS accesses


Andrew,

I know you have a v2 out, but I think some of my comments here still
apply. So I'm sending this out.

I'm still trying to come to grips with the exact subtleties of the
changes here and reviewing the details. This review isn't entirely
technical and I hope that doesn't frustrate you.

On 09/23/2013 07:01 PM, Andrew Hunter wrote:
> TLS accesses from initial-exec variables are async-signal-safe.  Even
> dynamic-type accesses from shared objects loaded by ld.so at startup
> are.  But dynamic accesses from dlopen()ed objects are not, which
> means a lot of trouble for any sort of per-thread state we want to
> use from signal handlers since we can't rely on always having
> initial-exec.  Make all TLS access always signal safe.
> 
> Doing this has a few components to it:
> 
>  * We introduce a set of symbols symbol_safe_{malloc,free,memalign,&c}.
>    They do what it says on the box, but guarantee async-signal-safety.
>    We provide a minimal mmap-based implementation in ld.so; anyone can
>    override them more efficiently.  (This may prove useful elsewhere.)
> 
>  * We use these throughout dl-tls.c in paths reachable from tls_get_addr
>    (and, importantly, for allocations on other paths that might be
>    freed/realloced from tls_get_addr.)
> 
>  * tls_get_addr synchronizes with dlopen() by dl_load_lock; this lock
>    is reentrant, but not, alas, signal safe.  Replace this with simple
>    CAS based synchronization.
> 
>  * Use signal masking in the slow path of tls_get_addr to prevent
>    reentrant TLS initialization.  The most complicated part here is
>    ensuring a dlopen which forces static TLS (and thus updates all
>    threads' DTVs) does not interfere with this.

While I review this patch could you please read through the
contribution checklist?

https://sourceware.org/glibc/wiki/Contribution%20checklist

The checklist ensures that everyone submitting patches is on
the same page.

> ---
>  elf/Versions               |   5 ++
>  elf/dl-misc.c              | 184 +++++++++++++++++++++++++++++++++++++++++++++
>  elf/dl-open.c              |   5 +-
>  elf/dl-reloc.c             |  42 +++++++++--
>  elf/dl-tls.c               | 138 +++++++++++++++++++++-------------
>  nptl/allocatestack.c       |  16 ++--
>  sysdeps/generic/ldsodefs.h |  17 +++++
>  7 files changed, 340 insertions(+), 67 deletions(-)
> 
> diff --git a/elf/Versions b/elf/Versions
> index 2383992..f14e5d0 100644
> --- a/elf/Versions
> +++ b/elf/Versions
> @@ -53,6 +53,7 @@ ld {
>      _dl_allocate_tls; _dl_allocate_tls_init;
>      _dl_argv; _dl_find_dso_for_object; _dl_get_tls_static_info;
>      _dl_deallocate_tls; _dl_make_stack_executable; _dl_out_of_memory;
> +    _dl_clear_dtv;

OK, used by libpthread...

>      _dl_rtld_di_serinfo; _dl_starting_up; _dl_tls_setup;
>      _rtld_global; _rtld_global_ro;
>  
> @@ -61,5 +62,9 @@ ld {
>  
>      # Pointer protection.
>      __pointer_chk_guard;
> +
> +    # for signal safe TLS
> +    signal_safe_malloc; signal_safe_free; signal_safe_memalign;
> +    signal_safe_realloc; signal_safe_calloc;

As discussed we aren't going to have a public API for now.

>    }
>  }
> diff --git a/elf/dl-misc.c b/elf/dl-misc.c
> index 5fc13a4..312700c 100644
> --- a/elf/dl-misc.c
> +++ b/elf/dl-misc.c
> @@ -19,8 +19,10 @@
>  #include <assert.h>
>  #include <fcntl.h>
>  #include <ldsodefs.h>
> +#include <libc-symbols.h>
>  #include <limits.h>
>  #include <link.h>
> +#include <signal.h>
>  #include <stdarg.h>
>  #include <stdlib.h>
>  #include <string.h>
> @@ -364,3 +366,185 @@ _dl_higher_prime_number (unsigned long int n)
>  
>    return *low;
>  }
> +
> +
> +/* mask every signal, returning the previous sigmask in <old>. */

Full sentence please, two spaces after period.

Just use `old' when talking about the variable not `<old>'.

> +void
> +internal_function
> +_dl_mask_all_signals (sigset_t *old)
> +{
> +  sigset_t new;
> +  sigfillset(&new);

Space after function name e.g. sigfillset (&new);

Please familiarize yourself with the glibc coding style and conventions:
https://sourceware.org/glibc/wiki/Style_and_Conventions

I'll point them out as I see them.

> +  int ret;
> +
> +  /* So...hmmm. This function just serves as a replacement to pthread_sigmask,
> +     which isn't available due to linking problems. Two concerns.
> +     - pthread_sigmask removes a few internal signals from the
> +       mask -- do we care?
> +
> +       I don't think so.  We only turn off signals for bounded time inside
> +       dynamic linker functions.  Pthread's concerned with a user permanently
> +       turning off some signals.
> +
> +     - which syscall do we use? rt_sigprocmask is slightly more correct,
> +       but they're very nearly equivalent and rt_ may not be globally available.
> +
> +       I think just rt_sigprocmask. I don't think we support any kernel where
> +       that's not present.
> +
> +     It's unfortunate there's no simpler solution than duplicating sigmask.

Could you please clean this up to talk about the implementation as it was
chosen.

e.g.
This function serves as a replacement to pthread_sigmask,
which isn't available from within the dynamic linker since it
would require linking with libpthread. We duplicate some of
the functionality here to avoid requiring libpthread.
This isn't quite identical to pthread_sigmask in that we do
not mask internal signals used for cancellation and setxid
handling. This disables asyncrhonous cancellation for the
duration the signals are disabled, but it's a small window,
and prevents any problems with the use of TLS variables in
the signal handlers that would have executed.

I don't think you need to talk about rt_sigprocmask, that's
the right function to use IMO.

You also don't need to mention that it's unfortunate that
we have duplicated code, you already provided a valid reason
for it.

> +  */
> +
> +  /* it's very important we don't touch errno here--that's TLS and who
> +     knows what might happen then! */

Full sentence please. We can guess what will happen, it will trigger another
call to tls_get_addr recursively and eventually run out of stack.

> +  INTERNAL_SYSCALL_DECL (err);
> +
> +  ret = INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_SETMASK, &new, old,
> +                          _NSIG / 8);
> +
> +  assert (ret == 0);
> +}

OK.

> +
> +/* Return sigmask to what it was before a call to _dl_mask_all_signals. */
> +void
> +internal_function
> +_dl_unmask_signals (sigset_t *old)
> +{
> +  int ret;
> +  INTERNAL_SYSCALL_DECL (err);
> +
> +  ret = INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_SETMASK, old, NULL,
> +                          _NSIG / 8);
> +
> +  assert (ret == 0);
> +}

OK.

> +
> +/* To support accessing TLS variables from signal handlers, we need an
> +   async signal safe memory allocator.  These routines are never
> +   themselves invoked reentrantly (all calls to them are surrounded by
> +   signal masks) but may be invoked concurrently from many threads.
> +   The current implementation is not particularly performant nor space
> +   efficient, but it will be used rarely (and only in binaries that use
> +   dlopen.) The API matches that of malloc() and friends. */

Two spaces after period.

> +
> +struct signal_safe_allocator_header {
> +  size_t size;
> +  void *start;
> +};

Braces in column 1. See GNU Coding standard.

e.g.
struct foo
{
  int a, b;
}

> +
> +void * weak_function
> +signal_safe_memalign (size_t boundary, size_t size)
> +{
> +  struct signal_safe_allocator_header *header;
> +  if (boundary < sizeof (*header))
> +    boundary = sizeof (*header);
> +
> +  /* power of two */
> +  if (boundary & (boundary - 1))

No boolean coercion please.

> +    return NULL;
> +
> +  size_t pg = GLRO(dl_pagesize);
> +  size_t padded_size;
> +  if (boundary <= pg)
> +    {
> +      /* easy: we'll get a pointer certainly aligned to boundary, so
> +         just add one more boundary-sized chunk to hold the header. */

Full sentences, two spaces after period.

> +      padded_size = roundup (size, boundary) + boundary;
> +    }
> +  else
> +    {
> +      /* if we want K pages aligned to a J-page boundary, K+J+1 pages
> +         contains at least one such region that isn't directly at the start
> +         (so we can place the header.)  This is wasteful, but you're the one
> +         who wanted 64K-aligned TLS. */

Likewise.

> +      padded_size = roundup (size, pg) + boundary + pg;
> +    }
> +
> +
> +  size_t actual_size = roundup (padded_size, pg);
> +  void *actual = mmap (NULL, actual_size, PROT_READ | PROT_WRITE,
> +                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> +  if (actual == MAP_FAILED)
> +    return NULL;
> +
> +  if (boundary <= pg)
> +    {
> +      header = actual + boundary - sizeof (*header);
> +    }
> +  else
> +    {
> +      intptr_t actual_pg = ((intptr_t)actual) / pg;

Space after cast: e.g. (intptr_t) actual...

> +      intptr_t boundary_pg = boundary / pg;
> +      intptr_t start_pg = actual_pg + boundary_pg;
> +      start_pg -= start_pg % boundary_pg;
> +      if (start_pg > (actual_pg + 1))
> +        {
> +          int ret = munmap (actual, (start_pg - actual_pg - 1) * pg);
> +          assert (ret == 0);
> +          actual = (void *)((start_pg - 1) * pg);

Likewise.

> +        }
> +      void *start = (void *)(start_pg * pg);

Likeise.

> +      header = start - sizeof(*header);
> +
> +    }
> +  header->size = actual_size;
> +  header->start = actual;
> +  void *ptr = header;
> +  ptr += sizeof (*header);
> +  if (((intptr_t)ptr) % boundary != 0)

Likewise.

> +    _dl_fatal_printf ("signal_safe_memalign produced incorrect alignment\n");
> +  return ptr;
> +}
> +
> +void * weak_function
> +signal_safe_malloc (size_t size)
> +{
> +  return signal_safe_memalign(1, size);
> +}

OK.

> +
> +void weak_function
> +signal_safe_free (void *ptr)
> +{
> +  if (ptr == NULL)
> +    return;
> +
> +  struct signal_safe_allocator_header *header = ptr - sizeof (*header);
> +  int ret = munmap (header->start, header->size);
> +
> +  assert (ret == 0);
> +}

OK.

> +
> +void * weak_function
> +signal_safe_realloc (void *ptr, size_t size)
> +{
> +  if (size == 0)
> +    {
> +      signal_safe_free (ptr);
> +      return NULL;
> +    }

OK.

> +  if (ptr == NULL)
> +    return signal_safe_malloc (size);

OK.

> +
> +  struct signal_safe_allocator_header *header = ptr - sizeof (*header);
> +  size_t old_size = header->size;
> +  if (old_size - sizeof (*header) >= size)
> +    return ptr;
> +
> +  void *new_ptr = signal_safe_malloc (size);
> +  if (new_ptr == NULL)
> +    return NULL;
> +
> +  memcpy (new_ptr, ptr, old_size);
> +  signal_safe_free (ptr);
> +
> +  return new_ptr;
> +}

OK.

> +
> +void * weak_function
> +signal_safe_calloc (size_t nmemb, size_t size)
> +{
> +  void *ptr = signal_safe_malloc(nmemb * size);
> +  if (ptr == NULL)
> +    return NULL;
> +  return memset(ptr, 0, nmemb * size);

OK.

> +}
> diff --git a/elf/dl-open.c b/elf/dl-open.c
> index 1403c8c..ccecf11 100644
> --- a/elf/dl-open.c
> +++ b/elf/dl-open.c
> @@ -548,7 +548,10 @@ cannot load any more object with static TLS"));
>  	     generation of the DSO we are allocating data for.  */
>  	  _dl_update_slotinfo (imap->l_tls_modid);
>  #endif
> -
> +          /* We do this iteration under a signal mask in dl-reloc; why not
> +             here?  Because these symbols are new and dlopen hasn't
> +             returned yet.  So we can't possibly be racing with a TLS
> +             access to them from another thread. */
>  	  GL(dl_init_static_tls) (imap);
>  	  assert (imap->l_need_tls_init == 0);
>  	}
> diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
> index 5c54310..f9493d7 100644
> --- a/elf/dl-reloc.c
> +++ b/elf/dl-reloc.c
> @@ -16,8 +16,10 @@
>     License along with the GNU C Library; if not, see
>     <http://www.gnu.org/licenses/>.  */
>  
> +#include <atomic.h>
>  #include <errno.h>
>  #include <libintl.h>
> +#include <signal.h>
>  #include <stdlib.h>
>  #include <unistd.h>
>  #include <ldsodefs.h>
> @@ -70,8 +72,6 @@ _dl_try_allocate_static_tls (struct link_map *map)
>  
>    size_t offset = GL(dl_tls_static_used) + (freebytes - n * map->l_tls_align
>  					    - map->l_tls_firstbyte_offset);
> -
> -  map->l_tls_offset = GL(dl_tls_static_used) = offset;
>  #elif TLS_DTV_AT_TP
>    /* dl_tls_static_used includes the TCB at the beginning.  */
>    size_t offset = (((GL(dl_tls_static_used)
> @@ -83,7 +83,27 @@ _dl_try_allocate_static_tls (struct link_map *map)
>    if (used > GL(dl_tls_static_size))
>      goto fail;
>  
> -  map->l_tls_offset = offset;
> +#else
> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> +#endif
> +  /* We've computed the new value we want, now try to install it. */

Two spaces after period.

> +  ptrdiff_t val;
> +  while ((val = map->l_tls_offset) == NO_TLS_OFFSET)
> +    {
> +      atomic_compare_and_exchange_bool_acq(&map->l_tls_offset,

Space after function name.

> +                                           offset,
> +                                           NO_TLS_OFFSET);
> +    }
> +  if (val != offset)
> +    {
> +      /* Lost a race to a TLS access in another thread. Too bad, nothing
> +         we can do here. */

Two spaces after period.

> +      goto fail;
> +    }
> +  /* We installed the value; now update the globals. */
> +#if TLS_TCB_AT_TP
> +  GL(dl_tls_static_used) = offset;
> +#elif TLS_DTV_AT_TP
>    map->l_tls_firstbyte_offset = GL(dl_tls_static_used);
>    GL(dl_tls_static_used) = used;
>  #else
> @@ -114,10 +134,20 @@ void
>  internal_function __attribute_noinline__
>  _dl_allocate_static_tls (struct link_map *map)
>  {
> -  if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
> -      || _dl_try_allocate_static_tls (map))
> +  /* We wrap this in a signal mask because it has to iterate all
> +     threads (including this one) and update this map's TLS entry.  A

Move `A' down to next line.

> +     handler accessing TLS would try to do the same update and
> +     break. */
> +  sigset_t old;
> +  _dl_mask_all_signals (&old);
> +  int err = -1;
> +  if (map->l_tls_offset != FORCED_DYNAMIC_TLS_OFFSET)
> +    err = _dl_try_allocate_static_tls (map);
> +
> +  _dl_unmask_signals (&old);
> +  if (err != 0)
>      {
> -      _dl_signal_error (0, map->l_name, NULL, N_("\
> +       _dl_signal_error (0, map->l_name, NULL, N_("\

No spurious fixes please. Send those in another patch.

>  cannot allocate memory in static TLS block"));
>      }
>  }
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 576d9a1..a286420 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -17,6 +17,7 @@
>     <http://www.gnu.org/licenses/>.  */
>  
>  #include <assert.h>
> +#include <atomic.h>
>  #include <errno.h>
>  #include <libintl.h>
>  #include <signal.h>
> @@ -43,7 +44,6 @@ oom (void)
>  }
>  #endif
>  
> -

Likewise.

>  size_t
>  internal_function
>  _dl_next_tls_modid (void)
> @@ -293,7 +293,7 @@ allocate_dtv (void *result)
>       initial set of modules.  This should avoid in most cases expansions
>       of the dtv.  */
>    dtv_length = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
> -  dtv = calloc (dtv_length + 2, sizeof (dtv_t));
> +  dtv = signal_safe_calloc (dtv_length + 2, sizeof (dtv_t));

OK.

>    if (dtv != NULL)
>      {
>        /* This is the initial length of the dtv.  */
> @@ -463,6 +463,18 @@ _dl_allocate_tls (void *mem)
>  }
>  rtld_hidden_def (_dl_allocate_tls)
>  
> +void
> +internal_function
> +_dl_clear_dtv (dtv_t *dtv)
> +{
> +  for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt)
> +    if (! dtv[1 + cnt].pointer.is_static
> +	&& dtv[1 + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> +      signal_safe_free (dtv[1 + cnt].pointer.val);
> +  memset (dtv, '\0', (dtv[-1].counter + 1) * sizeof (dtv_t));
> +}

OK.

> +
> +rtld_hidden_def (_dl_clear_dtv)
>  
>  #ifndef SHARED
>  extern dtv_t _dl_static_dtv[];
> @@ -479,11 +491,11 @@ _dl_deallocate_tls (void *tcb, bool dealloc_tcb)
>    for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt)
>      if (! dtv[1 + cnt].pointer.is_static
>  	&& dtv[1 + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> -      free (dtv[1 + cnt].pointer.val);
> +      signal_safe_free (dtv[1 + cnt].pointer.val);
>  
>    /* The array starts with dtv[-1].  */
>    if (dtv != GL(dl_initial_dtv))
> -    free (dtv - 1);
> +    signal_safe_free (dtv - 1);
>  

OK.

>    if (dealloc_tcb)
>      {
> @@ -521,20 +533,21 @@ rtld_hidden_def (_dl_deallocate_tls)
>  # endif
>  
>  
> -static void *
> -allocate_and_init (struct link_map *map)
> +static void
> +allocate_and_init (dtv_t *dtv, struct link_map *map)
>  {
>    void *newp;
> -
> -  newp = __libc_memalign (map->l_tls_align, map->l_tls_blocksize);
> +  newp = signal_safe_memalign (map->l_tls_align, map->l_tls_blocksize);

OK.

>    if (newp == NULL)
>      oom ();
>  
> -  /* Initialize the memory.  */
> +  /* Initialize the memory. Since this is our thread's space, we are
> +     under a signal mask, and no one has touched this section before,
> +     we can safely just overwrite whatever's there. */
>    memset (__mempcpy (newp, map->l_tls_initimage, map->l_tls_initimage_size),
>  	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
>  
> -  return newp;
> +  dtv->pointer.val = newp;

OK.

>  }
>  
>  
> @@ -576,7 +589,15 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	 the entry we need.  */
>        size_t new_gen = listp->slotinfo[idx].gen;
>        size_t total = 0;
> -
> +      int ret;
> +      sigset_t old;
> +      _dl_mask_all_signals (&old);
> +      /* We use the signal mask as a lock against reentrancy here.
> +         Check that a signal taken before the lock didn't already
> +         update us. */
> +      dtv = THREAD_DTV ();
> +      if (dtv[0].counter >= listp->slotinfo[idx].gen)
> +        goto out;
>        /* We have to look through the entire dtv slotinfo list.  */
>        listp =  GL(dl_tls_dtv_slotinfo_list);
>        do
> @@ -596,25 +617,27 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  	      if (gen <= dtv[0].counter)
>  		continue;
>  
> +	      size_t modid = total + cnt;
> +
>  	      /* If there is no map this means the entry is empty.  */
>  	      struct link_map *map = listp->slotinfo[cnt].map;
>  	      if (map == NULL)
>  		{
>  		  /* If this modid was used at some point the memory
>  		     might still be allocated.  */
> -		  if (! dtv[total + cnt].pointer.is_static
> -		      && dtv[total + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> +		  if (dtv[-1].counter >= modid
> +                      && !dtv[modid].pointer.is_static
> +		      && dtv[modid].pointer.val != TLS_DTV_UNALLOCATED)
>  		    {
> -		      free (dtv[total + cnt].pointer.val);
> -		      dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
> +		      signal_safe_free (dtv[modid].pointer.val);
> +		      dtv[modid].pointer.val = TLS_DTV_UNALLOCATED;
>  		    }
>  
>  		  continue;
>  		}
>  
> +	      assert (modid == map->l_tls_modid);
>  	      /* Check whether the current dtv array is large enough.  */
> -	      size_t modid = map->l_tls_modid;
> -	      assert (total + cnt == modid);
>  	      if (dtv[-1].counter < modid)
>  		{
>  		  /* Reallocate the dtv.  */
> @@ -628,17 +651,17 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  		    {
>  		      /* This is the initial dtv that was allocated
>  			 during rtld startup using the dl-minimal.c
> -			 malloc instead of the real malloc.  We can't
> +			 malloc instead of the real allocator.  We can't
>  			 free it, we have to abandon the old storage.  */
>  
> -		      newp = malloc ((2 + newsize) * sizeof (dtv_t));
> +		      newp = signal_safe_malloc ((2 + newsize) * sizeof (dtv_t));
>  		      if (newp == NULL)
>  			oom ();
>  		      memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
>  		    }
>  		  else
>  		    {
> -		      newp = realloc (&dtv[-1],
> +		      newp = signal_safe_realloc (&dtv[-1],
>  				      (2 + newsize) * sizeof (dtv_t));
>  		      if (newp == NULL)
>  			oom ();
> @@ -668,7 +691,7 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  		   deallocate even if it is this dtv entry we are
>  		   supposed to load.  The reason is that we call
>  		   memalign and not malloc.  */
> -		free (dtv[modid].pointer.val);
> +		signal_safe_free (dtv[modid].pointer.val);
>  
>  	      /* This module is loaded dynamically- We defer memory
>  		 allocation.  */
> @@ -685,6 +708,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
>  
>        /* This will be the new maximum generation counter.  */
>        dtv[0].counter = new_gen;
> +   out:
> +      _dl_unmask_signals (&old);

OK.

>      }
>  
>    return the_map;
> @@ -710,39 +735,50 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map)
>  
>        the_map = listp->slotinfo[idx].map;
>      }
> -
> - again:
> -  /* Make sure that, if a dlopen running in parallel forces the
> -     variable into static storage, we'll wait until the address in the
> -     static TLS block is set up, and use that.  If we're undecided
> -     yet, make sure we make the decision holding the lock as well.  */
> -  if (__builtin_expect (the_map->l_tls_offset
> -			!= FORCED_DYNAMIC_TLS_OFFSET, 0))
> +  sigset_t old;
> +  _dl_mask_all_signals (&old);
> +
> +  /* as with update_slotinfo, we use the sigmask as a check against
> +     reentrancy. */

Full sentence and two spaces after period.

> +  if (dtv[GET_ADDR_MODULE].pointer.val != TLS_DTV_UNALLOCATED)
> +    goto out;
> +
> +  /* Synchronize against a parallel dlopen() forcing this variable
> +     into static storage. If that happens, we have to be more careful
> +     about initializing the area, as that dlopen() will be iterating
> +     the threads to do so itself. */

Two spaces after period.

> +  ptrdiff_t offset;
> +  while ((offset = the_map->l_tls_offset) == NO_TLS_OFFSET)
>      {
> -      __rtld_lock_lock_recursive (GL(dl_load_lock));
> -      if (__builtin_expect (the_map->l_tls_offset == NO_TLS_OFFSET, 1))
> -	{
> -	  the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET;
> -	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -	}
> -      else
> -	{
> -	  __rtld_lock_unlock_recursive (GL(dl_load_lock));
> -	  if (__builtin_expect (the_map->l_tls_offset
> -				!= FORCED_DYNAMIC_TLS_OFFSET, 1))
> -	    {
> -	      void *p = dtv[GET_ADDR_MODULE].pointer.val;
> -	      if (__builtin_expect (p == TLS_DTV_UNALLOCATED, 0))
> -		goto again;
> -
> -	      return (char *) p + GET_ADDR_OFFSET;
> -	    }
> -	}
> +      atomic_compare_and_exchange_bool_acq(&the_map->l_tls_offset,
> +                                           FORCED_DYNAMIC_TLS_OFFSET,
> +                                           NO_TLS_OFFSET);
> +    }

Is there a guarantee here that we make forward progress?

> +  if (offset == FORCED_DYNAMIC_TLS_OFFSET)
> +    {
> +      allocate_and_init(&dtv[GET_ADDR_MODULE], the_map);
>      }
> -  void *p = dtv[GET_ADDR_MODULE].pointer.val = allocate_and_init (the_map);
> -  dtv[GET_ADDR_MODULE].pointer.is_static = false;
> +  else
> +    {
> +      void **pp = &dtv[GET_ADDR_MODULE].pointer.val;

This kind of indefinite spinning worries me but I see no other
way around this situation. As long as we can convince ourselves
that we will always make forward progress then I'm happy. I need
to do some more review to get to that happy state.

> +      while (atomic_forced_read (*pp) == TLS_DTV_UNALLOCATED)
> +        {
> +          /* for lack of a better (safe) thing to do, just spin.
> +            Someone else (not us; it's done under a signal mask) set
> +            this map to a static TLS offset, and they'll iterate all
> +            threads to initialize it.  They'll eventually set
> +            is_static=true, at which point we know they've fully
> +            completed initialization. */
> +          atomic_delay ();
> +        }
> +      /* make sure we've picked up their initialization of the actual block. */

The comment should detail what the barrier prevents an in detail.

I can just see myself debugging this on POWER8...

> +      atomic_read_barrier ();
> +    }
> +out:
> +  assert (dtv[GET_ADDR_MODULE].pointer.val != TLS_DTV_UNALLOCATED);
> +  _dl_unmask_signals (&old);
>  
> -  return (char *) p + GET_ADDR_OFFSET;
> +  return (char *) dtv[GET_ADDR_MODULE].pointer.val + GET_ADDR_OFFSET;
>  }

OK.

>  
>  
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 1e0fe1f..e8709d1 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -242,11 +242,7 @@ get_cached_stack (size_t *sizep, void **memp)
>  
>    /* Clear the DTV.  */
>    dtv_t *dtv = GET_DTV (TLS_TPADJ (result));
> -  for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt)
> -    if (! dtv[1 + cnt].pointer.is_static
> -	&& dtv[1 + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> -      free (dtv[1 + cnt].pointer.val);
> -  memset (dtv, '\0', (dtv[-1].counter + 1) * sizeof (dtv_t));
> +  _dl_clear_dtv (dtv);

OK.

I'd forgoteen entirely that get_cached_stack touches the dtv...

>  
>    /* Re-initialize the TLS.  */
>    _dl_allocate_tls_init (TLS_TPADJ (result));
> @@ -1177,13 +1173,15 @@ init_one_static_tls (struct pthread *curp, struct link_map *map)
>  #  error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
>  # endif
>  
> -  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
> -  dtv[map->l_tls_modid].pointer.val = dest;
> -  dtv[map->l_tls_modid].pointer.is_static = true;
> -
>    /* Initialize the memory.  */
>    memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
>  	  '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
> +
> +  /* Fill in the DTV slot so that a later LD/GD access will find it.  */
> +  dtv[map->l_tls_modid].pointer.is_static = true;
> +  atomic_write_barrier();

This needs a detailed comment or someone will move this by accident.

> +  dtv[map->l_tls_modid].pointer.val = dest;
> +
>  }
>  
>  void
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index e7b0516..14c8ddc 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -233,6 +233,11 @@ extern int _dl_name_match_p (const char *__name, const struct link_map *__map)
>  extern unsigned long int _dl_higher_prime_number (unsigned long int n)
>       internal_function;
>  
> +/* mask every signal, returning the previous sigmask in <old>. */

Full sentence and two spaces after period.

> +extern void _dl_mask_all_signals (sigset_t *old) internal_function;
> +/* undo _dl_mask_all_signals */

Full sentence and two spaces after period.

> +extern void _dl_unmask_signals (sigset_t *old) internal_function;
> +
>  /* Function used as argument for `_dl_receive_error' function.  The
>     arguments are the error code, error string, and the objname the
>     error occurred in.  */
> @@ -983,6 +988,18 @@ extern void *_dl_allocate_tls_storage (void)
>  extern void *_dl_allocate_tls_init (void *) internal_function;
>  rtld_hidden_proto (_dl_allocate_tls_init)
>  
> +extern void *signal_safe_memalign (size_t boundary, size_t size);
> +extern void *signal_safe_malloc (size_t size);
> +extern void signal_safe_free (void *ptr);
> +extern void *signal_safe_realloc (void *ptr, size_t size);
> +extern void *signal_safe_calloc (size_t nmemb, size_t size);

OK.

> +
> +
> +/* remove all allocated dynamic TLS regions from a DTV
> +   for reuse by new thread */

Full sentence and two spaces after period.

> +extern void _dl_clear_dtv (dtv_t *dtv) internal_function;
> +rtld_hidden_proto (_dl_clear_dtv)
> +
>  /* Deallocate memory allocated with _dl_allocate_tls.  */
>  extern void _dl_deallocate_tls (void *tcb, bool dealloc_tcb) internal_function;
>  rtld_hidden_proto (_dl_deallocate_tls)
> 

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]