This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #13862] Reuse of cached stack can cause bounds overrun of thread DTV


On Thu, Nov 27, 2014 at 8:28 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 11/27/2014 09:06 AM, H.J. Lu wrote:
>> Here is the promised patch.  OK for trunk?
>
> Not yet.
>
> My personal opinion is that there has been insufficient detailed
> analysis posted for this patch. For this kind of change I want to
> see the author understand the code paths in question and how the
> call flow gets to the position of failure. Paul has given a hand-waving
> example of the failure without much detail. I would like to see
> sufficient detail like:
> ---
> When calling pthread_create we call allocate_stack which then reuses
> a stack by finding one via get_cached_stack. In get_cached_stack we
> select an appropriately sized stack and in that process get the old
> dtv from the thread, which could be reused if it was the right size.
> Then _dl_allocate_tls_init is called to reinitialize the TLS structures
> for the thread. In _dl_allocate_tls_init the expectation that is violated
> is that dtv should have been calloc'd to a size of GL(dl_tls_max_dtv_idx)
> + DTV_SURPLUS; by allocate_dtv. While this is true, it was done with an
> old value of dl_tls_max_dtv_idx and that value might have changed since
> the stack was last released.
> ---
> This is just an example, you would have to verify that this is actually
> the case.

That is the case.

> Q: A dlclose() may modify GL(dl_tls_max_dtv_idx) concurrently with another
>    thread starting up. Is it safe to read GL(dl_tls_max_dtv_idx) more than
>    once in a function since they may differ? Does reading GL(dl_tls_max_dtv_idx)
>    require an atomic load? We are trying to be explicit about this in glibc.

There is a race between DTV access and dlclose:

https://sourceware.org/ml/libc-alpha/2014-11/msg00469.html

> Q: If GL(dl_tls_max_dtv_idx) may change at any time during thread startup
>    does it make sense to reallocate the dtv? Why not leave it uninitialized
>    and allow _dl_update_slotinfo to reallocate it?

Are you suggesting folding  _dl_allocate_tls_init into _dl_update_slotinfo
and remove _dl_allocate_tls_init?  I can give it a try.

> Q: If you don't want to delay the dtv reallocation to _dl_update_slotinfo,
>    why don't you refactor the reallocation of the dtv into a single function
>    to avoid duplication with the same code in _dl_update_slotinfo?

I will do that.

>> ----
>> From b811f571b5604f84c84d5b96394e6b892e05d14f Mon Sep 17 00:00:00 2001
>> From: "H.J. Lu" <hjl.tools@gmail.com>
>> Date: Thu, 27 Nov 2014 05:42:23 -0800
>> Subject: [PATCH] Reallocate DTV if the current DTV isn't big enough
>>
>> This patch reallocates DTV if the current DTV isn't big enough, using
>> the similar approach in _dl_update_slotinfo.  Tested on X86-64, x32
>> and ia32.
>>
>>       [BZ #13862]
>>       * elf/dl-tls.c (oom): Remove #ifdef SHARED/#endif.
>>       (_dl_static_dtv, _dl_initial_dtv): Moved before ...
>>       (_dl_allocate_tls_init): This.  Reallocate DTV if the current
>>       DTV isn't big enough.
>>       * nptl/Makefile (tests): Add tst-stack4.
>>       (modules-names): Add tst-stack4mod.
>>       ($(objpfx)tst-stack4): New.
>>       (tst-stack4mod.sos): Likewise.
>>       ($(objpfx)tst-stack4.out): Likewise.
>>       ($(tst-stack4mod.sos)): Likewise.
>>       (clean): Likewise.
>>       * nptl/tst-stack4.c: New file.
>>       * nptl/tst-stack4mod.c: Likewise.
>> ---
>>  ChangeLog            |  17 +++++++
>>  elf/dl-tls.c         |  53 ++++++++++++++++++---
>>  nptl/Makefile        |  17 ++++++-
>>  nptl/tst-stack4.c    | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  nptl/tst-stack4mod.c |   9 ++++
>>  5 files changed, 217 insertions(+), 9 deletions(-)
>>  create mode 100644 nptl/tst-stack4.c
>>  create mode 100644 nptl/tst-stack4mod.c
>>
>> diff --git a/ChangeLog b/ChangeLog
>> index 92ffe16..e6474be 100644
>> --- a/ChangeLog
>> +++ b/ChangeLog
>> @@ -1,3 +1,20 @@
>> +2014-11-27  H.J. Lu  <hongjiu.lu@intel.com>
>> +
>> +     [BZ #13862]
>> +     * elf/dl-tls.c (oom): Remove #ifdef SHARED/#endif.
>> +     (_dl_static_dtv, _dl_initial_dtv): Moved before ...
>> +     (_dl_allocate_tls_init): This.  Reallocate DTV if the current
>> +     DTV isn't big enough.
>> +     * nptl/Makefile (tests): Add tst-stack4.
>> +     (modules-names): Add tst-stack4mod.
>> +     ($(objpfx)tst-stack4): New.
>> +     (tst-stack4mod.sos): Likewise.
>> +     ($(objpfx)tst-stack4.out): Likewise.
>> +     ($(tst-stack4mod.sos)): Likewise.
>> +     (clean): Likewise.
>> +     * nptl/tst-stack4.c: New file.
>> +     * nptl/tst-stack4mod.c: Likewise.
>> +
>>  2014-11-27  Stefan Liebler  <stli@linux.vnet.ibm.com>
>>
>>       * nscd/connections.c: Include libc-internal.h because of macro
>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
>> index 5204fda..3142b16 100644
>> --- a/elf/dl-tls.c
>> +++ b/elf/dl-tls.c
>> @@ -34,14 +34,12 @@
>>
>>
>>  /* Out-of-memory handler.  */
>> -#ifdef SHARED
>>  static void
>>  __attribute__ ((__noreturn__))
>>  oom (void)
>>  {
>>    _dl_fatal_printf ("cannot allocate memory for thread-local data: ABORT\n");
>>  }
>> -#endif
>
> OK.
>
>>
>>  size_t
>> @@ -397,6 +395,11 @@ _dl_allocate_tls_storage (void)
>>  }
>>
>>
>> +#ifndef SHARED
>> +extern dtv_t _dl_static_dtv[];
>> +# define _dl_initial_dtv (&_dl_static_dtv[1])
>> +#endif
>> +
>
> OK.
>
>>  void *
>>  internal_function
>>  _dl_allocate_tls_init (void *result)
>> @@ -410,6 +413,47 @@ _dl_allocate_tls_init (void *result)
>>    size_t total = 0;
>>    size_t maxgen = 0;
>>
>> +  /* Check if the current dtv is big enough.   */
>> +  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
>> +    {
>> +      /* Reallocate the dtv.  */
>> +      dtv_t *newp;
>> +      size_t newsize = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
>
> IMO, access to GL(dl_tls_max_dtv_idx) needs to be annotated as
> an atomic load until someone does the analysis to relax that.
> I'm not commenting on the other users in the function that you
> didn't touch.

There is a race condition. But it is a separate issue.  They should
be fixed together.

>> +      size_t oldsize = dtv[-1].counter;
>> +
>> +      if (dtv == GL(dl_initial_dtv))
>> +     {
>> +       /* This is the initial dtv that was allocated
>> +          during rtld startup using the dl-minimal.c
>> +          malloc instead of the real malloc.  We can't
>> +          free it, we have to abandon the old storage.  */
>> +
>> +       newp = malloc ((2 + newsize) * sizeof (dtv_t));
>> +       if (newp == NULL)
>> +         oom ();
>> +       memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
>> +     }
>> +      else
>> +     {
>> +       newp = realloc (&dtv[-1],
>> +                       (2 + newsize) * sizeof (dtv_t));
>> +       if (newp == NULL)
>> +         oom ();
>> +     }
>> +
>> +      newp[0].counter = newsize;
>> +
>> +      /* Clear the newly allocated part.  */
>> +      memset (newp + 2 + oldsize, '\0',
>> +           (newsize - oldsize) * sizeof (dtv_t));
>> +
>> +      /* Install this new dtv in the thread data structures.  */
>> +      INSTALL_DTV (result, newp);
>> +
>> +      /* Point dtv to the generation counter.  */
>> +      dtv = &newp[1];
>> +    }
>> +
>
> The code itself is OK, but it's a copy of what _dl_update_slotinfo
> does. Please refactor into a static inline function e.g. resize_dtv.
>
>>    /* We have to prepare the dtv for all currently loaded modules using
>>       TLS.  For those which are dynamically loaded we add the values
>>       indicating deferred allocation.  */
>> @@ -492,11 +536,6 @@ _dl_allocate_tls (void *mem)
>>  rtld_hidden_def (_dl_allocate_tls)
>>
>>
>> -#ifndef SHARED
>> -extern dtv_t _dl_static_dtv[];
>> -# define _dl_initial_dtv (&_dl_static_dtv[1])
>> -#endif
>> -
>>  void
>>  internal_function
>>  _dl_deallocate_tls (void *tcb, bool dealloc_tcb)
>> diff --git a/nptl/Makefile b/nptl/Makefile
>> index 86a1b0c..ac76596 100644
>> --- a/nptl/Makefile
>> +++ b/nptl/Makefile
>> @@ -254,7 +254,7 @@ tests = tst-typesizes \
>>       tst-exec1 tst-exec2 tst-exec3 tst-exec4 \
>>       tst-exit1 tst-exit2 tst-exit3 \
>>       tst-stdio1 tst-stdio2 \
>> -     tst-stack1 tst-stack2 tst-stack3 tst-pthread-getattr \
>> +     tst-stack1 tst-stack2 tst-stack3 tst-stack4 tst-pthread-getattr \
>
> OK.
>
>>       tst-pthread-attr-affinity \
>>       tst-unload \
>>       tst-dlsym1 \
>> @@ -311,7 +311,7 @@ endif
>>
>>  modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \
>>               tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \
>> -             tst-tls5modd tst-tls5mode tst-tls5modf \
>> +             tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \
>
> OK.
>
>>               tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod
>>  extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o
>>  test-extras += $(modules-names) tst-cleanup4aux
>> @@ -479,6 +479,19 @@ $(objpfx)tst-stack3-mem.out: $(objpfx)tst-stack3.out
>>       $(evaluate-test)
>>  generated += tst-stack3-mem.out tst-stack3.mtrace
>>
>> +$(objpfx)tst-stack4: $(libdl) $(shared-thread-library)
>> +tst-stack4mod.sos=$(shell for i in 0 1 2 3 4 5 6 7 8 9 10 \
>> +                                11 12 13 14 15 16 17 18 19; do \
>> +                         for j in 0 1 2 3 4 5 6 7 8 9 10 \
>> +                                  11 12 13 14 15 16 17 18 19; do \
>> +                           echo $(objpfx)tst-stack4mod-$$i-$$j.so; \
>> +                         done; done)
>> +$(objpfx)tst-stack4.out: $(tst-stack4mod.sos)
>> +$(tst-stack4mod.sos): $(objpfx)tst-stack4mod.so
>> +     cp -f $< $@
>> +clean:
>> +     rm -f $(tst-stack4mod.sos)
>> +
>
> OK.
>
>>  $(objpfx)tst-cleanup4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
>>  $(objpfx)tst-cleanupx4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
>>
>> diff --git a/nptl/tst-stack4.c b/nptl/tst-stack4.c
>> new file mode 100644
>> index 0000000..0a0d2f9
>> --- /dev/null
>> +++ b/nptl/tst-stack4.c
>> @@ -0,0 +1,130 @@
>
> Missing first line explaining test.

I will add it.

> Missing license.

What is the license policy on testcases?

>> +#include <stdio.h>
>> +#include <stdint.h>
>> +#include <dlfcn.h>
>> +#include <assert.h>
>> +#include <pthread.h>
>> +
>> +#define DSO_SHARED_FILES 20
>> +#define DSO_OPEN_THREADS 20
>> +#define DSO_EXEC_THREADS 2
>
> Why these particular constants? If you chose them because
> they happened to work on a particular system, please comment
> that.

They came from:

https://sourceware.org/bugzilla/attachment.cgi?id=6290

>> +
>> +pthread_mutex_t g_lock;
>> +
>
> Describe what the mutex protects in a comment.

I will add a few words.

>> +typedef void (*function) (void);
>> +
>> +void *
>> +dso_invoke(void *dso_fun)
>> +{
>> +  function *fun_vec = (function *) dso_fun;
>> +  int dso;
>> +
>> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
>> +    (*fun_vec[dso]) ();
>> +
>> +  pthread_exit (NULL);
>> +}
>> +
>> +void *
>> +dso_process (void * p)
>> +{
>> +  void *handle[DSO_SHARED_FILES];
>> +  function fun_vec[DSO_SHARED_FILES];
>> +  char dso_path[DSO_SHARED_FILES][100];
>> +  int dso;
>> +  uintptr_t t = (uintptr_t) p;
>> +
>> +  // Open dso's and get a function.
>
> Use /* */.

We have been using // for comments in glibc for a while.  Why not here?

>
>> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
>> +    {
>> +      sprintf (dso_path[dso], "tst-stack4mod-%i-%i.so", t, dso);
>> +
>> +      pthread_mutex_lock (&g_lock);
>> +
>> +      handle[dso] = dlopen (dso_path[dso], RTLD_NOW);
>> +      assert (handle[dso]);
>> +
>> +      fun_vec[dso] = (function) dlsym (handle[dso], "function");
>> +      assert (fun_vec[dso]);
>> +
>> +      pthread_mutex_unlock (&g_lock);
>> +    }
>> +
>> +  // Spawn workers
>
> Use /* */
>
>> +  pthread_t thread[DSO_EXEC_THREADS];
>> +  int i, ret;
>> +  uintptr_t result = 0;
>> +  for (i = 0; i < DSO_EXEC_THREADS; i++)
>> +    {
>> +      pthread_mutex_lock (&g_lock);
>> +      ret = pthread_create (&thread[i], NULL, dso_invoke, (void *) fun_vec);
>> +      if (ret != 0)
>> +     {
>> +       printf ("pthread_create failed: %d\n", ret);
>> +       result = 1;
>> +     }
>> +      pthread_mutex_unlock (&g_lock);
>> +    }
>> +
>> +  if (!result)
>> +    for (i = 0; i < DSO_EXEC_THREADS; i++)
>> +      {
>> +     ret = pthread_join (thread[i], NULL);
>> +     if (ret != 0)
>> +       {
>> +         printf ("pthread_join failed: %d\n", ret);
>> +         result = 1;
>> +       }
>> +      }
>> +
>> +  // Close all dso's
>
> Use /* */
>
>> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
>> +    {
>> +      pthread_mutex_lock (&g_lock);
>> +      dlclose (handle[dso]);
>> +      pthread_mutex_unlock (&g_lock);
>> +    }
>> +
>> +  // Exit
>> +  pthread_exit ((void *) result);
>> +}
>> +
>> +
>
> This test needs a verbose, minimum one paragraph, comment explaining
> what the test is testing.
>
>> +int
>> +main()
>> +{
>> +  pthread_t thread[DSO_OPEN_THREADS];
>> +  int i,j;
>> +  int ret;
>> +  int result = 0;
>> +
>> +  pthread_mutex_init (&g_lock, NULL);
>> +
>> +  for (j = 0; j < 100; j++)
>
> Why 100 times? Please comment.
>
>> +    {
>> +      for (i = 0; i < DSO_OPEN_THREADS; i++)
>> +     {
>> +       ret = pthread_create (&thread[i], NULL, dso_process,
>> +                             (void *) (uintptr_t) i);
>> +       if (ret != 0)
>> +         {
>> +           printf ("pthread_create failed: %d\n", ret);
>> +           result = 1;
>> +         }
>> +     }
>> +
>> +      if (result)
>> +     break;
>> +
>> +      for (i = 0; i < DSO_OPEN_THREADS; i++)
>> +     {
>> +       ret = pthread_join (thread[i], NULL);
>> +       if (ret != 0)
>> +         {
>> +           printf ("pthread_join failed: %d\n", ret);
>> +           result = 1;
>> +         }
>> +     }
>> +    }
>> +
>> +  return result;
>> +}
>> diff --git a/nptl/tst-stack4mod.c b/nptl/tst-stack4mod.c
>> new file mode 100644
>> index 0000000..b8e800f
>> --- /dev/null
>> +++ b/nptl/tst-stack4mod.c
>> @@ -0,0 +1,9 @@
>
> Please give a one line description at the top of this file.
>
>> +__thread int var[256] __attribute__ ((visibility ("hidden"))) = {0};
>> +
>> +void
>> +function (void)
>> +{
>> +  int i;
>> +  for (i = 0; i < 256; i++)
>
> Why 256 times? Please comment.
>
>> +    var[i] = i;
>> +}
>>

I don't have answers for any particular values in this
testcase.



-- 
H.J.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]