This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [patch] Fix for bz14333 -- race between atexit() and exit()

From: Carlos O'Donell <carlos at redhat dot com>
To: Paul Pluzhnikov <ppluzhnikov at google dot com>, Torvald Riegel <triegel at redhat dot com>
Cc: Andreas Schwab <schwab at suse dot de>, Joseph Myers <joseph at codesourcery dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, Ricky Zhou <rickyz at google dot com>
Date: Tue, 19 Sep 2017 15:32:21 -0600
Subject: Re: [patch] Fix for bz14333 -- race between atexit() and exit()
Authentication-results: sourceware.org; auth=none
References: <CALoOobNCZqgvcLJhuJt5eSWseuTfHJ9oVsG9TpPvytyfzF56mg@mail.gmail.com> <alpine.DEB.2.20.1707111221040.845@digraph.polyomino.org.uk> <CALoOobMw+FNimtKqhO4cyayp8HGeGWO55veqKCD=fNoknreXkA@mail.gmail.com> <alpine.DEB.2.20.1707111900580.9395@digraph.polyomino.org.uk> <CALoOobOgwZx2K-FWFi5zmqjk4vfxB4B2+Bv6koou7Ks9WCT4Yw@mail.gmail.com> <mvmwp7d7i29.fsf@suse.de> <CALoOobMnZ=dE52uWT-PXSb+Sz8SRbjG7FKCmuG+kZ47db76PUg@mail.gmail.com> <1500467179.27895.55.camel@redhat.com> <CALoOobPx5KHn7rGQKTy3UnEq_YnzXnGKrKCLmJHPd5bkipb79g@mail.gmail.com> <1500926864.30433.42.camel@redhat.com> <CALoOobO1sCK2jMJ69_r-m47PshpM3OnReWkAp6QA=vr6OycrJw@mail.gmail.com> <CALoOobPuWkBOjqMnb9OWf1-3G4=vXqL_Z0LTs7HWACn=Ew+mng@mail.gmail.com> <abd5aba0-4ffe-4794-f390-d57b1522b937@redhat.com> <CALoOobP4KGou=4OL7hVEDOTPesUyNnZE4CoaRhJCCU=RBGPyQQ@mail.gmail.com> <d4b7473f-92eb-db93-7b74-edc4805acffd@redhat.com> <CALoOobM=EkhzY9ye0bsGfFqqfz2kHGV10oJpER3ey_bMCNCj1A@mail.gmail.com>

On 09/18/2017 05:03 PM, Paul Pluzhnikov wrote:
> On Mon, Sep 18, 2017 at 2:15 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> 
>>> -           uint64_t check = __new_exitfn_called;
>>> +         /* We don't want to run this cleanup more than once.  */
>>
>> We have just changed the way locking works, and the above comment
>> worries me, particularly for test coverage.
>>
>> Under what conditions can this function be called more than once?
> 
> Presumably the applicaton itself may call __cxa_finalize(NULL) from
> multiple threads.
> 
>> Could we amend the comment here to be more descriptive then?
> 
> Sure.

I'll make my concrete suggestion below.

>> Is it a bug that the thread calling dlclose may be the only thread
>> running this particular function while the other thread is running
>> to exit?
>>
>> T1-> dlclose
>> T1-> library destructors call __cxa_finalize
>> T1-> picks function foo off the list, marks flavor ef_free
>> T1-> unlocks list, starts executing foo.
>> T2-> exit
>> T2-> starts executing all destructors, skips foo marked ef_free
>> T2-> proceeds to terminate the process
>>
>> Is T1's call to foo incomplete?
> 
> Yes. But an application that calls exit in parallel with running
> threads always has the risk that any of its functions  will
> "evaporate" mid-sentence.

Yes, and no.

Users expect exit() to run *all* of their registered exit functions.

When dlclose() and exit() interleave, you have the potential for
one function which is currently being run by dlclose() to be unable
to finish, and that's not expected. In fact that function runs
partly in parallel with the next registered function, and that could
be seen as a violation of POSIX requirements that functions run in
LIFO order.

I just had to test this out, so I wrote the following program:

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <semaphore.h>
#include <pthread.h>

sem_t order;

void *
open_library (char * pathname)
{
  void *dso;
  char *err;
  /* Open the DSO.  */
  dso = dlopen (pathname, RTLD_NOW|RTLD_GLOBAL);
  if (dso == NULL)
    {
      err = dlerror ();
      fprintf (stderr, "%s\n", err);
      exit (1);
    }
  /* Clear any errors.  */
  dlerror ();
  return dso;
}

int
close_library (void *dso)
{
  int ret;
  char *err;
  /* Close the library and look for errors too.  */
  ret = dlclose (dso);
  if (ret != 0)
    {
      err = dlerror ();
      fprintf (stderr, "%s\n", err);
      exit (1);
    }
  return ret;
}

void *
exit_thread (void *arg)
{
  /* Wait for the dlclose to start...  */
  sem_wait (&order);
  /* Then try to run the exit sequence which should call all
     __cxa_atexit registered functions and in parallel with
     the executing dlclose().  */
  exit (0);
}

int
main (void)
{
  void *dso;
  pthread_t thread;
  dso = open_library ("./libhas-dtors.so");
  pthread_create (&thread, NULL, exit_thread, NULL); 
  close_library (dso);
  pthread_join (thread, NULL);
  return 1;
}

#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <unistd.h>

/* Semaphore defined in executable to ensure we have
   a happens-before between the first function starting
   and exit being called.  */
extern sem_t order;

/* glibc function for registering DSO-specific exit functions.  */
extern int __cxa_atexit (void (*func) (void *), void *arg, void *dso_handle);

/* Hidden compiler handle to this shared object.  */
extern void *__dso_handle __attribute__ ((__weak__));

static void
first (void *start)
{
  sem_post (&order);
  sleep (10);
  printf ("first\n");
}

static void
second (void *start)
{
  printf ("second\n");
}

__attribute__ ((constructor)) static void
constructor (void)
{
  sem_init (&order, 0, 0);
  __cxa_atexit (second, NULL, __dso_handle);
  __cxa_atexit (first, NULL, __dso_handle);
}

gcc -O0 -g3 -shared -fPIC -o libhas-dtors.so has-dtors.c -lpthread
gcc -O0 -g3 -export-dynamic -o tst-dlclose-exit tst-dlclose-exit.c -lpthread -ldl

$ ./tst-dlclose-exit 
second
first
second

Which is just wrong, so it shows the existing implementation is broken.

The thread runs second very quickly, then first runs, then second runs
again.

It doesn't exit right away... and the trick is this:

* exit() must call _dl_fini, which needs the loader lock.
* dlclose() already holds the loader lock.
* therefore exit() blocks on the completion of dlclose().

So we are actually safe to finish running our existing handlers, but as
you see it runs second twice.

Can you run the above test case with your patch?

> Also, AFAICT this patch does not change the behavior here: the exact
> same incomplete call to foo can happen with current code.

OK, concrete suggestion per my previous email:

/* We don't want to run this cleanup more than once. The Itanium
   C++ ABI requires that multiple calls to __cxa_finalize not
   result in calling termination functions more than once. One
   potential scenario where that could happen is with a concurrent
   dlclose and exit, where the running dlclose must at some point
   release the list lock, an exiting thread may acquire it, and
   without setting flavor to ef_free, might re-run this destructor
   which could result in undefined behaviour.  Therefore we must
   set flavor to ef_free to avoid calling this destructor again.
   Note that the concurrent exit must also take the dynamic loader
   lock (for library finalizer processing) and therefore will 
   block while dlclose completes the processing of any in-progress
   exit functions. Lastly, once we release the list lock for the 
   entry marked ef_free, we must not read from that entry again 
   since it may have been reused by the time we take the list lock
   again. Lastly the detection of new registered exit functions is 
   based on a monotonically incrementing counter, and there is an 
   ABA if between the unlock to run the exit function and the 
   re-lock after completion the user registers 2^64 exit functions,
   the implementation will not detect this and continue without
   executing any more functions.

   One minor issue remains: A registered exit function that is in
   progress by a call to dlclose() may not completely finish before
   the next registered exit function is run. This may, according to
   some readings of POSIX violate the requirement that functions
   run in effective LIFO order. This should probably be fixed in a
   future implementation to ensure the functions do not run in
   parallel.  */

Before I sign off on this I'd like to see the results of your patch
running the example I provided above.

I expect it to print:
second
first

Thanks for wading through these issues.

-- 
Cheers,
Carlos.

Follow-Ups:
- Re: [patch] Fix for bz14333 -- race between atexit() and exit()
  - From: Paul Pluzhnikov

References:
- Re: [patch] Fix for bz14333 -- race between atexit() and exit()
  - From: Carlos O'Donell
- Re: [patch] Fix for bz14333 -- race between atexit() and exit()
  - From: Paul Pluzhnikov
- Re: [patch] Fix for bz14333 -- race between atexit() and exit()
  - From: Carlos O'Donell
- Re: [patch] Fix for bz14333 -- race between atexit() and exit()
  - From: Paul Pluzhnikov

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]