This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [patch] Fix for bz14333 -- race between atexit() and exit()
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Paul Pluzhnikov <ppluzhnikov at google dot com>, Torvald Riegel <triegel at redhat dot com>
- Cc: Andreas Schwab <schwab at suse dot de>, Joseph Myers <joseph at codesourcery dot com>, GLIBC Devel <libc-alpha at sourceware dot org>, Ricky Zhou <rickyz at google dot com>
- Date: Tue, 19 Sep 2017 15:32:21 -0600
- Subject: Re: [patch] Fix for bz14333 -- race between atexit() and exit()
- Authentication-results: sourceware.org; auth=none
- References: <CALoOobNCZqgvcLJhuJt5eSWseuTfHJ9oVsG9TpPvytyfzF56mg@mail.gmail.com> <alpine.DEB.2.20.1707111221040.845@digraph.polyomino.org.uk> <CALoOobMw+FNimtKqhO4cyayp8HGeGWO55veqKCD=fNoknreXkA@mail.gmail.com> <alpine.DEB.2.20.1707111900580.9395@digraph.polyomino.org.uk> <CALoOobOgwZx2K-FWFi5zmqjk4vfxB4B2+Bv6koou7Ks9WCT4Yw@mail.gmail.com> <mvmwp7d7i29.fsf@suse.de> <CALoOobMnZ=dE52uWT-PXSb+Sz8SRbjG7FKCmuG+kZ47db76PUg@mail.gmail.com> <1500467179.27895.55.camel@redhat.com> <CALoOobPx5KHn7rGQKTy3UnEq_YnzXnGKrKCLmJHPd5bkipb79g@mail.gmail.com> <1500926864.30433.42.camel@redhat.com> <CALoOobO1sCK2jMJ69_r-m47PshpM3OnReWkAp6QA=vr6OycrJw@mail.gmail.com> <CALoOobPuWkBOjqMnb9OWf1-3G4=vXqL_Z0LTs7HWACn=Ew+mng@mail.gmail.com> <abd5aba0-4ffe-4794-f390-d57b1522b937@redhat.com> <CALoOobP4KGou=4OL7hVEDOTPesUyNnZE4CoaRhJCCU=RBGPyQQ@mail.gmail.com> <d4b7473f-92eb-db93-7b74-edc4805acffd@redhat.com> <CALoOobM=EkhzY9ye0bsGfFqqfz2kHGV10oJpER3ey_bMCNCj1A@mail.gmail.com>
On 09/18/2017 05:03 PM, Paul Pluzhnikov wrote:
> On Mon, Sep 18, 2017 at 2:15 PM, Carlos O'Donell <carlos@redhat.com> wrote:
>
>>> - uint64_t check = __new_exitfn_called;
>>> + /* We don't want to run this cleanup more than once. */
>>
>> We have just changed the way locking works, and the above comment
>> worries me, particularly for test coverage.
>>
>> Under what conditions can this function be called more than once?
>
> Presumably the applicaton itself may call __cxa_finalize(NULL) from
> multiple threads.
>
>> Could we amend the comment here to be more descriptive then?
>
> Sure.
I'll make my concrete suggestion below.
>> Is it a bug that the thread calling dlclose may be the only thread
>> running this particular function while the other thread is running
>> to exit?
>>
>> T1-> dlclose
>> T1-> library destructors call __cxa_finalize
>> T1-> picks function foo off the list, marks flavor ef_free
>> T1-> unlocks list, starts executing foo.
>> T2-> exit
>> T2-> starts executing all destructors, skips foo marked ef_free
>> T2-> proceeds to terminate the process
>>
>> Is T1's call to foo incomplete?
>
> Yes. But an application that calls exit in parallel with running
> threads always has the risk that any of its functions will
> "evaporate" mid-sentence.
Yes, and no.
Users expect exit() to run *all* of their registered exit functions.
When dlclose() and exit() interleave, you have the potential for
one function which is currently being run by dlclose() to be unable
to finish, and that's not expected. In fact that function runs
partly in parallel with the next registered function, and that could
be seen as a violation of POSIX requirements that functions run in
LIFO order.
I just had to test this out, so I wrote the following program:
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <semaphore.h>
#include <pthread.h>
sem_t order;
void *
open_library (char * pathname)
{
void *dso;
char *err;
/* Open the DSO. */
dso = dlopen (pathname, RTLD_NOW|RTLD_GLOBAL);
if (dso == NULL)
{
err = dlerror ();
fprintf (stderr, "%s\n", err);
exit (1);
}
/* Clear any errors. */
dlerror ();
return dso;
}
int
close_library (void *dso)
{
int ret;
char *err;
/* Close the library and look for errors too. */
ret = dlclose (dso);
if (ret != 0)
{
err = dlerror ();
fprintf (stderr, "%s\n", err);
exit (1);
}
return ret;
}
void *
exit_thread (void *arg)
{
/* Wait for the dlclose to start... */
sem_wait (&order);
/* Then try to run the exit sequence which should call all
__cxa_atexit registered functions and in parallel with
the executing dlclose(). */
exit (0);
}
int
main (void)
{
void *dso;
pthread_t thread;
dso = open_library ("./libhas-dtors.so");
pthread_create (&thread, NULL, exit_thread, NULL);
close_library (dso);
pthread_join (thread, NULL);
return 1;
}
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <unistd.h>
/* Semaphore defined in executable to ensure we have
a happens-before between the first function starting
and exit being called. */
extern sem_t order;
/* glibc function for registering DSO-specific exit functions. */
extern int __cxa_atexit (void (*func) (void *), void *arg, void *dso_handle);
/* Hidden compiler handle to this shared object. */
extern void *__dso_handle __attribute__ ((__weak__));
static void
first (void *start)
{
sem_post (&order);
sleep (10);
printf ("first\n");
}
static void
second (void *start)
{
printf ("second\n");
}
__attribute__ ((constructor)) static void
constructor (void)
{
sem_init (&order, 0, 0);
__cxa_atexit (second, NULL, __dso_handle);
__cxa_atexit (first, NULL, __dso_handle);
}
gcc -O0 -g3 -shared -fPIC -o libhas-dtors.so has-dtors.c -lpthread
gcc -O0 -g3 -export-dynamic -o tst-dlclose-exit tst-dlclose-exit.c -lpthread -ldl
$ ./tst-dlclose-exit
second
first
second
Which is just wrong, so it shows the existing implementation is broken.
The thread runs second very quickly, then first runs, then second runs
again.
It doesn't exit right away... and the trick is this:
* exit() must call _dl_fini, which needs the loader lock.
* dlclose() already holds the loader lock.
* therefore exit() blocks on the completion of dlclose().
So we are actually safe to finish running our existing handlers, but as
you see it runs second twice.
Can you run the above test case with your patch?
> Also, AFAICT this patch does not change the behavior here: the exact
> same incomplete call to foo can happen with current code.
OK, concrete suggestion per my previous email:
/* We don't want to run this cleanup more than once. The Itanium
C++ ABI requires that multiple calls to __cxa_finalize not
result in calling termination functions more than once. One
potential scenario where that could happen is with a concurrent
dlclose and exit, where the running dlclose must at some point
release the list lock, an exiting thread may acquire it, and
without setting flavor to ef_free, might re-run this destructor
which could result in undefined behaviour. Therefore we must
set flavor to ef_free to avoid calling this destructor again.
Note that the concurrent exit must also take the dynamic loader
lock (for library finalizer processing) and therefore will
block while dlclose completes the processing of any in-progress
exit functions. Lastly, once we release the list lock for the
entry marked ef_free, we must not read from that entry again
since it may have been reused by the time we take the list lock
again. Lastly the detection of new registered exit functions is
based on a monotonically incrementing counter, and there is an
ABA if between the unlock to run the exit function and the
re-lock after completion the user registers 2^64 exit functions,
the implementation will not detect this and continue without
executing any more functions.
One minor issue remains: A registered exit function that is in
progress by a call to dlclose() may not completely finish before
the next registered exit function is run. This may, according to
some readings of POSIX violate the requirement that functions
run in effective LIFO order. This should probably be fixed in a
future implementation to ensure the functions do not run in
parallel. */
Before I sign off on this I'd like to see the results of your patch
running the example I provided above.
I expect it to print:
second
first
Thanks for wading through these issues.
--
Cheers,
Carlos.