This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[Bug malloc/16742] race condition: pthread_atfork() called before first malloc() results in unexpected locking behaviour/deadlocks


https://sourceware.org/bugzilla/show_bug.cgi?id=16742

--- Comment #2 from prumpf at gmail dot com ---
I've attached a patch that would fix this issue (and a closely-related one
described at https://sourceware.org/ml/libc-alpha/2013-01/msg01051.html), but
which I'm not entirely satisfied is the right thing to do yet (it also contains
a debugging XXX). Still, it might serve as a starting point for fixing this.
The draft message I'd prepared for it follows:

------------

This is a bug report and proposed patch for the git version of glibc; it is
specific to NPTL.

There appear to be at least two deadlock problems caused by the order in which
atfork handlers, in particular ptmalloc_lock_all, are run.

https://sourceware.org/bugzilla/show_bug.cgi?id=16742 means ptmalloc_lock_all
must be run after other atfork handlers, even if registered after them.
https://sourceware.org/ml/libc-alpha/2013-01/msg01051.html is a bit trickier:
it demonstrates that lock ordering should be "lock IO list before taking malloc
locks", which means ptmalloc_lock_all has to be run with the IO list lock
already held. Other solutions would work as well, but I've focussed on that one
since it puts the least constraints on libio. This bug can be reproduced by
running a test program with the right breakpoints set in gdb.
I've prepared a patch that I believe fixes the problem, but I'm not overly
attached to doing it this way.

This patch fixes both deadlocks and introduces the concept of an atfork
handler's priorityâa high-priority handler is run inside of a low-priority one;
it creates a new atfork handler for the IO lock, and gives it priority
intermediate between low-priority user handlers and the high-priority malloc
atfork handler; it uses a high priority for malloc's atfork handler; it adjusts
register-atfork.c and unregister-atfork.c so they keep the __fork_handlers list
sorted by priority.

The complication introduced by this patch is unfortunate, but I believe it is
the best approach. However, an alternative would be: don't introduce a priority
level; put _IO_list_lock into an atfork handler that's installed first; make
thread_atfork install malloc's atfork handler at the end of the list, no matter
what. (I believe that approach would leave us with very fragile code and
without the ability to put more of our MT-specific fork code into atfork
handlers.)
I believe there are no new restrictions on libio code, but there is a (new?)
restriction on malloc: code must not hold malloc locks while waiting for I/O to
finish, unless it gets the I/O lock first. In particular, this applies to
diagnostic output and malloc.c's assert(), which might deadlock rather than
printing a message.
In addition to the alternative mentioned above, I've also considered another
approach: pretend to take both the I/O and the malloc lock or neither: first
lock A, then try to lock B; in the failure case, unlock A, lock B, then try to
lock A, and so on. However, that's complicated, doesn't extend well to the case
of several high-priority atfork handlers, and might result in a livelock.
I do not suggest making the priority level accessible from user space: it's
easy enough for glibc users to implement their own priority system or call
pthread_atfork() in the right order.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]