RFC: Deadlock in multithreaded application when doing IO and fork.


Community,

I've seen what I believe to be the following deadlock
scenario in a multithreaded application when doing 
IO and forking. 

It is safe to call fork in a multithreaded environment.
It is also safe to do IO in a multithreaded environment.
Doing both at the same time is supposed to be safe, 
but is known to be dangerous if you don't know what 
you're doing.

The following deadlock scenario looks like a bug in glibc.

Thread A:

* Calls fork() which runs pthread_atfork handlers.
* Malloc's registered handler runs and locks every arena,
  blocking all other threads from doing allocations.
* Tries to take the IO list lock via _IO_list_lock() 
  -> _IO_lock_lock(list_all_lock);

Thread B:

* Calls _IO_flush_all() 
  -> _IO_flush_all_lockp (do_lock=1) 
  -> _IO_lock_lock (list_all_lock); ...
* Walks the list of open fp's and tries to take each
  fp's lock and flush.

Thread C:

* Calls _IO_getdelim which takes only the fp lock 
  and then tries to call either malloc or realloc.
* Tries to take arena lock to allocate memory.

So in summary:
Thread A waits on thread B for list_all_lock.
Thread B waits on thread C for the fp lock.
Thread C waits on thread A for the arena lock.

The window for this to happen is small.

You need at least three threads.
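
For the curious, below is a rough, hedged sketch of a stress
test that opens that window. It does not deterministically
reproduce the deadlock; it just keeps three threads busy in the
code paths above (fork, fflush(NULL) which ends up in
_IO_flush_all, and getline/getdelim which takes the fp lock and
then calls realloc). The file name and structure are
illustrative only; run it under a timeout.

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

static FILE *f;   /* shared stream so threads B and C contend on its lock */

/* Thread A: forks repeatedly, running the pthread_atfork handlers.  */
static void *
thread_a (void *arg)
{
  for (;;)
    {
      pid_t pid = fork ();
      if (pid == 0)
        _exit (0);
      if (pid > 0)
        waitpid (pid, NULL, 0);
    }
  return NULL;
}

/* Thread B: flushes every stream; takes list_all_lock, then each fp lock.  */
static void *
thread_b (void *arg)
{
  for (;;)
    fflush (NULL);
  return NULL;
}

/* Thread C: holds the fp lock inside getline while it calls realloc.  */
static void *
thread_c (void *arg)
{
  char *line = NULL;
  size_t len = 0;
  for (;;)
    {
      rewind (f);
      while (getline (&line, &len, f) >= 0)
        ;
    }
  return NULL;
}

int
main (void)
{
  f = fopen ("/etc/passwd", "r");   /* any readable file will do */
  if (f == NULL)
    return 1;

  pthread_t a, b, c;
  pthread_create (&a, NULL, thread_a, NULL);
  pthread_create (&b, NULL, thread_b, NULL);
  pthread_create (&c, NULL, thread_c, NULL);

  pthread_join (a, NULL);   /* never returns normally; run under a timeout */
  return 0;
}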

One possible fix looks like this:

Thread A:

* Calls fork() which runs pthread_atfork handlers.
* Runs malloc's registered handler *last*.
* Malloc's handler then:
  - Calls _IO_list_lock() first.
  - Locks all arenas last.
* Continues with fork processing...

Thread B:
...

Thread C:
...

The salient point is that the last thing the forking thread
does is take list_all_lock and only then lock the arenas.

In this case both A and B take list_all_lock before any other
lock (arena locks for A, fp locks for B), so every thread
acquires locks in one consistent order: list_all_lock, then fp
locks, then arena locks. Other threads either make forward
progress or block, but they cannot deadlock.
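
In pseudo-code, the reordered prepare handler would look
something like the sketch below. _IO_list_lock() is the real
libio entry point; lock_all_arenas() is just an illustrative
name for the arena-locking loop malloc's handler already does
today, not an actual glibc identifier.

/* Sketch only: proposed lock order in malloc's fork prepare handler.  */
static void
malloc_atfork_prepare (void)
{
  /* Take the stdio list lock first, so the forking thread and any
     thread in _IO_flush_all agree on the order: list_all_lock
     before any arena lock.  */
  _IO_list_lock ();

  /* Only then take every arena lock, blocking allocations in
     other threads for the duration of fork().  */
  lock_all_arenas ();
}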

The wrinkle here is that once you take list_all_lock, another
thread could already be inside malloc and trigger an assertion
or debug output, which will block and deadlock fork(), e.g.:

T1                          T2
fork
takes list_all_lock
                            calls malloc
                            takes arena lock
                            malloc aborts and tries to do IO
                                or
                            user-defined malloc tries to do IO
                            blocks on list_all_lock
blocks on arena lock

This still seems better than before: now we only deadlock on a
failure path, and within a smaller window.

We could detect this in malloc by doing a trylock on
list_all_lock, skipping the IO if the lock is already held, and
continuing on to call abort().
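
A self-contained sketch of that idea, using plain pthreads to
stand in for glibc's internal list_all_lock and lock primitives
(the real change would live in malloc's error path):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t io_list_lock = PTHREAD_MUTEX_INITIALIZER;

static void
fatal_allocator_error (const char *msg)
{
  /* Probe the stdio list lock; if a forking thread already holds
     it, skip the diagnostic rather than risk blocking on it.  */
  if (pthread_mutex_trylock (&io_list_lock) == 0)
    {
      fprintf (stderr, "malloc: %s\n", msg);
      pthread_mutex_unlock (&io_list_lock);
    }

  /* abort() then flushes via _IO_flush_all_lockp (do_lock=0),
     i.e. without taking any stream locks.  */
  abort ();
}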

In abort() we will flush all the IO *without* taking locks
(calls _IO_flush_all_lockp(do_lock=0)) since abort()
might be called from anywhere.

User-provided mallocs will have to set up pthread_atfork
handlers to be notified of the upcoming fork so they can stop
doing IO, or they risk inconsistent state in the child and
deadlock in the parent.
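
For a user-provided malloc that wants to log, that could look
roughly like the following; the my_malloc_* names are purely
illustrative:

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool fork_in_progress;

static void my_malloc_prepare (void) { atomic_store (&fork_in_progress, true); }
static void my_malloc_parent (void)  { atomic_store (&fork_in_progress, false); }
static void my_malloc_child (void)   { atomic_store (&fork_in_progress, false); }

__attribute__ ((constructor))
static void
my_malloc_init (void)
{
  /* POSIX runs prepare handlers in reverse order of registration,
     and parent/child handlers in registration order.  */
  pthread_atfork (my_malloc_prepare, my_malloc_parent, my_malloc_child);
}

/* In the allocator's diagnostic path:
     if (!atomic_load (&fork_in_progress))
       write_log (...);    -- skip IO while a fork is pending  */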

I don't see any serious performance arguments against this fix;
we are simply changing the order of lock acquisition. I might
even argue that by moving the arena locking last, we allow other
threads to make progress while we run our handlers.

There are arguments to be made for locking the arenas early in
the process and how that helps performance for the forking
thread (but degrades it for all other threads).

Comments?

Cheers,
Carlos.



