This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: RFC: Deadlock in multithreaded application when doing IO and fork.
- From: Roland McGrath <roland at hack dot frob dot com>
- To: "Carlos O'Donell" <carlos at redhat dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Thu, 31 Jan 2013 16:05:24 -0800 (PST)
- Subject: Re: RFC: Deadlock in multithreaded application when doing IO and fork.
- References: <510AF80E.5020400@redhat.com>
I think it certainly makes sense for the malloc atfork handlers always to
run last. Otherwise a user atfork handler can produce deadlock with the
user's own code in other threads in ways that don't really seem to be under
the user's control. For example:
	T1                                    T2
	                                      takes user lock L
	fork
	  malloc atfork takes malloc locks
	  user atfork blocks locking L
	                                      calls malloc
	                                        blocks on malloc locks
In fact, it's pretty easy to set this up so it always deadlocks, not even
needing a race (e.g. T2 creates T1 after locking L and then uses a
pthread_barrier to wait for T1 to enter its atfork handler before T2 calls
malloc). That seems like a good test case to write, since I think you can
write one like this and pretty easily see that it is POSIX-compliant.
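A rough sketch of what such a test might look like follows. It is not an existing glibc test. The names user_lock, in_prepare, forker, scenario and scenario_deadlocks are all made up for illustration, and the watchdog child with alarm() is just one assumed way to turn a hang into a reportable result. Whether it actually hangs depends on the libc and on the malloc atfork handler ending up registered after the user's handler, so treat the ordering comments as assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t user_lock = PTHREAD_MUTEX_INITIALIZER; /* the user lock "L" */
static pthread_barrier_t in_prepare; /* passed once the forking thread is in its handler */
static void *volatile sink;          /* keeps the compiler from eliding malloc/free */

/* User prepare handler: announce that fork is underway, then block on L. */
static void prepare(void)
{
    pthread_barrier_wait(&in_prepare);
    pthread_mutex_lock(&user_lock);
}

/* User parent and child handler: drop L again after fork. */
static void release(void)
{
    pthread_mutex_unlock(&user_lock);
}

/* The forking thread. */
static void *forker(void *arg)
{
    (void) arg;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    if (pid > 0)
        waitpid(pid, NULL, 0);
    return NULL;
}

/* The thread that holds L and then calls malloc. */
static void scenario(void)
{
    pthread_t t;

    pthread_barrier_init(&in_prepare, NULL, 2);
    /* Assumption: registering before the first malloc matters on a libc
       that registers its malloc fork handler lazily. */
    pthread_atfork(prepare, release, release);
    pthread_mutex_lock(&user_lock);
    pthread_create(&t, NULL, forker, NULL);
    pthread_barrier_wait(&in_prepare); /* forker is now inside prepare() */
    sink = malloc(64);                 /* hangs if fork already holds the malloc locks */
    free(sink);
    pthread_mutex_unlock(&user_lock);
    pthread_join(t, NULL);
}

/* Run the scenario in a watchdog child.
   Returns 1 if it hung and was killed by SIGALRM, 0 if it finished,
   -1 on any other failure. */
int scenario_deadlocks(unsigned timeout_sec)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        alarm(timeout_sec); /* default SIGALRM action terminates a hung child */
        scenario();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
        return 1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A driver would call something like scenario_deadlocks(5) from a single-threaded main and report failure on a return of 1 (build with -pthread). If the malloc handlers run last, the user handler blocks on L before any malloc lock is held, so the malloc call in the other thread completes and the scenario finishes.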
With user atfork handlers that call malloc themselves, it can probably get
even weirder. (Calling malloc in an atfork handler seems like a bad idea
all around, but AFAICT it is kosher under POSIX.)
It's less clear whether that's really sufficient for all kosher scenarios.
If I understood you correctly, the scenario you cited is one that in the
best case would lead to a crash. That is, the user has provoked undefined
behavior. In that case, it's as kosher to deadlock as it is to crash
coherently with malloc assertions, albeit much less useful. So we don't
have a hard mandate to avoid those deadlock cases. Hence it's a tradeoff
of difficulty, maintenance burden, and performance hits vs being extra nice
in helping people diagnose their own bugs. Similarly, setting malloc hooks
is something that really requires knowing about subtleties and internal
implementation issues already and probably always will, so putting the onus
on people who write their own malloc hooks (and especially people who think
that using malloc hooks in a multithreaded program is something they should
be doing) is fine.
I'm not all that clear on the details of the further mitigations you
suggest after the atfork change. I think the right things to do are
(in this order):
1. write the aforementioned test and verify it always deadlocks
2. fix that test by making the malloc atfork handlers always run last
3. commit those
4. reconsider remaining undesirable scenarios in that context
Thanks,
Roland