- Subject: libc/1298: malloc()/fork() deadlock with linuxthreads
- From: "E. Jay Berkenbilt" <ejb at apexinc dot com>
- Date: Wed Nov 10 09:31:33 1999
Topics:
libc/1298: malloc()/fork() deadlock with linuxthreads
----------------------------------------------------------------------
Date: Tue, 14 Sep 1999 12:19:03 -0400
From: "E. Jay Berkenbilt" <ejb@apexinc.com>
To: bugs@gnu.org
Cc: ejb@ql.org
Subject: libc/1298: malloc()/fork() deadlock with linuxthreads
Message-Id: <199909141619.MAA16239@soup.ads.apexinc.com>
>Number: 1298
>Category: libc
>Synopsis: deadlock occurs when fork() called while malloc() is active in same thread
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: libc-gnats
>State: open
>Class: sw-bug
>Submitter-Id: unknown
>Arrival-Date: Tue Sep 14 12:20:01 EDT 1999
>Last-Modified:
>Originator:
>Organization:
>Release: libc-2.1.1
>Environment:
RedHat Linux 6.0 (glibc-990416), pthreads
Based on code, bug is probably still in current glibc, but I
haven't verified this.
Host type: i386-redhat-linux-gnu
System: Linux soup.ads.apexinc.com 2.2.10 #1 Fri Jul 9 10:30:34 EDT 1999 i686 unknown
Architecture: i686
Addons: crypt glibc-compat linuxthreads
Build CFLAGS: -g -O3
Build CC: egcs
Compiler version: egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
Kernel headers: 2.2.10
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no
Stdio: libio
>Description:
NOTE: This process is multithreaded and uses pthreads. It
does not use linuxthreads directly. I don't know whether this
problem would occur in a single-threaded program, but I
suspect it would based on looking at the glibc code.
If fork() is called while malloc() is active, a deadlock
appears to occur. This can happen if a signal handler is
invoked while malloc() is active and that signal handler calls
fork().
The POSIX threads specification requires fork() to be
async-signal-safe. In my code, I have a segmentation fault
handler that essentially does this:
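    #include <stdlib.h>
    #include <unistd.h>

    static void sig_handler(int s)    /* installed for SIGSEGV */
    {
        if (fork() == 0)
            abort();    /* child: dump core */
        exit(2);        /* parent: exit */
    }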
so that I can get a core dump. (Otherwise the process doesn't
dump core: under Linux threads each thread is a separate
process that shares its memory with the others, and such a
process does not dump core.)
>How-To-Repeat:
Unfortunately, I cannot reproduce this reliably because the
timing is too tricky; if I manage to, I'll send an update.
However, I do have a good stack trace. Sorry that it was
generated with repeated "up" commands rather than with
"where"; I managed to attach to the thread while it was
deadlocked and captured this from the screen.
.../sysdeps/unix/sysv/linux/sigsuspend.c:48: No such file or directory.
Current language: auto; currently c
(gdb) up
#1 0x40021f51 in __pthread_lock (lock=0x4019f9a0, self=0xbf7ffe7c)
at restart.h:32
restart.h:32: No such file or directory.
(gdb)
#2 0x4001f83a in __pthread_mutex_lock (mutex=0x4019f990) at mutex.c:84
mutex.c:84: No such file or directory.
(gdb)
#3 0x4010e486 in ptmalloc_lock_all () at malloc.c:1565
malloc.c:1565: No such file or directory.
(gdb)
#4 0x4001fbac in fork () at ptfork.c:73
ptfork.c:73: No such file or directory.
(gdb)
#5 0x804ce32 in sig_handler (s=11) at databus.cc:65
65 if (fork() == 0)
Current language: auto; currently c++
(gdb)
#6 <signal handler called>
(gdb)
#7 0x4001e269 in pthread_cond_timedwait (cond=0x110, mutex=0x100,
abstime=0xffffffff) at condvar.c:94
condvar.c:94: No such file or directory.
Current language: auto; currently c
(gdb)
#8 0x4010eb8a in __libc_malloc (bytes=272) at malloc.c:2616
malloc.c:2616: No such file or directory.
(gdb)
#9 0x8092568 in __malloc_alloc_template<0>::allocate (n=272)
at /usr/include/g++-2/stl_alloc.h:151
151 void *result = malloc(n);
Current language: auto; currently c++
(gdb) print result
$1 = (void *) 0xbf7ffad8
(gdb) up
#10 0x809292a in __default_alloc_template<true, 0>::allocate (n=272)
at /usr/include/g++-2/stl_alloc.h:396
396 return(malloc_alloc::allocate(n));
(gdb)
#11 0x8092a14 in __nw__Q2t12basic_string3ZcZt18string_char_traits1ZcZt24__default_alloc_template2b1i0_3RepUiUi (s=16, extra=256)
at /usr/include/g++-2/std/bastring.cc:33
33 return Allocator::allocate(s + extra * sizeof (charT));
(gdb)
#12 0x8092a46 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::Rep::create (extra=135)
at /usr/include/g++-2/std/bastring.cc:60
60 Rep *p = new (extra) Rep;
(gdb)
#13 0x8092d23 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::replace (this=0xbf7ffbc0, pos=7, n1=0,
s=0x813afd0 "aaftp: sent: c14537 DATA /home/xfr/cas-images/593-256b/.in_progress/01974681F.0006.tiff 0 10000 eecbeb22d7e41e830e2b9510700489b7b9cd31aa41a6bc20a0e1a9213", n2=128) at /usr/include/g++-2/std/bastring.cc:164
164 Rep *p = Rep::create (newlen);
(gdb)
#14 0x8092f32 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::replace (this=0xbf7ffbc0, pos1=7, n1=0, str=@0x8140c94,
pos2=0, n2=128) at /usr/include/g++-2/std/bastring.cc:131
131 return replace (pos1, n1, str.data () + pos2, n2);
(gdb)
#15 0x8093388 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::append (this=0xbf7ffbc0, str=@0x8140c94, pos=0,
n=4294967295) at /usr/include/g++-2/std/bastring.h:162
162 { return replace (length (), 0, str, pos, n); }
(gdb)
#16 0x80932fd in __pl__H3ZcZt18string_char_traits1ZcZt24__default_alloc_template2b1i0_RCt12basic_string3ZX01ZX11ZX21T0_t12basic_string3ZX01ZX11ZX21 (
lhs=@0xbf7ffbe0, rhs=@0x8140c94) at /usr/include/g++-2/std/bastring.h:436
436 str.append (rhs);
(gdb)
#17 0x80688f0 in Msg_Information::unparse (this=0x8140c88)
at Msg_Information.cc:46
46 return prefix + this->data;
(gdb) print this->data
$2 = {static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 20,
selfish = false},
dat = 0x813afd0 "aaftp: sent: c14537 DATA /home/xfr/cas-images/593-256b/.in_progress/01974681F.0006.tiff 0 10000 eecbeb22d7e41e830e2b9510700489b7b9cd31aa41a6bc20a0e1a9213"}
(gdb) print prefix
$3 = {static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 20,
selfish = false},
dat = 0x829c0a0 "DEBUG: dlwess"}
(gdb) up
#18 0x8050bb1 in Logger::log (this=0x8131260, message=0x8140c88)
at Logger.cc:149
149 string text = message->unparse();
(gdb)
#19 0x80506ba in Logger::main (this=0x8131260) at Logger.cc:67
67 log(message);
(gdb)
#20 0x804d8f7 in DBThread::apply (this=0x8131260) at DBThread.cc:43
43 this->main();
(gdb)
#21 0x8085906 in QThread::apply_closure (arg=0x8131260) at QThread.cc:142
142 closure->apply();
(gdb) detach
Detaching from program: /u1/q/devel/system/databus/src/databus/ix86.linux.libc6/databus Thread 8827
>Fix:
The only fix I can think of is to block signals inside the
relevant critical sections in malloc(). This handles only the
specific case of fork() being called from a signal handler,
not the general case of fork() being called while malloc() is
active. However, I can't think of any way that fork() could be
called while malloc() is active other than via a signal
handler. All the malloc/fork interaction code in glibc seems
to be correct for the case of malloc() being called in one
thread and fork() in another....
>Audit-Trail:
>Unformatted: