This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.



["E. Jay Berkenbilt" <ejb@apexinc.com>] libc/1298: malloc()/fork() deadlock with linuxthreads



Hi,

here's an unsolved bug report.  Wolfram Gloger made the following
comment (translated from German and abbreviated):

> The proposed fix to block signals in malloc hurts performance too
> much (two system calls for every malloc () -- argh!).
> But I've found the following in the SUS-fork spec:
>     A process is created with a single thread. If a multi-threaded
>     process calls fork(), the new process contains a replica of the
>     calling thread and its entire address space, possibly including the
>     states of mutexes and other resources. Consequently, to avoid
>     errors, the child process may only execute async-signal safe
>     operations until such time as one of the exec functions is
>     called.
>
> The question remains if `the child process may only execute
> async-signal safe operations' strictly applies only to
> `multi-threaded' processes.  IMHO each process that's linked against
> libpthread --- especially if it calls fork () in a signal handler
> --- has to be treated as such.
> 
> Therefore we could check in the pthread_atfork() `prepare' handler
> for malloc whether fork has been called from a signal handler and,
> in that case, not call ptmalloc_lock_all ().  In the child process
> (and also in the parent until the signal returns) the heap would be
> corrupt - but `may only execute async-signal safe operations until
> such time as one of the exec functions is called' would apply here.

> Any comments?
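
A minimal sketch of the idea quoted above, not glibc's actual code.
Everything named here is hypothetical: `arena_lock` stands in for the
allocator's locks, and `in_signal_handler` is a per-thread flag that the
application's own handler would have to set, since there is no portable
way to ask whether the current thread is inside a signal handler.

```c
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the allocator's lock(s). */
static pthread_mutex_t arena_lock = PTHREAD_MUTEX_INITIALIZER;

/* Assumption: set on handler entry and cleared on exit by the handler. */
static _Thread_local volatile sig_atomic_t in_signal_handler = 0;
/* Remembers whether prepare() actually took the lock. */
static _Thread_local volatile sig_atomic_t locks_taken = 0;

static void prepare(void)
{
    if (in_signal_handler) {
        /* The interrupted code may already hold the lock: do not touch it.
           The heap may be inconsistent in the child, so the child must
           restrict itself to async-signal-safe calls until exec. */
        locks_taken = 0;
        return;
    }
    pthread_mutex_lock(&arena_lock);
    locks_taken = 1;
}

/* Used as both the parent and the child handler. */
static void release(void)
{
    if (locks_taken) {
        pthread_mutex_unlock(&arena_lock);
        locks_taken = 0;
    }
}

int install_fork_handlers(void)
{
    return pthread_atfork(prepare, release, release);
}

/* Simulates the reported scenario: the lock is held (malloc is active)
   when a "handler" forks.  Returns the child's exit status, or -1. */
int fork_while_lock_held(void)
{
    int status = -1;
    pid_t pid;

    pthread_mutex_lock(&arena_lock);   /* malloc() is active */
    in_signal_handler = 1;             /* handler entry */
    pid = fork();                      /* prepare() skips the lock */
    if (pid == 0)
        _exit(0);                      /* async-signal-safe only */
    in_signal_handler = 0;             /* handler exit */
    pthread_mutex_unlock(&arena_lock);

    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the flag check in `prepare()`, the `fork()` in
`fork_while_lock_held()` would block forever on a lock the same thread
already holds, which matches the reported deadlock.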

Sorry, Wolfram, if I mistranslated or misinterpreted your statements.
I'm repeating his question for:

Any comments?

Andreas



Topics:
   libc/1298: malloc()/fork() deadlock with linuxthreads


----------------------------------------------------------------------

Date: Tue, 14 Sep 1999 12:19:03 -0400
From: "E. Jay Berkenbilt" <ejb@apexinc.com>
To: bugs@gnu.org
Cc: ejb@ql.org
Subject: libc/1298: malloc()/fork() deadlock with linuxthreads
Message-Id: <199909141619.MAA16239@soup.ads.apexinc.com>


>Number:         1298
>Category:       libc
>Synopsis:       deadlock occurs when fork() called while malloc() is active in same thread
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    libc-gnats
>State:          open
>Class:          sw-bug
>Submitter-Id:   unknown
>Arrival-Date:   Tue Sep 14 12:20:01 EDT 1999
>Last-Modified:
>Originator:     
>Organization:
 
>Release:        libc-2.1.1
>Environment:
	RedHat Linux 6.0 (glibc-990416), pthreads
	Based on code, bug is probably still in current glibc, but I
	haven't verified this.
Host type: i386-redhat-linux-gnu
System: Linux soup.ads.apexinc.com 2.2.10 #1 Fri Jul 9 10:30:34 EDT 1999 i686 unknown
Architecture: i686

Addons: crypt glibc-compat linuxthreads
Build CFLAGS: -g -O3
Build CC: egcs
Compiler version: egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
Kernel headers: 2.2.10
Symbol versioning: yes
Build static: yes
Build shared: yes
Build pic-default: no
Build profile: yes
Build omitfp: no
Build bounded: no
Build static-nss: no
Stdio: libio

>Description:

	NOTE: This process is multithreaded and uses pthreads.  It
	does not use linuxthreads directly.  I don't know whether this
	problem would occur in a single-threaded program, but I
	suspect it would based on looking at the glibc code.

	If fork() is called while malloc() is active, a deadlock
	appears to occur.  This can happen if a signal handler is
	invoked while malloc() is active and that signal handler calls
	fork().

	The POSIX threads specification dictates that fork() be
	asynchronous signal-safe.  In my code, I have a segmentation
	fault handler that essentially says

	if (fork() == 0)
	{
	    abort();
	}
	exit(2);

	so that I can get a core dump.  (Otherwise, the process
	doesn't dump core because it shares memory with other
	processes, being multithreaded, and given how Linux threads
	are processes.)

>How-To-Repeat:

	Sadly, I cannot reliably reproduce this because the timing is
	too tricky, though if I succeed, I'll send an update.
	However, I do have a good stack trace.  Sorry about this stack
	trace having been generated by multiple calls to "up" rather
	than by "where".  I managed to attach the thread while it was
	deadlocked and to capture this from the screen.

.../sysdeps/unix/sysv/linux/sigsuspend.c:48: No such file or directory.
Current language:  auto; currently c
(gdb) up
#1  0x40021f51 in __pthread_lock (lock=0x4019f9a0, self=0xbf7ffe7c)
    at restart.h:32
restart.h:32: No such file or directory.
(gdb) 
#2  0x4001f83a in __pthread_mutex_lock (mutex=0x4019f990) at mutex.c:84
mutex.c:84: No such file or directory.
(gdb) 
#3  0x4010e486 in ptmalloc_lock_all () at malloc.c:1565
malloc.c:1565: No such file or directory.
(gdb) 
#4  0x4001fbac in fork () at ptfork.c:73
ptfork.c:73: No such file or directory.
(gdb) 
#5  0x804ce32 in sig_handler (s=11) at databus.cc:65
65          if (fork() == 0)
Current language:  auto; currently c++
(gdb) 
#6  <signal handler called>
(gdb) 
#7  0x4001e269 in pthread_cond_timedwait (cond=0x110, mutex=0x100, 
    abstime=0xffffffff) at condvar.c:94
condvar.c:94: No such file or directory.
Current language:  auto; currently c
(gdb) 
#8  0x4010eb8a in __libc_malloc (bytes=272) at malloc.c:2616
malloc.c:2616: No such file or directory.
(gdb) 
#9  0x8092568 in __malloc_alloc_template<0>::allocate (n=272)
    at /usr/include/g++-2/stl_alloc.h:151
151         void *result = malloc(n);
Current language:  auto; currently c++
(gdb) print result
$1 = (void *) 0xbf7ffad8
(gdb) up
#10 0x809292a in __default_alloc_template<true, 0>::allocate (n=272)
    at /usr/include/g++-2/stl_alloc.h:396
396             return(malloc_alloc::allocate(n));
(gdb) 
#11 0x8092a14 in __nw__Q2t12basic_string3ZcZt18string_char_traits1ZcZt24__default_alloc_template2b1i0_3RepUiUi (s=16, extra=256)
    at /usr/include/g++-2/std/bastring.cc:33
33        return Allocator::allocate(s + extra * sizeof (charT));
(gdb) 
#12 0x8092a46 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::Rep::create (extra=135)
    at /usr/include/g++-2/std/bastring.cc:60
60        Rep *p = new (extra) Rep;
(gdb) 
#13 0x8092d23 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::replace (this=0xbf7ffbc0, pos=7, n1=0, 
    s=0x813afd0 "aaftp: sent: c14537 DATA /home/xfr/cas-images/593-256b/.in_progress/01974681F.0006.tiff 0 10000 eecbeb22d7e41e830e2b9510700489b7b9cd31aa41a6bc20a0e1a9213", n2=128) at /usr/include/g++-2/std/bastring.cc:164
164           Rep *p = Rep::create (newlen);
(gdb) 
#14 0x8092f32 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::replace (this=0xbf7ffbc0, pos1=7, n1=0, str=@0x8140c94, 
    pos2=0, n2=128) at /usr/include/g++-2/std/bastring.cc:131
131       return replace (pos1, n1, str.data () + pos2, n2);
(gdb) 
#15 0x8093388 in basic_string<char, string_char_traits<char>, __default_alloc_template<true, 0> >::append (this=0xbf7ffbc0, str=@0x8140c94, pos=0, 
    n=4294967295) at /usr/include/g++-2/std/bastring.h:162
162         { return replace (length (), 0, str, pos, n); }
(gdb) 
#16 0x80932fd in __pl__H3ZcZt18string_char_traits1ZcZt24__default_alloc_template2b1i0_RCt12basic_string3ZX01ZX11ZX21T0_t12basic_string3ZX01ZX11ZX21 (
    lhs=@0xbf7ffbe0, rhs=@0x8140c94) at /usr/include/g++-2/std/bastring.h:436
436       str.append (rhs);
(gdb) 
#17 0x80688f0 in Msg_Information::unparse (this=0x8140c88)
    at Msg_Information.cc:46
46          return prefix + this->data;
(gdb) print this->data
$2 = {static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 20, 
    selfish = false}, 
  dat = 0x813afd0 "aaftp: sent: c14537 DATA /home/xfr/cas-images/593-256b/.in_progress/01974681F.0006.tiff 0 10000 eecbeb22d7e41e830e2b9510700489b7b9cd31aa41a6bc20a0e1a9213"}
(gdb) print prefix
$3 = {static npos = 4294967295, static nilRep = {len = 0, res = 0, ref = 20, 
    selfish = false}, 
  dat = 0x829c0a0 "DEBUG: dlwess"}
(gdb) up
#18 0x8050bb1 in Logger::log (this=0x8131260, message=0x8140c88)
    at Logger.cc:149
149         string text = message->unparse();
(gdb) 
#19 0x80506ba in Logger::main (this=0x8131260) at Logger.cc:67
67                  log(message);
(gdb) 
#20 0x804d8f7 in DBThread::apply (this=0x8131260) at DBThread.cc:43
43              this->main();
(gdb) 
#21 0x8085906 in QThread::apply_closure (arg=0x8131260) at QThread.cc:142
142         closure->apply();
(gdb) detach
Detaching from program: /u1/q/devel/system/databus/src/databus/ix86.linux.libc6/databus Thread 8827

>Fix:
	The only fix I can think of is to block signals inside of the
	relevant critical sections in malloc().  Although this handles
	the specific case of fork() being called from a signal
	handler and not the general case of fork() being called from
	malloc(), I can't think of any way that fork() could be called
	while malloc() is active other than via a signal handler.  All
	the malloc/fork interaction code in glibc seems to be right
	for the case of malloc being called in one thread and fork in
	another....
>Audit-Trail:
>Unformatted:


------------------------------

End of forward73GyNd Digest
***************************



-- 
 Andreas Jaeger   
  SuSE Labs aj@suse.de	
   private aj@arthur.rhein-neckar.de
