This is the mail archive of the libc-alpha@sourceware.cygnus.com mailing list for the glibc project.



Help debugging problem with system() or vfork() and pthreads on Linux?


(Andreas:  You may have heard from me previously w.r.t. PR# libc/1320...that
issue is still present in the latest libc--sorry it took so long to get back
to you, but I've been struggling with what you see below, and moving our
company from one location to another.  Sorry for the delay.  I figured you
might want to see this, anyhow...  --George)



I've got a C++ program that creates a thread which delivers asynchronous
alarms in the background.  The program tests a subsystem of a larger system.
Near the end of the program, before the asynchronous alarm thread finishes
(it's waiting in pthread_cond_wait()), the program calls:

	system((string("rm -rf ") + temp_dir_name).c_str());

...which should delete the temporary directory.  Instead, the call hangs, both
at the command line and in the debugger.  If I examine the program in the
debugger:

Program received signal SIGINT, Interrupt.
0x4032bb38 in __vfork () from /lib/libc.so.6
(gdb) info threads
  3 Thread 25803  pthread_handle_sigrestart (sig=-1082131356, ctx={gs = 0, __gsh = 0, fs = 64612,
      __fsh = 49023, es = 57053, __esh = 16442, ds = 39336, __dsh = 16443, edi = 0, esi = 1075199732,
      ebp = 6, esp = 3212835956, ebx = 1073887036, edx = 0, ecx = 0, eax = 3212836104,
      trapno = 1073922161, err = 1077572636, eip = 1075443784, cs = 46600, __csh = 16409,
      eflags = 1073782878, esp_at_signal = 1073886812, ss = 19256, __ssh = 16385, fpstate = 0xbf7ffcdc,
      oldmask = 1077204254, cr2 = 1075426824}) at pthread.c:625
* 2 Thread 25792 (initial thread)  0x4032bb38 in __vfork () from /lib/libc.so.6
Current language:  auto; currently c
(gdb)




Note that the pthreads internal "(manager thread)" is missing from the list...




(gdb) thread 2
[Switching to thread 2 (Thread 25792 (initial thread))]
#0  0x4032bb38 in __vfork () from /lib/libc.so.6
(gdb) bt
#0  0x4032bb38 in __vfork () from /lib/libc.so.6
#1  0x403b99a8 in ?? () from /lib/libpthread.so.0
#2  0x403afd4e in system (line=0x806f238 "rm -rf /tmp/filexSnUv7") at wrapsyscall.c:120
#3  0x804e697 in main (argc=1, argv=0xbffff744) at reviewtest.cpp:334






The program appears to be waiting for the system() call to finish...





(gdb) thread 3
[Switching to thread 3 (Thread 25803)]
#0  pthread_handle_sigrestart (sig=-1082131356, ctx={gs = 0, __gsh = 0, fs = 64612, __fsh = 49023,
      es = 57053, __esh = 16442, ds = 39336, __dsh = 16443, edi = 0, esi = 1075199732, ebp = 6,
      esp = 3212835956, ebx = 1073887036, edx = 0, ecx = 0, eax = 3212836104, trapno = 1073922161,
      err = 1077572636, eip = 1075443784, cs = 46600, __csh = 16409, eflags = 1073782878,
      esp_at_signal = 1073886812, ss = 19256, __ssh = 16385, fpstate = 0xbf7ffcdc,
      oldmask = 1077204254, cr2 = 1075426824}) at pthread.c:625
625       asm volatile ("movw %w0,%%gs" : : "r" (ctx.gs));
(gdb) bt
#0  pthread_handle_sigrestart (sig=-1082131356, ctx={gs = 0, __gsh = 0, fs = 64612, __fsh = 49023,
      es = 57053, __esh = 16442, ds = 39336, __dsh = 16443, edi = 0, esi = 1075199732, ebp = 6,
      esp = 3212835956, ebx = 1073887036, edx = 0, ecx = 0, eax = 3212836104, trapno = 1073922161,
      err = 1077572636, eip = 1075443784, cs = 46600, __csh = 16409, eflags = 1073782878,
      esp_at_signal = 1073886812, ss = 19256, __ssh = 16385, fpstate = 0xbf7ffcdc,
      oldmask = 1077204254, cr2 = 1075426824}) at pthread.c:625
#1  0x403adf20 in __pthread_wait_for_restart_signal (self=0xbf7ffe40) at pthread.c:779
#2  0x403aa9d0 in pthread_cond_wait (cond=0x40163ef4, mutex=0x40163edc) at restart.h:26
#3  0x400ff80e in CAlarm::AlarmThread (arg=0x0) at ../src/threads/Alarm_Linux.cpp:177
#4  0x403abc72 in pthread_start_thread (arg=0xbf7ffe40) at manager.c:241
(gdb)




...And the alarm thread is waiting for a new alarm to be scheduled.  The
directory has not been deleted, and there is an additional reviewtestdb
process that shows up in the "ps" listing:

25804 pts/0    Z      0:00 [reviewtestdb <defunct>]
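
For context on that defunct entry: POSIX describes system() as behaving
roughly like fork(), then an exec of "sh -c <command>" in the child, then
waitpid() in the parent.  A child that has exited but has not yet been reaped
by the waitpid() step shows up as <defunct>.  The following is only a minimal
sketch of that sequence, not glibc's implementation (which, per the backtrace
above, goes through __vfork and also manages signals); simple_system is a
made-up name used for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, NOT glibc's code: a simplified stand-in for system().
// Returns the raw wait status of the shell, or -1 if fork()/waitpid() fails.
int simple_system(const char* command)
{
    // Unlike the real system(), a null command is not supported here.
    assert(command != nullptr);

    pid_t child = fork();
    if (child < 0)
        return -1;                       // could not create the child

    if (child == 0) {
        // Child: hand the command line to the shell.
        execl("/bin/sh", "sh", "-c", command, static_cast<char*>(nullptr));
        _exit(127);                      // only reached if execl() failed
    }

    // Parent: reap the child.  Until waitpid() collects it, a child that
    // has already exited stays in the process table as a zombie.
    int status = 0;
    pid_t reaped;
    do {
        reaped = waitpid(child, &status, 0);
    } while (reaped == -1 && errno == EINTR);

    return reaped == child ? status : -1;
}
```

With a helper like this, simple_system("exit 3") would be expected to return a
status for which WIFEXITED() is true and WEXITSTATUS() is 3.  In the hang
described here, the parent never gets as far as reaping the child.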

I figure the bug is in my code, since a different test program, which
exercises another part of the system and also has alarms running, executes to
completion.

The working program does show one weird effect, though: run from the command
line it executes to completion, but run under GDB it hangs in a similar
fashion to the non-working program, right at the end, when all other threads
have quit but the alarm thread fails to quit:

Program received signal SIGINT, Interrupt.
0x400c872e in __sigsuspend (set=0xbf7ffc44) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
48            int result = INLINE_SYSCALL (rt_sigsuspend, 2, set, _NSIG / 8);
(gdb) info threads
* 3 Thread 25869  0x400c872e in __sigsuspend (set=0xbf7ffc44)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:48



...hmm...no initial or manager threads...a bit suspicious if you ask me...




(gdb) bt
#0  0x400c872e in __sigsuspend (set=0xbf7ffc44) at ../sysdeps/unix/sysv/linux/sigsuspend.c:48
#1  0x40023f20 in __pthread_wait_for_restart_signal (self=0xbf7ffe40) at pthread.c:779
#2  0x4002566e in __pthread_lock (lock=0x8153f9c, self=0xbf7ffe40) at restart.h:26
#3  0x4002289b in __pthread_mutex_lock (mutex=0x8153f8c) at mutex.c:84
#4  0x80b4750 in CAlarm::AlarmThread (arg=0x0) at ../src/threads/Alarm_Linux.cpp:219
#5  0x40021c72 in pthread_start_thread (arg=0xbf7ffe40) at manager.c:241
(gdb) quit


Yeah, it's the alarm thread...


The program is running.  Exit anyway? (y or n) y
Error accessing memory address 0x4002bb20: No such process.
(gdb)

Time to hit control-Z again...




Does this sound familiar?
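
In case it helps anyone trying to reproduce this, here is a stripped-down
sketch of the scenario: one background thread parked in pthread_cond_wait()
while the initial thread calls system().  It is not the real test program; the
names (alarm_thread, run_system_with_waiting_thread, g_lock, g_cond, g_quit)
are invented for illustration, and whether it hangs will presumably depend on
the libc and kernel in use.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <pthread.h>

// All names below are hypothetical stand-ins for the real alarm code.
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cond = PTHREAD_COND_INITIALIZER;
static bool g_quit = false;

// Background thread: sleeps in pthread_cond_wait() until told to quit,
// much like an alarm thread waiting for a new alarm to be scheduled.
static void* alarm_thread(void*)
{
    pthread_mutex_lock(&g_lock);
    while (!g_quit)
        pthread_cond_wait(&g_cond, &g_lock);
    pthread_mutex_unlock(&g_lock);
    return nullptr;
}

// Starts the waiting thread, calls system() while that thread is still
// alive, then shuts the thread down.  Returns whatever system() returned.
int run_system_with_waiting_thread(const std::string& command)
{
    pthread_mutex_lock(&g_lock);
    g_quit = false;                      // allow repeated calls
    pthread_mutex_unlock(&g_lock);

    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, alarm_thread, nullptr);
    assert(rc == 0);

    // The call under suspicion: in the report above it never returns.
    int status = std::system(command.c_str());

    // Wake the background thread and wait for it to finish.
    pthread_mutex_lock(&g_lock);
    g_quit = true;
    pthread_cond_signal(&g_cond);
    pthread_mutex_unlock(&g_lock);
    rc = pthread_join(tid, nullptr);
    assert(rc == 0);
    (void)rc;                            // silence unused warning under NDEBUG

    return status;
}
```

Built with something like "g++ -pthread", calling
run_system_with_waiting_thread("true") from main() should return 0 promptly on
a setup where system() and threads cooperate.  Note that the background thread
may not have reached pthread_cond_wait() yet when system() is called, so this
only approximates the timing in the real program.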

--
George T. Talbot
<george@moberg.com>
