This is the mail archive of the glibc-bugs@sources.redhat.com mailing list for the glibc project.



[Bug nptl/654] New: Cancelling nptl thread on dlclose() leads to application hangup


Overview Description:
The program loads a module (a .so library) using dlopen(). During loading, a
global C++ object is created. It does not matter how the object is created,
whether as a global variable or allocated with new from an
__attribute__ ((constructor)) function: the bug is triggered in either case.
The constructor of this object spawns a thread. The program then unloads the
dynamically loaded module. The object's destructor is called, and it calls a
function that tries to cancel the spawned thread. The thread uses cancellation
type PTHREAD_CANCEL_DEFERRED and periodically checks for cancellation with
pthread_testcancel(), so it receives the cancellation request. The main
thread calls pthread_join() to join the second thread, and the whole program
hangs. If the function that cancels the second thread is called explicitly
(not from the destructor) before the module is unloaded, the second thread is
cancelled and joined without problems.
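
For readers without the tarball, the cancel-and-join pattern described above can be sketched as follows. This is not the attached module.cpp: the names worker and cancel_and_join are hypothetical, and the sketch covers only the explicit path (cancellation requested from ordinary code), not the dlclose()/destructor path that hangs.

```cpp
#include <pthread.h>
#include <time.h>
#include <cassert>

// Hypothetical worker, loosely modelled on the report's pureShutdown::func.
static void* worker(void*) {
    // Deferred cancellation is already the default type; it is set
    // explicitly here only to mirror the description in the report.
    int old_type = 0;
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);
    assert(old_type == PTHREAD_CANCEL_DEFERRED);

    for (;;) {
        pthread_testcancel();                    // explicit cancellation point
        timespec delay = {0, 10 * 1000 * 1000};  // 10 ms
        nanosleep(&delay, nullptr);              // also a cancellation point
    }
}

// Start the worker, request its cancellation, and join it.
// Returns true if the thread ended because of the cancellation request.
bool cancel_and_join() {
    pthread_t tid;
    if (pthread_create(&tid, nullptr, worker, nullptr) != 0) return false;
    if (pthread_cancel(tid) != 0) return false;

    void* result = nullptr;
    if (pthread_join(tid, &result) != 0) return false;
    return result == PTHREAD_CANCELED;
}
```

Calling cancel_and_join() from main() and building with "g++ -pthread" should be enough to try it. In the failing case the same cancel and join calls are made from a destructor that runs inside dlclose().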


Steps to Reproduce:
1) Unpack the attached tarball. It contains a trimmed-down test case
extracted from the actual, much larger application.
2) Run "./compile" to compile the test program and the module.
3) Run "./run" to see the messages and the program hang.
4) Press Ctrl-C to get the command prompt back.
5) Run "./test ./libtestmod.so foo" to see the normal program behaviour
when the thread is cancelled explicitly.
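
The driver itself is only in the tarball. A rough sketch of what a driver invoked as "./test <module> [extra-arg]" might do is given below. Everything in it is an assumption made for illustration: the function name run_module, the idea that any extra argument selects the explicit shutdown, and the lookup of modShutdown as a symbol exported with C linkage.

```cpp
#include <dlfcn.h>
#include <cstdio>
#include <cassert>

// Hypothetical driver logic: load the module, optionally call its shutdown
// function explicitly, then unload it.
// Returns 0 on success, 1 if loading fails, 2 if the shutdown symbol is
// missing, 3 if unloading fails.
int run_module(const char* path, bool explicit_shutdown) {
    assert(path != nullptr);

    std::printf("loading %s now\n", path);
    void* handle = dlopen(path, RTLD_NOW);  // runs the module's constructors
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    int status = 0;
    if (explicit_shutdown) {
        // Assumption: the module exports modShutdown with C linkage.
        void* address = dlsym(handle, "modShutdown");
        if (address) {
            reinterpret_cast<void (*)()>(address)();
        } else {
            status = 2;
        }
    }

    std::printf("unloading %s now\n", path);
    if (dlclose(handle) != 0 && status == 0) {  // runs the module's destructors
        status = 3;
    }
    return status;
}
```

On older glibc versions the program has to be linked with -ldl. With explicit_shutdown set to false, the only cancellation attempt is the one made by the module's destructor during dlclose(), which is the case that hangs in this report.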


Actual Results:
1) Output of running "./run" or "./test ./libtestmod.so".
---
$ ./run
loading ./libtestmod.so now
Constructor called
hi there, new thread is up and running, thread id is -1210377296
Constructor finished
pureShutdown::func(void*) called
= thread -1210377296 is still running...
= thread -1210377296 is still running...
= thread -1210377296 is still running...
= thread -1210377296 is still running...
unloading ./libtestmod.so now
Destructor called
modShutdown() called
bye, cancelling down thread -1210377296
running pthread_join(g_tid, &result) ...
---
(the program hangs here)

2) Output of running "./test ./libtestmod.so foo".
---
$ ./test ./libtestmod.so foo
loading ./libtestmod.so now
Constructor called
hi there, new thread is up and running, thread id is -1210377296
Constructor finished
pureShutdown::func(void*) called
= thread -1210377296 is still running...
= thread -1210377296 is still running...
= thread -1210377296 is still running...
= thread -1210377296 is still running...
modShutdown() called
bye, cancelling down thread -1210377296
running pthread_join(g_tid, &result) ...
returned from pthread_join(g_tid, &result) !
all's well that end's well
modShutdown() finished
unloading ./libtestmod.so now
Destructor called
modShutdown() called
Destructor finished
---
(the program exits with code 0 here)

3) GDB session of the first case (running "./run" or "./test ./libtestmod.so").
---
$ gdb
GNU gdb 6.1.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) file ./test
Reading symbols from ./test...done.
Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run ./libtestmod.so
Starting program: /home/ses/test/test ./libtestmod.so
[Thread debugging using libthread_db enabled]
[New Thread -1210374480 (LWP 2594)]
loading ./libtestmod.so now
Constructor called
[New Thread -1210377296 (LWP 2597)]
hi there, new thread is up and running, thread id is -1210377296
Constructor finished
pureShutdown::func(void*) called
= thread -1210377296 is still running...
= thread -1210377296 is still running...
= thread -1210377296 is still running...
= thread -1210377296 is still running...
unloading ./libtestmod.so now
Destructor called
modShutdown() called
bye, cancelling down thread -1210377296
running pthread_join(g_tid, &result) ...

Program received signal SIG32, Real-time event 32.
[Switching to Thread -1210377296 (LWP 2597)]
0xffffe410 in ?? ()
(gdb) bt
#0  0xffffe410 in ?? ()
#1  0xb7db1468 in ?? ()
#2  0xb7fd6ff8 in ?? () from /lib/libpthread.so.0
#3  0x00000000 in ?? ()
#4  0xb7fd2cf6 in __nanosleep_nocancel () from /lib/libpthread.so.0
#5  0xb7fe3ddd in pureShutdown::func () at module.cpp:71
#6  0xb7fcd3c0 in start_thread () from /lib/libpthread.so.0
#7  0xb7e6c24e in clone () from /lib/libc.so.6
(gdb) kill
Kill the program being debugged? (y or n) y
(gdb) quit
$
---
"kill -l" did not print what SIG32 is. Google said that it is SIGTRAP, though
gdb reports it as a real-time signal.
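
The signal number can be checked against the C library's own constants. The sketch below assumes Linux with glibc and uses hypothetical helper names. There SIGTRAP is signal 5, and SIGRTMIN (the first real-time signal available to applications) is above 32, because the threading library keeps the lowest real-time signals for internal use such as thread cancellation. That would explain a SIG32 arriving right after pthread_cancel().

```cpp
#include <signal.h>
#include <cassert>

// True only if signal 32 were SIGTRAP, which is not the case on Linux.
bool sig32_is_sigtrap() {
    return SIGTRAP == 32;
}

// SIGRTMIN is evaluated at run time in glibc, so it is wrapped in a function.
int first_app_rt_signal() {
    return SIGRTMIN;
}

// True if signal 32 lies below the range handed out to applications,
// i.e. it is one of the signals kept back by the C library.
bool sig32_is_below_sigrtmin() {
    int first = first_app_rt_signal();
    assert(first <= SIGRTMAX);
    return 32 < first;
}
```

In a debugging session such a signal is normally passed straight through to the program rather than treated as a fault.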


Expected Results: in the first case the program should not hang. The second
thread should terminate correctly, the module should be unloaded correctly,
and the whole program should exit with code 0.


Build Date: 2005-01-12


System information:
Processor: Pentium III (Coppermine) 667.080 MHz
Distribution: Linux From Scratch 6.0 with RPM and some packages updated
Kernel version: 2.6.9, unpatched AFAIK

Glibc version: 
Snapshot of 2005-01-10 from
ftp://sources.redhat.com/pub/glibc/snapshots/glibc-20050110.tar.bz2:
---
GNU C Library development release version 2.3.90, by Roland McGrath et al.
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.4.3.
Compiled on a Linux 2.6.9 system on 2005-01-11.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Thread-local storage support included.
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.
---
Sorry for not trying the latest CVS. I do not have access to outside CVS
servers from my corporate network. Judging from
[glibc]/libc/nptl/ChangeLog on CvsWeb, nothing has changed in nptl during the
last 2 days.

Glibc "./configure" switches (excluding "--*dir=" switches):
---
    --disable-profile \
    --enable-add-ons=nptl \
    --with-tls \
    --with-__thread \
    --enable-kernel=2.6.9 \
    --without-cvs \
    --with-headers=/usr/src/linux-2.6.9/include
---
Glibc was built and packaged as RPM packages using rpm.

GCC version:
---
$ gcc -v
Reading specs from /usr/lib/gcc/i686-pc-linux-gnu/3.4.3/specs
Configured with: ../gcc-3.4.3/configure --host=i686-pc-linux-gnu
--build=i686-pc-linux-gnu --target=i686-pc-linux-gnu --prefix=/usr
--exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc
--datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib
--libexecdir=/usr/lib --localstatedir=/var --sharedstatedir=/usr/com
--mandir=/usr/share/man --infodir=/usr/share/info --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--enable-languages=c,c++
Thread model: posix
gcc version 3.4.3
---

Ld/Binutils version:
---
$ ld -v
GNU ld version 2.15.91.0.1 20040527
---

I hope the information provided helps. If you need more information, please
feel free to ask, and feel free to request additional testing or investigation.
Advice on how to write the patch myself would also be welcome.

-- 
           Summary: Cancelling nptl thread on dlclose() leads to application
                    hangup
           Product: glibc
           Version: 2.3.4
            Status: NEW
          Severity: critical
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: alexei dot khlebnikov at datacon dot at
                CC: glibc-bugs at sources dot redhat dot com
 GCC build triplet: i686-pc-linux-gnu
  GCC host triplet: i686-pc-linux-gnu
GCC target triplet: i686-pc-linux-gnu


http://sources.redhat.com/bugzilla/show_bug.cgi?id=654


