This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug nptl/21374] __pthread_cond_destroy deadlock on glibc 2.25
- From: "adhemerval.zanella at linaro dot org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Wed, 12 Apr 2017 19:17:55 +0000
- Subject: [Bug nptl/21374] __pthread_cond_destroy deadlock on glibc 2.25
- Auto-submitted: auto-generated
- References: <bug-21374-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=21374
Adhemerval Zanella <adhemerval.zanella at linaro dot org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |adhemerval.zanella at linaro dot org
--- Comment #4 from Adhemerval Zanella <adhemerval.zanella at linaro dot org> ---
That seems to be exactly what it is trying to do, based on the example provided [1].
Using master and showing the backtrace of all threads:
Thread 4 (LWP 27793):
#0 0x00007ffff78e3683 in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x555556211808) at
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x555556211810,
cond=0x5555562117e0) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x5555562117e0, mutex=0x555556211810) at
pthread_cond_wait.c:655
#3 0x00007fffed148c7c in
std::condition_variable::wait(std::unique_lock<std::mutex>&) () from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/lib/libshm.so
#4 0x00007fffed902b53 in
std::condition_variable::wait<torch::autograd::ReadyQueue::pop_back()::__lambda0>
(__p=..., __lock=..., this=0x5555562117e0)
from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
#5 torch::autograd::ReadyQueue::pop_back (this=this@entry=0x555556211790) at
torch/csrc/autograd/engine.cpp:80
#6 0x00007fffed904d23 in torch::autograd::Engine::thread_main
(this=this@entry=0x7fffee17ed00 <engine>, queue=...)
from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
#7 0x00007fffed91589a in PythonEngine::thread_main (this=0x7fffee17ed00
<engine>, queue=...)
from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
#8 0x00007fffd1e1b870 in ?? () from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6
#9 0x00007ffff78dd455 in start_thread (arg=0x7fffc4456700) at
pthread_create.c:455
#10 0x00007ffff6cd3e5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Thread 3 (LWP 27792):
#0 0x00007ffff78e3683 in futex_wait_cancelable (private=<optimized out>,
expected=0, futex_word=0x55555621132c) at
../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x555556211330,
cond=0x555556211300) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x555556211300, mutex=0x555556211330) at
pthread_cond_wait.c:655
#3 0x00007fffed148c7c in
std::condition_variable::wait(std::unique_lock<std::mutex>&) () from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/lib/libshm.so
#4 0x00007fffed902b53 in
std::condition_variable::wait<torch::autograd::ReadyQueue::pop_back()::__lambda0>
(__p=..., __lock=..., this=0x555556211300)
from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
#5 torch::autograd::ReadyQueue::pop_back (this=this@entry=0x5555562112b0) at
torch/csrc/autograd/engine.cpp:80
#6 0x00007fffed904d23 in torch::autograd::Engine::thread_main
(this=this@entry=0x7fffee17ed00 <engine>, queue=...)
from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
#7 0x00007fffed91589a in PythonEngine::thread_main (this=0x7fffee17ed00
<engine>, queue=...)
from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
#8 0x00007fffd1e1b870 in ?? () from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6
#9 0x00007ffff78dd455 in start_thread (arg=0x7fffc4c57700) at
pthread_create.c:455
#10 0x00007ffff6cd3e5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Thread 2 (LWP 27791):
#0 0x00007ffff6cd53f8 in accept4 (fd=9, addr=..., addr_len=0x7fffc5457e58,
flags=524288) at ../sysdeps/unix/sysv/linux/accept4.c:40
#1 0x00007fffc5863496 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007fffc5856cbd in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007fffc5863e88 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff78dd455 in start_thread (arg=0x7fffc5458700) at
pthread_create.c:455
#5 0x00007ffff6cd3e5f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:97
Thread 1 (LWP 26868):
#0 0x00007ffff78e31c9 in futex_wait (private=<optimized out>, expected=12,
futex_word=0x555556211324) at ../sysdeps/unix/sysv/linux/futex-internal.h:61
#1 futex_wait_simple (private=<optimized out>, expected=12,
futex_word=0x555556211324) at ../sysdeps/nptl/futex-internal.h:135
#2 __pthread_cond_destroy (cond=0x555556211300) at pthread_cond_destroy.c:54
#3 0x00007fffed90175e in torch::autograd::ReadyQueue::~ReadyQueue
(this=0x5555562112b0, __in_chrg=<optimized out>)
from
/home/azanella/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so
#4 std::default_delete<torch::autograd::ReadyQueue>::operator()
(this=<optimized out>, __ptr=0x5555562112b0) at
torch/csrc/autograd/engine.cpp:67
It looks like thread 3 and thread 4 are each waiting on a condition variable,
and thread 1 calls pthread_cond_destroy on the one thread 3 is still waiting on
(cond=0x555556211300). Also, based on the PyTorch bug referenced in the bug
report, it indeed looks like an application issue [2].
If you have a well-defined example that triggers this very issue, it would be
helpful; otherwise, debugging indicates you are relying on undefined behavior
and I will close this bug.
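For reference, POSIX leaves the result undefined when pthread_cond_destroy is
called on a condition variable that other threads are still blocked on. A
minimal sketch of a teardown order that avoids this is shown below. It is not
PyTorch's code: the work_queue type, its shutdown flag, and
run_safe_shutdown_demo are hypothetical names made up for illustration. The
idea is to wake the waiters, join them, and only then destroy the condition
variable.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue for illustration only; not PyTorch's ReadyQueue. */
struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             pending;   /* number of queued items */
    bool            shutdown;  /* set once teardown has been requested */
};

/* Worker: consumes items until the queue is empty and shutdown is set. */
static void *worker_main(void *arg)
{
    struct work_queue *q = arg;

    pthread_mutex_lock(&q->lock);
    for (;;) {
        /* Re-check the predicate in a loop to handle spurious wakeups. */
        while (q->pending == 0 && !q->shutdown)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->pending == 0)
            break;              /* shutdown requested and nothing left */
        q->pending--;
    }
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Starts two workers and tears the queue down safely; returns 0 on success. */
int run_safe_shutdown_demo(void)
{
    struct work_queue q = { .pending = 0, .shutdown = false };
    pthread_t workers[2];
    int started = 0;
    int rc = 0;

    if (pthread_mutex_init(&q.lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&q.not_empty, NULL) != 0) {
        pthread_mutex_destroy(&q.lock);
        return -1;
    }

    while (started < 2) {
        if (pthread_create(&workers[started], NULL, worker_main, &q) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    /* Step 1: ask every waiter to leave pthread_cond_wait. */
    pthread_mutex_lock(&q.lock);
    q.shutdown = true;
    pthread_cond_broadcast(&q.not_empty);
    pthread_mutex_unlock(&q.lock);

    /* Step 2: wait until no thread can still be using the condvar. */
    for (int i = 0; i < started; i++)
        if (pthread_join(workers[i], NULL) != 0)
            rc = -1;

    /* Step 3: only now is it safe to destroy the condvar and mutex. */
    if (pthread_cond_destroy(&q.not_empty) != 0)
        rc = -1;
    if (pthread_mutex_destroy(&q.lock) != 0)
        rc = -1;
    return rc;
}
```

Calling run_safe_shutdown_demo() from main should return 0. In the backtrace
above, the equivalent of step 3 runs while thread 3 is still inside
pthread_cond_wait on the same condition variable, which is the undefined case.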
[1]
https://discuss.pytorch.org/t/archlinux-using-variable-backwards-appears-to-hang-program-indefinitely/1675
[2] https://github.com/pytorch/pytorch/pull/1243