This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



pthread_mutex_lock hang during tls_get_addr_tail()


Hi all. I have a weird issue and I wanted to see if anyone has any
thoughts.

My code works fine on every system I've tried it on (this is an
extensively tested codebase).  However, one of my users is running CentOS
6.5 with glibc 2.12-1.166.el6.x86_64 installed, and they are seeing a
hang in pthread_mutex_lock() during a call to __tls_get_addr().

Specifically, I have a shared library written in C++ (GCC 4.9.2) and the
call is from the STL's __cxa_get_globals() function.  Here's a
stacktrace:

Thread 21 (Thread 0x7f0061c53700 (LWP 5295)):
#0  0x0000003f3e4094d1 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x0000003f3dc110f7 in tls_get_addr_tail () from /lib64/ld-linux-x86-64.so.2
#2  0x0000003f3dc11500 in __tls_get_addr () from /lib64/ld-linux-x86-64.so.2
#3  0x00007f0059679b9c in __cxa_get_globals () from /usr/local/lib64/libmylib.so
#4  0x00007f0058cc4c47 in UncaughtExceptionCounter::getUncaughtExceptionCount (this=0x7f0061c50ce4)
   ...

I looked at the implementation of __cxa_get_globals() and it only
returns the address of a static __thread variable:

  get_global() _GLIBCXX_NOTHROW
  {
    static __thread abi::__cxa_eh_globals global;
    return &global;
  }

  extern "C" __cxa_eh_globals*
  __cxxabiv1::__cxa_get_globals() _GLIBCXX_NOTHROW
  { return get_global(); }
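
For anyone who wants to poke at the same access pattern outside of
libstdc++, here is a minimal sketch of it.  Everything in it is a
stand-in of my own: EhGlobals, get_global_sketch() and
collect_addresses() are hypothetical names, not the real libstdc++
code, and __thread is the GCC/Clang extension used in the snippet
above.  It only shows that each thread gets its own copy of the
function-local TLS object.  It does not try to reproduce the hang.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for abi::__cxa_eh_globals (plain POD so that
// __thread needs no dynamic initialization).
struct EhGlobals {
    void* caughtExceptions;
    unsigned int uncaughtExceptions;
};

// Same shape as get_global(): return the address of a function-local
// thread-local object.  When this is compiled into a shared library,
// the compiler typically emits a call to __tls_get_addr() here.
EhGlobals* get_global_sketch() noexcept {
    static __thread EhGlobals global;
    return &global;
}

// Start nthreads threads, record the TLS address each one sees, and
// keep every thread alive until all addresses are recorded so that no
// TLS block can be freed and reused by a later thread.
std::set<EhGlobals*> collect_addresses(int nthreads) {
    std::set<EhGlobals*> seen;
    std::mutex seen_lock;
    std::atomic<int> arrived{0};
    std::atomic<bool> release{false};

    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i) {
        threads.emplace_back([&] {
            EhGlobals* p = get_global_sketch();
            p->uncaughtExceptions++;  // touch this thread's own copy
            {
                std::lock_guard<std::mutex> guard(seen_lock);
                seen.insert(p);
            }
            arrived.fetch_add(1);
            while (!release.load()) {
                std::this_thread::yield();
            }
        });
    }

    // Wait until every thread has recorded its address, then let go.
    while (arrived.load() < nthreads) {
        std::this_thread::yield();
    }
    release.store(true);
    for (auto& t : threads) {
        t.join();
    }
    return seen;
}
```

Calling collect_addresses(4) from a program built with g++ -pthread
should return four distinct pointers, one per thread.  To get closer
to my setup the function would have to live in a separate .so that is
loaded at runtime, which is what the JVM does with my library.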

More details: this environment is actually using a Java 1.8 JVM which is
loading my .so and using JNI to access it.  The hang doesn't happen on
the first call to these functions, but it happens "pretty soon".

I've loaded a CentOS 6.5 system in a QEMU VM and tried to reproduce it
with the default glibc there (2.12-1.132) and can't reproduce the hang.
I also upgraded to the latest 6.5 glibc (2.12-1.192) and can't
reproduce it there either.  I can't find this exact RPM (1.166) so I
can't test that, so I'm not even sure if it's really a glibc issue or
not.

I guess what I'm wondering is if the above stacktrace and info rings any
bells with anyone or suggests other places to look.  I'm severely
hampered by not being able to repro the problem myself but my user can
do it on their system (which I don't have access to) within a minute or
two, every time.

Cheers!

