This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Lazy TLS initialization vs. TCMalloc.
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: Florian Weimer <fweimer at redhat dot com>, Carlos O'Donell <carlos at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Cc: <nd at arm dot com>
- Date: Wed, 23 Mar 2016 10:20:43 +0000
- Subject: Re: Lazy TLS initialization vs. TCMalloc.
- References: <56F1A3F8 dot 8080606 at redhat dot com> <56F1B008 dot 5030800 at redhat dot com>
On 22/03/16 20:50, Florian Weimer wrote:
> On 03/22/2016 08:58 PM, Carlos O'Donell wrote:
>> I was asked to comment on Ceph/TCMalloc deadlock problems
>> which appeared to be glibc-related. Given that I stood up
>> and represented the community I'm posting what I wrote
>> here for posterity:
>>
>> http://tracker.ceph.com/issues/13522#note-23
>>
>> The basic problem is that lazy TLS initialization has
>> unknown requirements on the interposing malloc and TCMalloc
>> gets it wrong.
>>
>> I hope that we can find somebody to fix the issue and that
>> consensus remains that dlopen should allocate up front to
>> avoid the issues with lazy initialization.
i think the problem is that user code is running while
a libc internal lock is held which is always bad.
in this case the user code is an interposed malloc which
is special in many ways, but a simple ctor could do the
same: if a ctor takes a lock that is already held by another
thread, and that thread then calls into dlopen/tls init/..., the
two deadlock: the ctor waits on that lock while holding the dlopen
lock, and the other thread waits on the dlopen lock.
so i'd say this is just the dlopen lock issue
https://sourceware.org/bugzilla/show_bug.cgi?id=19448
fixing it might solve the interposition case too,
but the fix is not trivial as e.g. discussed in
https://www.sourceware.org/ml/libc-alpha/2016-01/msg00618.html
>
> This bug is potentially related:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=18524
>
> For TLS-with-destructor access in C++, we call calloc without error
> checking. There is no good way we can report the error. As far as I
> know, GCC does not declare the number of thread-local variables with
> destructors, and a conservative estimate would be fairly large (objects
> can be as small as one byte, and all these objects might need an 8-byte
> destructor pointer).
yes the abi invented by gcc for c++11 tls dtors is broken,
but the normal destructor abi is broken too: it uses atexit,
which has to allocate (eventually), and the failure is not
handled, so a simple dlopen can crash.
either the number of dtors should be declared or the elf
object could have additional space that is usable by libc
for the atexit linked list.
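a rough sketch of the second option, assuming each module reserves
storage for its own list node. the names (exit_node, register_exit_node,
run_exit_handlers, record) are made up for illustration and are not an
existing libc interface. the point is only that registration links
caller-provided storage, so it has no allocation and no failure path.

```c
#include <stddef.h>

/* Hypothetical node that a module would reserve statically. */
struct exit_node {
    void (*fn)(void *);
    void *arg;
    struct exit_node *next;
};

static struct exit_node *exit_list; /* head of the intrusive list */

/* Links caller-provided storage into the list; nothing is allocated,
   so there is no error to report. */
static void register_exit_node(struct exit_node *n,
                               void (*fn)(void *), void *arg)
{
    n->fn = fn;
    n->arg = arg;
    n->next = exit_list;
    exit_list = n;
}

/* Runs handlers in reverse registration order, as atexit does.
   Returns the number of handlers that were run. */
static int run_exit_handlers(void)
{
    int count = 0;
    while (exit_list != NULL) {
        struct exit_node *n = exit_list;
        exit_list = n->next;
        n->fn(n->arg);
        count++;
    }
    return count;
}

/* Demo handler: appends the first character of its argument to a trace. */
static char trace[8];
static size_t trace_len;

static void record(void *arg)
{
    if (trace_len < sizeof trace)
        trace[trace_len++] = *(const char *)arg;
}
```

a module would declare one static struct exit_node per destructor and
pass its address to register_exit_node. a real design would still need
to decide how unloading a module removes its nodes from the list.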