This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Thread-local support for non-POD data objects


On Thu, Aug 01, 2013 at 08:39:01AM +0530, Siddhesh Poyarekar wrote:
> On 25 July 2013 06:38, Rich Felker <dalias@aerifal.cx> wrote:
> > I have a couple questions about the thread-local dtor support. The
> 
> Sorry it took me so long to respond to this; I was tied up in a bunch
> of internal deadlines.

No problem.

> > current implementation of __cxa_thread_atexit_impl does not check the
> > return value of calloc, and crashes if calloc failed. The
> > __cxa_thread_atexit_impl function, however, does have a return value.
> 
> That's a bug.  We need to check for calloc failure and return -1.

OK, but I don't think that fixes the problem, it just moves it...

> > I was not able to find the code in libstdc++ that calls it (maybe it's
> > not there yet), but does this calling code check the return value, and
> > would it instead be possible for __cxa_thread_atexit_impl to return
> > failure and for the caller to throw an exception in this case? And
> > would this behavior even be conforming?
> 
> libstdc++v3/libsupc++/atexit_thread.cc has this support implemented.
> It implements __cxxabiv1::__cxa_thread_atexit, which calls
> __cxa_thread_atexit_impl if it was made available during configure.
> __cxa_thread_atexit does return failure, so you can have
> __cxa_thread_atexit return failure too.
> 
> If you're looking for the point where the compiler inserts then look
> at gcc/cp/decl.c.  The semantics are more or less identical to
> cxa_atexit, which also returns failure when it is unable to allocate
> memory for a new exit function node.  However, the compiler only
> inserts a node to call the function - there is no check for return
> value.

Lovely...

Since this is a new feature, my feeling is we should try to get it
right. There should be some way for the dynamic linker to know the
maximum number of __cxa_thread_atexit calls the library will make so
it can reserve space for them. Fixing plain __cxa_atexit can be a
separate issue.

> > My feeling is that it should be impossible for code like this to crash
> > or generate an exception:
> >
> >   thread_local int foo = gettid(); // stupid example
> >
> > but this is impossible unless storage for the TLS dtor information is
> > reserved prior.
> >
> > Maybe it's too late to redesign all this, but I think the C++ compiler
> > should generate extra TLS space for non-POD TLS, sufficient to contain
> > a structure like:
> >
> >   struct nonpod_tls_desc {
> >     void *objl
> >     void (*dtor)(void *);
> >     struct nonpod_tls_desc *next;
> >   };
> >
> > then __cxa_thread_atexit would merely need to link these descriptors,
> > and would not have to allocate anything. And code like the above would
> > have zero risk of failure.
> >
> > If it is too late to change things, in musl I'm probably going to have
> > to implement something like the above without the help of the
> > compiler, by having the dynamic linker automatically allocate extra
> > TLS like the above whenever it detects a C++ DSO with non-POD TLS
> > objects. Part of the reason I'm raising these issues are as part of
> > figuring out whart musl will need to do to handle C++ non-POD TLS.
> 
> The issue of checking failure is not limited to thread_local objects.
> Regular static/global objects that are registered for destruction
> using __cxa_atexit also have a similar issue where __cxa_atexit
> returns a failure, but the compiler doesn't register these failures.

In the absence of dlopen, this is a lesser issue, since the failure
will just happen at program startup, presumably before you have any
valuable data to lose. TLS ctor failure, on the other hand, can happen
at any time while the program is running, so it's a much bigger
problem, in my opinion.

> Maybe the right approach here would be to have the compiler generate
> extra stace in .bss (or .tbss for thread_local) so that any failure to
> do such allocation would be handled by the dynamic linker.  I haven't

This would definitely be the right approach.

> thought this through, but it may have enough ABI implications that it
> would be better to just generate some simple failure code in the
> compiler instead, on the lines of:
> 
> if (__cxa_atexit (...))
>   {
>     __print_failure ();
>     abort ();
>   }

I would say the compiler should throw an exception rather than
aborting the program, but that's not right either since exceptions
should not be travelling across C/C++ boundaries (dlopen). Really,
__cxa_atexit should just be implemented so as not to be able to fail,
and the compiler should (for efficiency) assume it doesn't fail.

How many calls to __cxa_atexit does the compiler generate? Offhand it
looks like it generates one per object, which is really ugly and
inefficient; there's no reason for it to be more than one per
translation unit, and it could really be just one per dso with some
toolchain help, or even better, none at all, using fini_array instead.

Short of that, I have a solution for reserving storage, but it's ugly.
Upon loading a C++ library, allocate 8 or 16 times its writable map
size (1 byte minimum object size, 2 words per atexit entry) and allow
__cxa_atexit to use that on malloc failure if the dso handle matches.
If none of it has been used (all registrations fit in the normal
atexit list), free it once the init sections have been run.

Rich


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]