This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: exception handling predicament


On Fri, Aug 19, 2011 at 1:08 AM, David Miller <davem@davemloft.net> wrote:
>
> Please read, it took me a very long time to debug this :-/
>
> I think this issue applies to most targets in glibc.  We've been
> mostly getting away with this simply because gcc has been less
> aggressive optimizing exception regions in the past.
>
> If given -fexceptions, GCC will not recognize an inline asm as
> potentially generating an exception unless both of the following
> are true:
>
> 1) One of the asm operands has a type which is volatile
> 2) -fnon-call-exceptions is given in CFLAGS
>
> This can therefore cause a problem on any platform that implements the
> lowlevellock.h futex operations as inline asm syscalls (i386, x86_64,
> sparc, etc.)
>
> Initially I thought only #1 was the issue, so I reworked the sparc
> lowlevellock.h inline asms such that the volatile types propagate
> properly into the inline asms instead of being cast away.
>
> But it turns out #2 is also needed.
>
> I haven't checked but I imagine this could cause problems in other
> cancellable routines where the exception generating point is an
> inlined syscall and we've enabled async cancellation.
>
> One test case that fails because of this issue is nptl/tst-cancel17.c
> because aio_suspend() has this code sequence involving a cleanup which
> gets implemented using __attribute__((__cleanup__(xxx))):
>
>      pthread_cleanup_push (cleanup, &clparam);
>
> #ifdef DONT_NEED_AIO_MISC_COND
>      AIO_MISC_WAIT (result, cntr, timeout, 1);
> #else
>  ...
> #endif
>
>      pthread_cleanup_pop (0);
>
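
For readers following along, here is a minimal sketch of the
__cleanup__ attribute mentioned above, assuming GCC or Clang.  The
names note_cleanup, cleanup_calls and run_guarded_region are
hypothetical and not taken from glibc.  It only shows the ordinary
case, where the handler fires as the annotated variable goes out of
scope; it does not exercise cancellation or unwinding.

```c
/* Hypothetical counter, for illustration only. */
int cleanup_calls = 0;

/* The compiler calls this with a pointer to the annotated variable
   when that variable goes out of scope. */
void note_cleanup(int *unused)
{
    (void)unused;
    cleanup_calls++;
}

/* Returns the value of cleanup_calls as seen inside the guarded
   scope, i.e. before the handler has run for this call. */
int run_guarded_region(void)
{
    int seen_inside;
    {
        int guard __attribute__((__cleanup__(note_cleanup))) = 0;
        (void)guard;
        seen_inside = cleanup_calls;
        /* In aio_suspend() the futex wait would sit in this scope. */
    }   /* note_cleanup(&guard) runs here on normal scope exit. */
    return seen_inside;
}
```

Calling run_guarded_region() once bumps cleanup_calls by one.  Whether
the handler also runs during an unwind depends on the exception
region the compiler emits, which is the subject of the message.
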
> AIO_MISC_WAIT() is essentially:
>
>          oldtype = LIBC_CANCEL_ASYNC ();
>  ...
>        pthread_mutex_unlock (&__aio_requests_mutex);
>  ...
>            status = lll_futex_timed_wait (futexaddr, oldval, timeout,
>                                           LLL_PRIVATE);
>  ...
>        if (cancel)
>          LIBC_CANCEL_RESET (oldtype);
>  ...
>        pthread_mutex_lock (&__aio_requests_mutex);
>
> GCC decides that the cleanup exception range should only cover the
> LIBC_CANCEL_ASYNC() and LIBC_CANCEL_RESET(), because they evaluate to
> function calls which are not marked as __nothrow__.
>
> It should not cover pthread_mutex_{unlock,lock}() because those
> functions have been marked as __nothrow__.
>
> It should also not cover lll_futex_timed_wait() because that's an
> inline asm and we haven't passed -fnon-call-exceptions to GCC.
>
> The result is that gcc does not emit an exception region for
> lll_futex_timed_wait()'s asm, and therefore if the cancel event comes
> in while we're sleeping on that futex call then the aio_suspend()
> cleanups do not run and therefore we eventually crash.
>
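
As a point of comparison, here is a small sketch of the behaviour the
test expects: a cleanup handler running when a thread is cancelled.
It uses ordinary deferred cancellation at a cancellation point
(pause()), not the async mode discussed above, so it does not
reproduce the bug.  The names on_cancel, worker, cleanup_ran and
cancel_worker_and_report are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical flag set by the cleanup handler. */
volatile int cleanup_ran = 0;

void on_cancel(void *arg)
{
    (void)arg;
    cleanup_ran = 1;
}

void *worker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(on_cancel, NULL);
    for (;;)
        pause();            /* cancellation point; the thread blocks here */
    pthread_cleanup_pop(0); /* not reached, but must pair with the push */
    return NULL;
}

/* Returns 1 if the worker was cancelled and its handler ran,
   0 if not, -1 if the thread could not be created. */
int cancel_worker_and_report(void)
{
    pthread_t t;
    void *ret = NULL;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_cancel(t);      /* acted on at the next cancellation point */
    pthread_join(t, &ret);
    return cleanup_ran && ret == PTHREAD_CANCELED;
}
```

Built with -pthread, cancel_worker_and_report() is expected to return
1.  The failure described in the message is the case where the
equivalent handler in aio_suspend() is skipped.
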
> We could pass -fnon-call-exceptions but that seems pretty heavy-handed,
> and doesn't actually fix the real problem.
>
> The truth is that __cleanup__ doesn't provide the semantics we want.
>
> The cancel signal (and thus since we're in async mode, the unwind) can
> occur at any instruction in this code sequence, not just at
> instructions that "might trap."
>
> They all "might trap."  It could even happen during one of the
> __nothrow__ functions we call.
>
> So perhaps __cleanup__ is not appropriate for async signal based
> exceptions, as is being used here.  And we should instead use some
> other cleanup mechanism.
>
> As far as I can tell, aio_suspend() is the only part of librt that
> tries to make use of a pthread cleanup.
>

Is this related to:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48338


-- 
H.J.

