This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]


On 06/07/15 18:08, Szabolcs Nagy wrote:
> On 06/07/15 17:33, Szabolcs Nagy wrote:
>> On 06/07/15 15:58, Adhemerval Zanella wrote:
>>> On 06-07-2015 11:16, Martin Sebor wrote:
>>>>> this broke
>>>>>
>>>>> nptl/tst-join5
>>>>> nptl/tst-once3
>>>>>
>>>>> tests on aarch64.
>>>>>
>>>>> the cleanup handler of the pthread_once and pthread_join
>>>>> implementation don't run when they are canceled.
>>>>
>>>> I'll look into it as soon as I get access to an aarch64 machine.
>>>>
>>>> Martin
>>>>
>>>
>>> And I see a regression with
>>>
>>> nptl/tst-once3
>>>
>>> for armhf.
>>>
>>
>> in case of aarch64 the bug is somewhere in __pthread_unwind
>> (called from __do_cancel) so probably a libgcc issue.
>>
> 
> the problem seems to be that gcc on x86_64 turns on
> -fasynchronous-unwind-tables by default, but not on
> aarch64 or arm.
> 
> now i added -fasynchronous-unwind-tables to the cflags
> of the relevant tests, will send a patch if they pass.
> 

This uncovered a serious issue that affects other archs too.

Both test failures are caused by glibc switching the internal
mechanism of pthread cancellation cleanup handling to use
__attribute__((cleanup(f))) and -fexceptions, but the two
failures are independent:

(1) Should -fasynchronous-unwind-tables be on by default in gcc?

nptl/tst-once3 fails because the callback passed to pthread_once
now has to be compiled with -fasynchronous-unwind-tables, which
is not on by default in gcc on arm and aarch64.  So does glibc
expect users to pass this flag themselves, or does glibc
require the compiler to have it on by default?

(My understanding: POSIX-conforming C code cannot observe the
presence of -fasynchronous-unwind-tables without invoking UB, but
the glibc implementation of cancellation cleanup and of backtrace
from signal handlers makes this detail observable.  Any function
which may be canceled needs this flag to make cleanup work, so
glibc seems to impose this as a requirement on the compiler: the
user may not be in control of all the code that may be canceled.)
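The tst-once3 scenario can be sketched roughly as below.  This is
not the glibc test itself, only an illustration under assumptions:
all names (once_cancel_demo, init_routine, in_init, ...) are
hypothetical.  A thread is canceled while blocked inside the
pthread_once callback; POSIX then requires the once control to
behave as if pthread_once had never been called, so a second call
must run the callback again.  That only works if the cleanup
handler inside pthread_once actually runs.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static pthread_once_t once_ctl = PTHREAD_ONCE_INIT;
static sem_t in_init;          /* posted once the callback has started */
static int init_calls;         /* how many times the callback was entered */
static int block_in_init = 1;  /* first call blocks, second returns */

/* Hypothetical init routine: blocks in pause(), a cancellation point. */
static void init_routine(void) {
    init_calls++;
    if (block_in_init) {
        sem_post(&in_init);
        for (;;)
            pause();
    }
}

static void *once_worker(void *arg) {
    (void)arg;
    pthread_once(&once_ctl, init_routine);
    return NULL;
}

/* Returns 1 if the canceled call left once_ctl reusable, 0 otherwise.
   Uses static state, so it is meant to be called only once. */
int once_cancel_demo(void) {
    pthread_t t;
    void *res = NULL;
    int rc;

    rc = sem_init(&in_init, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, once_worker, NULL);
    assert(rc == 0);

    /* Wait until the worker is inside init_routine, then cancel it. */
    while (sem_wait(&in_init) != 0)
        ;
    rc = pthread_cancel(t);
    assert(rc == 0);
    rc = pthread_join(t, &res);
    assert(rc == 0);
    (void)rc;

    /* If the cleanup handler ran, this call runs init_routine again.
       If it was skipped, this call may hang (the symptom in BZ #18435). */
    block_in_init = 0;
    pthread_once(&once_ctl, init_routine);

    return res == PTHREAD_CANCELED && init_calls == 2;
}
```

Built with something like gcc -pthread, once_cancel_demo() is
expected to return 1 where the unwinder can walk through
init_routine.  On arm and aarch64, per the analysis above, that
presumably requires compiling init_routine with
-fasynchronous-unwind-tables.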


(2) Should gcc support exceptions from async signal handlers?

The nptl/tst-join5 failure is more problematic: it fails because
gcc does not seem to implement -fexceptions with the assumption
that signal handlers can throw; in particular, it assumes inline
asm does not throw exceptions.  If the syscall that is a
cancellation point appears between pthread_cleanup_push and
pthread_cleanup_pop in glibc internal code, the cleanup handler may
not get run on cancellation, depending on where gcc moved the
syscall inline asm.  (It is free to move it outside the code range
that is marked for exception handling; this is what happens on
aarch64 in pthread_join.)  This affects all archs, but some may get
lucky.

(My understanding: gcc must be very strict about how it marks
the code range for exception handling and assume any instruction
may throw if it wants -fexceptions -fasynchronous-unwind-tables to
work from signal handlers.  Current compilers do not seem to support
this, so glibc internal code should not rely on it, which means the
cancellation mechanism should not rely on exception handling, at
least not when the exception is thrown from the cancel signal
handler.  I think the GNU toolchain should not try to make pthread
cancellation interoperate with C++ exceptions, nor to make
exceptions work from signal handlers: no standard requires this
behaviour and it seems to cause problems.)
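The user-visible side of the tst-join5 failure can be sketched as
follows.  This does not show the code motion inside glibc; it only
exercises the behaviour that depends on it, and every name here
(join_cancel_demo, target_fn, joiner_fn, release_target) is
hypothetical.  One thread is canceled while blocked in pthread_join;
if the cleanup handler inside pthread_join runs, the target thread
becomes joinable again and a later pthread_join on it succeeds.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t release_target;

/* Target thread: stays alive until main posts the semaphore. */
static void *target_fn(void *arg) {
    (void)arg;
    while (sem_wait(&release_target) != 0)
        ;
    return NULL;
}

/* Joiner thread: blocks in pthread_join, which is a cancellation point. */
static void *joiner_fn(void *arg) {
    pthread_t *target = arg;
    pthread_join(*target, NULL);
    return NULL;
}

/* Returns 1 if the target can still be joined after the first
   joiner was canceled, 0 otherwise. */
int join_cancel_demo(void) {
    pthread_t target, joiner;
    void *res = NULL;
    int rc;

    rc = sem_init(&release_target, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&target, NULL, target_fn, NULL);
    assert(rc == 0);
    rc = pthread_create(&joiner, NULL, joiner_fn, &target);
    assert(rc == 0);

    /* Cancel the joiner; it acts on the request inside pthread_join. */
    rc = pthread_cancel(joiner);
    assert(rc == 0);
    rc = pthread_join(joiner, &res);
    assert(rc == 0);

    /* Let the target finish, then join it from this thread. */
    sem_post(&release_target);
    rc = pthread_join(target, NULL);

    return res == PTHREAD_CANCELED && rc == 0;
}
```

With a working cleanup handler join_cancel_demo() should return 1.
If the handler is skipped, the final pthread_join would be expected
to fail, because the target would still look as if the canceled
thread were waiting for it.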


Both issues cause cleanup handlers to be silently skipped on
cancellation, leaving libc internal state inconsistent.
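For reference, the compiler feature both failures depend on can be
seen in isolation in the minimal sketch below; the names
(note_cleanup, cleanup_demo, guard) are made up for illustration.
GCC calls the handler named in __attribute__((cleanup(f))) with a
pointer to the annotated variable when that variable goes out of
scope; with -fexceptions the handler is also run when the stack is
unwound through the scope, which is the path cancellation takes.

```c
#include <stdio.h>

static int cleanups_run;

/* Hypothetical handler: receives a pointer to the annotated variable. */
static void note_cleanup(int *unused) {
    (void)unused;
    cleanups_run++;
}

/* Returns how many times the handler ran for one scope exit. */
int cleanup_demo(void) {
    cleanups_run = 0;
    {
        /* The handler fires when 'guard' leaves scope at the closing brace. */
        int guard __attribute__((cleanup(note_cleanup))) = 0;
        (void)guard;
    }
    printf("cleanup handlers run: %d\n", cleanups_run);
    return cleanups_run;
}
```

This shows only the normal scope-exit path.  The failures described
above concern the unwinding path, which additionally needs correct
unwind information for every frame between the cancellation point
and the handler.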

The second issue may be worth discussing on the gcc list.

