This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: Marcus Shawcroft <marcus dot shawcroft at arm dot com>
- Date: Wed, 08 Jul 2015 12:00:50 +0100
- Subject: Re: [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]
- Authentication-results: sourceware.org; auth=none
- References: <556B7F10 dot 40209 at redhat dot com> <557741C5 dot 5060203 at redhat dot com> <559A8029 dot 1000705 at arm dot com> <559A8DAE dot 9040604 at gmail dot com> <559A9789 dot 3090805 at linaro dot org> <559AADC8 dot 4030409 at arm dot com> <559AB627 dot 2050006 at arm dot com>
On 06/07/15 18:08, Szabolcs Nagy wrote:
> On 06/07/15 17:33, Szabolcs Nagy wrote:
>> On 06/07/15 15:58, Adhemerval Zanella wrote:
>>> On 06-07-2015 11:16, Martin Sebor wrote:
>>>>> this broke
>>>>>
>>>>> nptl/tst-join5
>>>>> nptl/tst-once3
>>>>>
>>>>> tests on aarch64.
>>>>>
>>>>> the cleanup handler of the pthread_once and pthread_join
>>>>> implementation don't run when they are canceled.
>>>>
>>>> I'll look into it as soon as I get access to an aarch64 machine.
>>>>
>>>> Martin
>>>>
>>>
>>> And I see a regression with
>>>
>>> nptl/tst-once3
>>>
>>> for armhf.
>>>
>>
>> in case of aarch64 the bug is somewhere in __pthread_unwind
>> (called from __do_cancel) so probably a libgcc issue.
>>
>
> the problem seems to be that gcc on x86_64 turns on
> -fasynchronous-unwind-tables by default, but not on
> aarch64 or arm.
>
> now i added -fasynchronous-unwind-tables to the cflags
> of the relevant tests, will send a patch if they pass.
>
This uncovered a serious issue that affects other archs too.
Both test failures are caused by glibc switching the internal
mechanism of pthread cancellation cleanup handling to use
__attribute__((cleanup(f))) and -fexceptions, but the two test
failures are independent:
(1) Should -fasynchronous-unwind-tables be on by default in gcc?
nptl/tst-once3 fails because the callback passed to pthread_once
now has to be compiled with -fasynchronous-unwind-tables which
is not on by default on arm and aarch64 gcc. So does glibc
expect the users to use this flag correctly, or does glibc
require the compiler to have it on by default?
(My understanding: POSIX conforming C code cannot observe the
presence of -fasynchronous-unwind-tables without invoking UB, but
the glibc implementation of cancellation cleanup and backtrace
from signal handlers makes this detail observable. Any function
which may be canceled needs this flag to make cleanup work, so
glibc seems to impose this as a requirement on the compiler: the
user may not be in control of all the code that may be canceled).
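A minimal sketch of the situation in (1), loosely modelled on what
nptl/tst-once3 exercises (this is not the test's source; all names
below are made up for illustration). A thread is canceled while it
is blocked inside the pthread_once callback. POSIX then expects the
callback's cleanup handler to run and the once control to behave as
if pthread_once had never been called. On targets where the unwinder
needs unwind info for the callback's frame, this would presumably
have to be built with -fasynchronous-unwind-tables.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* All names here are hypothetical, chosen for this sketch only. */
static pthread_once_t once = PTHREAD_ONCE_INIT;
static sem_t entered;          /* posted once init_routine is running */
static int init_calls;         /* how many times init_routine started */
static int cleanups_run;       /* how many times the handler ran */
static int block_in_init = 1;  /* first call blocks so it can be canceled */

static void cleanup(void *arg)
{
    (void)arg;
    cleanups_run++;
}

static void init_routine(void)
{
    init_calls++;
    pthread_cleanup_push(cleanup, NULL);
    if (block_in_init) {
        sem_post(&entered);
        for (;;)
            pause();           /* cancellation point */
    }
    pthread_cleanup_pop(0);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_once(&once, init_routine);
    return NULL;
}

/* Returns 0 if cancellation ran the handler and reset the once control. */
int once_cancel_demo(void)
{
    pthread_t t;
    void *res;

    if (sem_init(&entered, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    while (sem_wait(&entered) != 0)
        ;                      /* retry if interrupted */
    pthread_cancel(t);
    if (pthread_join(t, &res) != 0)
        return -1;
    assert(init_calls == 1);   /* the callback was entered exactly once */
    if (res != PTHREAD_CANCELED)
        return 1;
    if (cleanups_run != 1)
        return 2;              /* handler silently skipped */
    block_in_init = 0;
    pthread_once(&once, init_routine);   /* must run the callback again */
    sem_destroy(&entered);
    return init_calls == 2 ? 0 : 3;
}
```

Calling once_cancel_demo() from main and checking for 0 is enough to
see whether the handler was skipped (return value 2) on a given
toolchain. Note that a skipped handler may also show up as a hang in
the second pthread_once call rather than as a clean failure.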
(2) Should gcc support exceptions from async signal handlers?
The nptl/tst-join5 failure is more problematic: it fails because gcc
does not seem to implement -fexceptions with the assumption that
signal handlers can throw; in particular, it assumes inline asm
does not throw exceptions. If the syscall that is a cancellation
point appears between pthread_cleanup_push and pthread_cleanup_pop
in glibc internal code, the cleanup handler may not get run on
cancellation depending on where gcc moved the syscall inline asm.
(It is free to move it outside the code range that is marked for
exception handling; this is what happens on aarch64 in pthread_join).
This affects all archs, but some may get lucky.
(My understanding: gcc must be very strict about how it marks
the code range for exception handling and assume any instruction
may throw if it wants -fexceptions -fasynchronous-unwind-tables to
work from signal handlers. Current compilers do not seem to support
this so glibc internal code should not rely on it, which means the
cancellation mechanism should not rely on exception handling, at
least not when the exception is thrown from the cancel signal
handler. I think the GNU toolchain should not try to make pthread
cancellation interoperate with C++ exceptions, nor to make
exceptions work from signal handlers: no standard requires this
behaviour and it seems to cause problems).
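For reference, a small sketch of the __attribute__((cleanup(f)))
mechanism itself, showing only the normal (non-canceled) path. The
names are hypothetical and this is not glibc's internal code. The
handler runs when the guarded variable leaves scope. Getting it to
run on cancellation as well would need -fexceptions plus an
exception-handling range that really covers the blocking call, which
is the part that cannot be relied on when the call is inline asm.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

static int cleanups_seen;   /* hypothetical counter, for illustration only */

/* Called by compiler-generated code when `guard` goes out of scope. */
static void release(int *guard)
{
    cleanups_seen += *guard;
}

/* Returns how many times the handler ran for one pass through the scope. */
int scoped_cleanup_demo(void)
{
    cleanups_seen = 0;
    {
        int guard __attribute__((cleanup(release))) = 1;
        struct timespec ts = { 0, 0 };

        /* Stand-in for the blocking syscall; nanosleep is a
           cancellation point. */
        nanosleep(&ts, NULL);
    }   /* release(&guard) runs here on normal scope exit */
    return cleanups_seen;
}
```

Here the blocking call is an ordinary function call, so the compiler
has to treat it as something that may unwind. The failing case
described above differs in that the syscall is inline asm, which gcc
assumes cannot throw and may therefore place outside the protected
range.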
Both issues cause silent omission of cleanup handlers running
on cancellation, leaving libc internal state inconsistent.
The second issue may be worth discussing on the gcc list.