This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]
- From: Szabolcs Nagy <nsz at port70 dot net>
- To: Martin Sebor <msebor at gmail dot com>
- Cc: Carlos O'Donell <carlos at redhat dot com>, Martin Sebor <msebor at redhat dot com>, GNU C Library <libc-alpha at sourceware dot org>, Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- Date: Thu, 9 Jul 2015 00:13:40 +0200
- Subject: Re: [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]
- References: <556B7F10 dot 40209 at redhat dot com> <559D4CD6 dot 5070607 at redhat dot com> <559D95F7 dot 902 at gmail dot com>
* Martin Sebor <msebor@gmail.com> [2015-07-08 15:28:23 -0600]:
> >This patch has serious problems which cause regressions on at
> >least aarch64 and possibly other arches.
>
> I finally got an AArch64 box, managed to reproduce one of the two
> reported test regressions (the one in tst-join5; the other test
> passes for me) and have been debugging it in between other tasks.
>
We already know the root cause (you were somehow dropped
from the Cc at some point).
Here is my analysis and Carlos' reply:
http://sourceware.org/ml/libc-alpha/2015-07/msg00260.html
> I'm not sure I understand what you mean here. The patch doesn't
> introduce any assumptions that didn't exist before. Callers of
> the cancellation functions don't depend on
> -fasynchronous-unwind-tables: only the functions themselves do
> (when __EXCEPTIONS is defined), and they are being compiled that way.
The new requirement is that the init routine passed to
pthread_once must be compiled with asynchronous unwind info.
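A minimal C sketch of the contract this code path already has to honour, assuming a POSIX system with deferred cancellation (demo_once, init_routine, worker and once_cancel_demo are hypothetical names, not glibc internals). POSIX says that if the init routine is cancelled, once_control behaves as if pthread_once() had never been called, so a later call runs the routine again. The hang in the subject line is the analogous case where a C++ exception, rather than a cancellation, leaves the routine.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical demo state; none of these names come from glibc. */
static pthread_once_t demo_once = PTHREAD_ONCE_INIT;
static sem_t init_started;  /* posted once init_routine is running */
static int init_calls;      /* synchronized via sem_wait/pthread_join */

static void init_routine(void)
{
    init_calls++;
    if (init_calls == 1) {
        /* First run: tell the caller we are inside the init routine,
           then block in pause(), which is a cancellation point. */
        sem_post(&init_started);
        for (;;)
            pause();
    }
    /* Second run: complete normally. */
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_once(&demo_once, init_routine);
    return NULL; /* not reached: the worker is cancelled in pause() */
}

/* Returns how many times init_routine ran, or -1 on setup failure.
   Because a cancelled init routine must leave once_control as if
   pthread_once() was never called, the second call runs it again. */
int once_cancel_demo(void)
{
    pthread_t t;
    void *res;

    if (sem_init(&init_started, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;

    sem_wait(&init_started);  /* worker is now inside init_routine */
    pthread_cancel(t);        /* deferred cancel, acted on in pause() */
    pthread_join(t, &res);
    assert(res == PTHREAD_CANCELED);

    /* once_control was rolled back, so init_routine runs a second time. */
    pthread_once(&demo_once, init_routine);
    sem_destroy(&init_started);
    return init_calls;
}
```

Built with -pthread, a call to once_cancel_demo() should return 2: the cancelled first run is rolled back by the cleanup machinery inside pthread_once, and the second call completes normally.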
> >Worse, the compiler moves the asm cancellation wrapper for the
> >syscall outside of the cleanup region, because it assumes asm
> >can never raise exceptions. This is the more serious issue that
> >needs addressing.
>
> The asm is declared volatile with a "memory" clobber, so the compiler
> shouldn't reorder it with other statements that perform memory accesses.
>
> But the problem does appear to be sensitive to inlining in
> pthread_join.c. When I outline the call to lll_wait_tid the
> problem disappears. But when comparing the assembly between
> the two versions of the file I don't see the system call being
> moved past the cleanup call (the cleanup, when outlined, is
> the second to last call in the function, just before the one
> to _Unwind_Resume). The call just doesn't take place. I need
> to study the assembly in more detail to understand exactly
> where the problem is.
The problem is that -fexceptions is not 'async unwind safe':
the cleanup handler is only guaranteed to run if the unwind
goes through extern functions (that may throw).
There is a proposed new cancellation design that gets rid
of async cancel + inline asm syscalls + cleanup handlers;
with that your patch would be safe, but without it, it isn't
(that change is scheduled for 2.23).
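For contrast, a minimal C sketch of the case that is guaranteed to work, assuming POSIX deferred cancellation (blocked, cleanup_ran, on_cancel, blocker and cleanup_demo are hypothetical names). The cleanup handler is registered with pthread_cleanup_push, and the thread is cancelled while blocked inside pause(), an external libc function that is a cancellation point. It does not reproduce the inline-asm failure mode described above, which depends on where the compiler places the syscall relative to the cleanup region.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t blocked;    /* posted just before the worker blocks */
static int cleanup_ran;  /* set to 1 by the cleanup handler */

/* Cleanup handler: runs while the cancellation unwinds the thread. */
static void on_cancel(void *arg)
{
    *(int *)arg = 1;
}

static void *blocker(void *arg)
{
    (void)arg;
    pthread_cleanup_push(on_cancel, &cleanup_ran);
    sem_post(&blocked);
    for (;;)
        pause();            /* extern libc call and cancellation point */
    pthread_cleanup_pop(0); /* never reached, but must pair with push */
    return NULL;
}

/* Returns 1 if the cleanup handler ran on cancellation, -1 on setup
   failure. */
int cleanup_demo(void)
{
    pthread_t t;
    void *res;

    if (sem_init(&blocked, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, blocker, NULL) != 0)
        return -1;

    sem_wait(&blocked);  /* handler is registered; worker is about to block */
    pthread_cancel(t);
    pthread_join(t, &res);
    assert(res == PTHREAD_CANCELED);

    sem_destroy(&blocked);
    return cleanup_ran;
}
```

Here the unwind passes through an ordinary out-of-line call, so the handler is reached regardless of how the surrounding code was compiled.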
> >Please revert this patch. We need to look at another solution
> >that doesn't regress any tests.
>
> I've seen your note about getting ready for the 2.22 release
> so I wouldn't want to jeopardize the schedule. But if there's
> some slack (say a few days) I would like to get to the bottom
> of this and try to resolve the problem without reverting the
> patch. But if it's urgent, I will certainly revert the patch
> and work on fixing this for 2.23.
>
> Martin