This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] pthread_once hangs when init routine throws an exception [BZ #18435]


* Martin Sebor <msebor@gmail.com> [2015-07-08 15:28:23 -0600]:
> >This patch has serious problems which cause regressions on at
> >least aarch64 and possibly other arches.
> 
> I finally got an AArch64 box, managed to reproduce one of the two
> reported test regressions (the one in tst-join5; the other test
> passes for me) and have been debugging it in between other tasks.
> 

We already know the root cause (you were somehow left
off the Cc at some point).

Here is my analysis and Carlos' reply:
http://sourceware.org/ml/libc-alpha/2015-07/msg00260.html

> I'm not sure I understand what you mean here. The patch doesn't
> introduce any assumptions that didn't exist before. Callers of
> the cancellation functions don't depend on
> -fasynchronous-unwind-tables: only the functions themselves do (when __EXCEPTIONS is
> defined), and they are being compiled that way.

The new requirement is that the callback passed to pthread_once
must be compiled with asynchronous unwind info.

> >Worse, the compiler moves the asm cancellation wrapper
> >for the syscall outside of the cleanup region, because
> >it assumes asm can never raise exceptions. This is
> >the more serious issue that needs addressing.
> 
> The asm is declared volatile with a "memory" clobber, so the compiler
> shouldn't reorder it with other statements that perform memory accesses.
> 
> But the problem does appear to be sensitive to inlining in
> pthread_join.c. When I outline the call to lll_wait_tid the
> problem disappears. But when comparing the assembly between
> the two versions of the file I don't see the system call being
> moved past the cleanup call (the cleanup, when outlined, is
> the second to last call in the function, just before the one
> to _Unwind_Resume). The call just doesn't take place. I need
> to study the assembly in more detail to understand exactly
> where the problem is.

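The problem is that -fexceptions is not 'async unwind safe'.

With -fexceptions, the cleanup handler is only guaranteed to run if
the unwind goes through calls to extern functions (which may throw).
Async cancellation can start the unwind at an arbitrary instruction,
such as the inline asm syscall, where there is no such call site.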

There is a proposed new cancellation design that gets rid of the
combination of async cancellation, inline asm syscalls and cleanup
handlers. With that design your patch would be safe, but without
it, it isn't (that change is scheduled for 2.23).

> >Please revert this patch. We need to look at another solution
> >that doesn't regress any tests.
> 
> I've seen your note about getting ready for the 2.22 release
> so I wouldn't want to jeopardize the schedule. But if there's
> some slack (say a few days) I would like to get to the bottom
> of this and try to resolve the problem without reverting the
> patch. But if it's urgent, I will certainly revert the patch
> and work on fixing this for 2.23.
> 
> Martin

