This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.

[Bug nptl/12683] Race conditions in pthread cancellation


https://sourceware.org/bugzilla/show_bug.cgi?id=12683

--- Comment #11 from Carlos O'Donell <carlos at redhat dot com> ---
Alex Oliva and I talked about this particular issue today.

We believe that an entirely userspace solution is possible without assistance
from the kernel, but it requires signal wrappers.

Signal wrappers are code that executes before and after a signal handler and
does things like save and restore errno (the one use we have for them
currently). The signal wrappers would assist in handling deferred cancellation.
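
For readers unfamiliar with the idea, here is a minimal sketch of an
errno-preserving signal wrapper. It is not glibc's internal code; the names
WRAP_NSIG, wrapped_handler, signal_wrapper, install_wrapped and
clobbering_handler are all hypothetical and exist only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical bound on the signal numbers this sketch can wrap. */
#define WRAP_NSIG 65
static_assert(WRAP_NSIG > SIGUSR2, "table must cover the standard signals");

/* Hypothetical table of user handlers, indexed by signal number. */
static void (*wrapped_handler[WRAP_NSIG])(int);

/* The wrapper: runs before and after the user's handler. */
static void signal_wrapper(int sig)
{
    int saved_errno = errno;        /* "before" part: save errno */
    wrapped_handler[sig](sig);      /* run the user's handler */
    errno = saved_errno;            /* "after" part: restore errno */
}

/* Install HANDLER for SIG behind the wrapper.
   Returns 0 on success and -1 on failure. */
static int install_wrapped(int sig, void (*handler)(int))
{
    struct sigaction sa;

    if (sig <= 0 || sig >= WRAP_NSIG || handler == NULL)
        return -1;

    /* Record the user's handler before the wrapper can be invoked. */
    wrapped_handler[sig] = handler;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = signal_wrapper;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Example handler that clobbers errno, as a careless handler might. */
static volatile sig_atomic_t handler_ran;

static void clobbering_handler(int sig)
{
    (void)sig;
    handler_ran = 1;
    errno = EINTR;
}
```

With this in place, calling install_wrapped(SIGUSR1, clobbering_handler),
setting errno to ENOENT and then calling raise(SIGUSR1) should leave errno
unchanged once the handler has run.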

The proposed solution would look like this:

* Stop enabling/disabling asynchronous cancellation around syscalls.

* When a blocking library function that is also a cancellation point is
entered, a word in the thread's TCB (call it IN_SYSCALL) is set to the value of
the stack pointer (we assume no further stack adjustments are made before the
function exits). IN_SYSCALL is cleared just before the function returns.
Deferred cancellation is still checked before and after the syscall.

* Add a signal wrapper to all signals that checks whether IN_SYSCALL equals the
SP stored in the ucontext_t and, if it does, immediately cancels the thread. The
check is done upon entry to and exit from the wrapper to reduce cancellation
latency. Just before unwinding, the IN_SYSCALL value is cleared.

* When a thread starts we install a SIGCANCEL (SIGRTMIN) handler like we did
before, but this handler checks to see if the thread's IN_SYSCALL matches the
SP stored in ucontext_t, indicating that cancellation was requested while
executing in the cancellation region of a blocking syscall (and no other signal
handler is executing). In that case the signal handler cancels the thread
immediately. If IN_SYSCALL != SP then another signal handler is running and we
defer the cancellation to the signal wrapper or syscall wrapper. The SIGCANCEL
handler operates as it previously did when asynchronous cancellation was
enabled.
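
The decision logic in the steps above can be modelled roughly as follows. This
is only a sketch under several assumptions: the TCB word is stood in for by a
thread-local variable, 0 is taken to mean "cleared", a separate cancel_pending
flag records a deferred request, and the interrupted SP is passed in as a plain
integer instead of being read from a real ucontext_t (which is
architecture-specific). All names are hypothetical, and "cancel now" is
reported as a return value instead of actually unwinding.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static_assert(sizeof(uintptr_t) >= sizeof(void *), "SP must fit in the word");

/* Hypothetical stand-ins for per-thread TCB state. */
static _Thread_local _Atomic uintptr_t in_syscall;  /* 0 means cleared */
static _Thread_local atomic_bool cancel_pending;    /* deferred request */

enum cancel_action { CANCEL_NOW, CANCEL_DEFER };

/* Syscall wrapper: mark the cancellation region with the current SP. */
static void cancel_region_enter(uintptr_t sp)
{
    atomic_store(&in_syscall, sp);
}

/* Syscall wrapper: clear the mark just before the function returns. */
static void cancel_region_leave(void)
{
    atomic_store(&in_syscall, 0);
}

/* SIGCANCEL handler. INTERRUPTED_SP is the SP from the ucontext_t.
   A match means the signal interrupted the blocking region directly. */
static enum cancel_action sigcancel_decide(uintptr_t interrupted_sp)
{
    uintptr_t mark = atomic_load(&in_syscall);

    atomic_store(&cancel_pending, true);
    if (mark != 0 && mark == interrupted_sp) {
        atomic_store(&in_syscall, 0);   /* cleared just before unwinding */
        return CANCEL_NOW;
    }
    /* Another handler is running: leave it to a wrapper to act. */
    return CANCEL_DEFER;
}

/* Check made by the wrapper of every other signal, on entry and exit.
   Assumes the wrapper acts only when a request is pending. */
static enum cancel_action wrapper_check(uintptr_t interrupted_sp)
{
    uintptr_t mark = atomic_load(&in_syscall);

    if (atomic_load(&cancel_pending) && mark != 0 && mark == interrupted_sp) {
        atomic_store(&in_syscall, 0);
        return CANCEL_NOW;
    }
    return CANCEL_DEFER;
}
```

For instance, after cancel_region_enter(0x1000), a SIGCANCEL that arrives while
a nested handler is running sees a different SP and defers, and the outermost
wrapper's check with SP 0x1000 then reports CANCEL_NOW.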

Resolved use cases:

- Cancellation delivered between first instruction of function and IN_SYSCALL
set: Syscall wrapper code will check for cancellation and act upon it.

- Cancellation delivered between IN_SYSCALL set and syscall: The SIGCANCEL
handler will immediately cancel the thread.

- Cancellation delivered between syscall and clearing IN_SYSCALL: The SIGCANCEL
handler will immediately cancel the thread.

- Cancellation delivered between clearing of IN_SYSCALL and function return:
The next cancellation point will act upon the cancellation (still meets POSIX
requirement given escape clause of "The thread is suspended at a cancellation
point and the event for which it is waiting occurs").

- Cancellation delivered while a thread stopped at a syscall is executing
multiple nested signal handlers and the first signal handler has not checked
IN_SYSCALL yet: Only the first signal delivered will see IN_SYSCALL == SP. The
SIGCANCEL handler will do nothing. The first signal handler's wrapper will
detect that the cancellation is active and act upon it as it exits (only after
all the other signal handlers have completed).

- Cancellation delivered while a thread stopped at a syscall is executing
multiple nested signal handlers and the first signal handler is exiting and has
already checked IN_SYSCALL: The syscall will be interrupted and return. The
syscall wrapper will act upon the cancellation request. The goal here is to
have the signal handlers finish executing without interruption.

Unresolved use cases:

- Related to bug 14147 -- Cancellation delivered while the thread is blocked in
an async-safe function (in fact, for this to be valid it must execute only
async-safe functions during the time a signal can be delivered) and is
executing a signal handler that longjmps out of the function. In this case
IN_SYSCALL is still set to SP and is never cleared. If by luck SP later ends up
the same and another thread delivers a cancellation request, the SIGCANCEL
handler will immediately cancel the thread even though it was not in a
cancellation region.

- What if you are executing fork and someone tries to cancel you?

A potential resolution to the first unresolved use case is to use a cleanup
handler to reset IN_SYSCALL, since such a handler is run when longjmp unwinds
the frames. However, we then need to consider cancellation during the execution
of the cleanup.
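
To make the stale-mark problem concrete, here is a small model using plain
setjmp/longjmp. It is a sketch only: in_syscall_mark and the functions below
are hypothetical, the longjmp is called directly where a signal handler would
call it, and the reset is placed at the setjmp landing site as a stand-in for
what a cleanup handler would do during unwinding.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

/* Hypothetical stand-in for the IN_SYSCALL word (0 means cleared). */
static uintptr_t in_syscall_mark;
static jmp_buf escape;

/* Models a blocking call whose signal handler longjmps out of it.
   The clear that normally precedes the return is never reached. */
static void blocking_call_interrupted(uintptr_t sp)
{
    in_syscall_mark = sp;
    longjmp(escape, 1);   /* stands in for the handler's longjmp */
}

/* Without any reset, the mark survives the longjmp. */
static uintptr_t run_without_reset(void)
{
    in_syscall_mark = 0;
    if (setjmp(escape) == 0)
        blocking_call_interrupted(0x1000);
    return in_syscall_mark;   /* stale: still the old SP */
}

/* With a reset on the unwind path, the mark is cleared again. */
static uintptr_t run_with_reset(void)
{
    in_syscall_mark = 0;
    if (setjmp(escape) == 0)
        blocking_call_interrupted(0x1000);
    else
        in_syscall_mark = 0;  /* what a cleanup handler would do */
    return in_syscall_mark;
}
```

Here run_without_reset() returns the stale value 0x1000, which is the state a
later SIGCANCEL could mistake for a live cancellation region, while
run_with_reset() returns 0.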

I haven't fully thought through what to do in the case of forking a
multithreaded program, but we should try to see if we can make it work.

Note: Setting IN_SYSCALL must be atomic.
