This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Fixing pthread cancellation


Hi,

As it stands, thread cancellation is almost useless on glibc/NPTL due
to race conditions that can lead to file descriptor leaks or
double-close errors (which are very dangerous in multi-threaded
programs, since they can close a file descriptor that another thread
has just obtained, causing I/O to take place on the wrong files).

I have a bug report on the tracker that's been open for some time now:

http://sourceware.org/bugzilla/show_bug.cgi?id=12683

as well as a new article on my blog about the issues surrounding it:

http://ewontfix.com/2/

and a defect report on the Austin Group tracker for one aspect of the
issue that was underspecified:

http://austingroupbugs.net/view.php?id=614

although the latter has already been addressed by fixing the
requirements on close() as the accepted action on an earlier defect
report.

With regard to fixing the issue, I tried several approaches in musl
libc before settling on what I believe to be the best one: what we're
doing now is having the signal handler that processes cancellation
inspect the saved program counter register (in the ucontext_t) and
compare it against a range of labels in the assembly-language function
that makes the syscall:

label1:	[check cancel flag, branch if set]
	[make syscall]
label2:

If the program counter is between these two labels, cancellation can
be acted upon by the signal handler; if not, it must be deferred.
Also, some syscalls always return with EINTR when interrupted, so the
signal handler sees the program counter at label2, not before it. In
this case, the signal handler cannot handle the cancellation; instead,
the code that runs after the syscall returns inspects the return value
and, if it's -EINTR, checks the cancellation flag and acts on it if
set (except in the case of close(), which is special due to Linux not
honoring POSIX semantics for its behavior on EINTR).
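
To make the two decisions above concrete, here is a minimal sketch in
C. It is not the musl code; all names (classify_pc, cancel_after_eintr,
the enum) are hypothetical and chosen only for illustration. It assumes
the handler has already pulled the saved program counter out of the
ucontext_t, which is architecture-specific and left out here, and that
label2 is the address immediately after the syscall instruction, so
the range is treated as half-open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; illustration only, not musl's or glibc's API. */
enum cancel_action { CANCEL_ACT_NOW, CANCEL_DEFER };

/* Decision made inside the cancellation signal handler.
 * pc is the saved program counter taken from the ucontext_t.
 * [label1, label2) covers the flag check and the syscall instruction.
 * A pc equal to label2 means the syscall has already returned, so the
 * handler must not act. */
enum cancel_action classify_pc(uintptr_t pc, uintptr_t label1,
                               uintptr_t label2)
{
    assert(label1 <= label2);
    return (pc >= label1 && pc < label2) ? CANCEL_ACT_NOW : CANCEL_DEFER;
}

/* Decision made after the syscall returns.
 * ret is the raw kernel return value (negative errno on failure).
 * close() is excluded because of its special EINTR behavior on Linux. */
bool cancel_after_eintr(long ret, bool cancel_flag, bool is_close)
{
    return ret == -EINTR && cancel_flag && !is_close;
}
```

The first function is what the handler would consult; the second
covers the case where the handler saw the program counter at label2
and had to leave the decision to the code following the syscall.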

This approach precludes (at least for us) using vdso syscalls and
inline syscalls. I consider that a non-issue since cancellation points
are not "fast" syscalls but "slow" syscalls that could go into
interruptible sleep, but glibc could potentially recover the ability
to do inline syscalls (or, with the help of the kernel, even vdso
syscalls) using DWARF annotations to the syscall asm instead of label
ranges. I don't fully understand DWARF, so somebody else would have to
figure out if/how this is possible.

Again, these costs only affect cancellation point syscalls which are
slow anyway, not all syscalls.

Is there any interest in fixing this issue?

Rich

