This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] handling unexpected return values from kernel
- From: Sripathi Kodi <sripathik at in dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Cc: Jakub Jelinek <jakub at redhat dot com>, Ulrich Drepper <drepper at redhat dot com>, Dinakar Guniguntala <dino at in dot ibm dot com>
- Date: Wed, 6 Jun 2007 22:38:01 +0530
- Subject: [RFC] handling unexpected return values from kernel
Hi,
When __pthread_mutex_lock() makes a sys_futex(FUTEX_LOCK_PI) call to the kernel,
it expects that the call either succeeds or returns ESRCH or EDEADLK.
Similarly, when __pthread_mutex_unlock() makes a
sys_futex(FUTEX_UNLOCK_PI) call, it assumes that the syscall succeeds.
While using preempt-rt kernels, we came across a kernel bug that caused
sys_futex(FUTEX_UNLOCK_PI) to return EFAULT. However, glibc did not check the
return value and assumed that it had unlocked the mutex. This resulted in
an application deadlock a bit later, and it was very hard to diagnose what was
going wrong. Similarly, in the past we have seen sys_futex(FUTEX_LOCK_PI)
return EINTR, and that too was hard to diagnose.
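To make the assumption concrete, here is a rough sketch of which results each
operation treats as expected, based only on the description above. The names
pi_op and pi_result_expected are made up for illustration and are not glibc
interfaces; error values are taken to be positive errno codes, with 0 meaning
success.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the two PI futex operations discussed here. */
enum pi_op { PI_LOCK, PI_UNLOCK };

/* Return true if `err` (0 or a positive errno value) is a result the
 * mutex code is prepared for, according to the description above:
 * lock expects success, ESRCH or EDEADLK; unlock expects only success. */
bool pi_result_expected(enum pi_op op, int err)
{
    if (err == 0)
        return true;
    if (op == PI_LOCK)
        return err == ESRCH || err == EDEADLK;
    return false;
}
```

With this classification, the EFAULT from FUTEX_UNLOCK_PI and the EINTR from
FUTEX_LOCK_PI mentioned above both fall into the "unexpected" category.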
I am wondering whether the current glibc behavior, though it is technically
correct, is the best way to handle this. The alternatives I can think of are:
1) Return the unexpected error to the application: This is NOT an option
because we can't return unexpected errors to the application.
2) glibc can catch unexpected errors and re-issue the syscall. We have seen
that the syscall usually succeeds on the second attempt. We have used such a
patch from Dinakar Guniguntala in the past. However, this would mask
a possible kernel bug and could lead to sub-optimal performance of
mutex_lock/unlock calls. glibc could print an error message before re-issuing
the syscall, to draw the user's attention to the problem.
3) glibc could abort the application if it receives unexpected return values
from the kernel.
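A minimal sketch of what alternatives 2 and 3 might look like is below. This
is not the patch mentioned above and not glibc code; checked_unlock,
unexpected_policy, futex_unlock_pi and flaky_unlock are hypothetical names
chosen for illustration. The operation is passed in as a function pointer so
that a misbehaving kernel can be simulated, and the real syscall wrapper is
only compiled on Linux. The retry limit is also an assumption: once it is
exhausted, the last error is handed back to the internal caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef __linux__
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Issue FUTEX_UNLOCK_PI on the futex word; return 0 or the errno value. */
int futex_unlock_pi(void *futex_word)
{
    long rc = syscall(SYS_futex, (uint32_t *)futex_word, FUTEX_UNLOCK_PI,
                      0, NULL, NULL, 0);
    return rc == -1 ? errno : 0;
}
#endif

/* Hypothetical policy switch: alternative 2 (retry) or 3 (abort). */
enum unexpected_policy { POLICY_RETRY, POLICY_ABORT };

/* An operation returns 0 on success or a positive errno value. */
typedef int (*futex_op_fn)(void *arg);

/* Run `op` and never ignore a failure silently: report it, then either
 * abort or re-issue the call up to `max_retries` more times. Returns 0
 * on success, or the last error once the retries are used up. */
int checked_unlock(futex_op_fn op, void *arg,
                   enum unexpected_policy policy, int max_retries)
{
    assert(op != NULL && max_retries >= 0);

    int err = op(arg);
    for (int attempt = 0; err != 0; attempt++) {
        fprintf(stderr, "futex unlock: unexpected kernel error %d\n", err);
        if (policy == POLICY_ABORT)
            abort();
        if (attempt >= max_retries)
            return err;
        err = op(arg);
    }
    return 0;
}

/* Stand-in for a buggy kernel: fails with EFAULT `*arg` times, then
 * succeeds. Used only to exercise checked_unlock(). */
int flaky_unlock(void *arg)
{
    int *failures_left = arg;
    if (*failures_left > 0) {
        (*failures_left)--;
        return EFAULT;
    }
    return 0;
}
```

On Linux, the real call would be something like
checked_unlock(futex_unlock_pi, &futex_word, POLICY_RETRY, 1). The message on
stderr is what keeps the retry from completely hiding a kernel bug, which is
the main drawback of alternative 2 noted above.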
In any case, silently ignoring the errors returned by the kernel and assuming
the lock is locked/unlocked does not seem to be the right thing to do. What
are your thoughts on this?
Thanks,
Sripathi.