This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
[Bug dyninst/15443] New: deal with mutatees that die during our handlers
- From: "jistone at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sourceware dot org
- Date: Tue, 07 May 2013 19:40:09 +0000
- Subject: [Bug dyninst/15443] New: deal with mutatees that die during our handlers
- Auto-submitted: auto-generated
http://sourceware.org/bugzilla/show_bug.cgi?id=15443
Bug #: 15443
Summary: deal with mutatees that die during our handlers
Product: systemtap
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: dyninst
AssignedTo: systemtap@sourceware.org
ReportedBy: jistone@redhat.com
Classification: Unclassified
Assume for a moment that our runtime is perfect and never causes the
mutatee to die. What happens if a threaded mutatee exits (by signal or by
choice) or execs while one of its threads is inside one of our probe
handlers? At a minimum, I expect that context's mutex will be left locked
forever. Much more could be left in an inconsistent state too.
(I've been trying to debug some weird issues during testsuite runs, and while
I'm not certain this is the root cause, it does seem to be a real possibility.)
Maybe we could try to capture all exit/exec paths and "quiesce" other threads
(at least as far as our state is concerned). I suspect this would require
heroic effort, though, and would still probably be imperfect (e.g. SIGKILL
cannot be caught or blocked).
For mutexes, there is pthread_mutexattr_setrobust(), which we should probably
use. A robust mutex will at least report EOWNERDEAD to the next locker, and
from there we can decide whether recovery is possible. That decision is
probably different for each mutex-protected area we have, e.g. a dead owner's
lock on a context struct can probably be repurposed, but a dead owner's lock
on the transport seems worse. But even handling EOWNERDEAD as a fatal error
would be better than just hanging.
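
As a rough illustration of the robust-mutex idea, here is a minimal sketch,
not the actual stapdyn runtime code. The struct demo_context and all demo_*
names are hypothetical, and it assumes that resetting a "busy" flag is
enough to recover a context. The dying owner is simulated by a thread that
returns while still holding the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a per-context structure guarded by a mutex. */
struct demo_context {
    pthread_mutex_t lock;
    int busy;               /* nonzero while a handler is using the context */
};

/* Initialize the context with a robust mutex. Returns 0 or an errno value. */
static int demo_context_init(struct demo_context *c)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(&c->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    c->busy = 0;
    return rc;
}

/* Lock the context. Returns 0 if the lock is now held. *recovered is set
 * to 1 when the previous owner died and the state had to be reset. */
static int demo_context_lock(struct demo_context *c, int *recovered)
{
    int rc = pthread_mutex_lock(&c->lock);
    *recovered = 0;
    if (rc == EOWNERDEAD) {
        /* Assumption: a context can simply be reset and reused. */
        c->busy = 0;
        rc = pthread_mutex_consistent(&c->lock);
        if (rc != 0) {
            /* Unlocking without consistency makes the mutex unusable. */
            pthread_mutex_unlock(&c->lock);
            return rc;
        }
        *recovered = 1;
    }
    return rc;
}

/* Simulates a thread that dies inside a handler: it takes the lock and
 * terminates without releasing it. */
static void *demo_die_holding_lock(void *arg)
{
    struct demo_context *c = arg;
    int rc = pthread_mutex_lock(&c->lock);
    assert(rc == 0);
    (void)rc;
    c->busy = 1;
    return NULL;
}
```

For an area where recovery is not safe (the transport, say), the same
EOWNERDEAD branch could instead skip pthread_mutex_consistent() and report a
fatal error, which still avoids the hang.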
For rwlocks, I see no equivalent of setrobust(). These are used for global
variables, so we should probably just add timeouts. (Not a trylock-wait-retry
loop as in the kernel; I think a plain pthread_rwlock_timedrdlock() or
pthread_rwlock_timedwrlock() is fine.)
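
A minimal sketch of the timeout approach for rwlocks follows. The demo_*
names and the millisecond timeout parameter are hypothetical, not taken from
the runtime. The POSIX timed rwlock calls take an absolute CLOCK_REALTIME
deadline, so the helper converts a relative timeout first. A thread that
exits while holding a read lock stands in for the dead mutatee thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Convert a relative timeout in milliseconds to an absolute deadline. */
static int demo_deadline(long timeout_ms, struct timespec *deadline)
{
    if (clock_gettime(CLOCK_REALTIME, deadline) != 0)
        return errno;
    deadline->tv_sec += timeout_ms / 1000;
    deadline->tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline->tv_nsec >= 1000000000L) {
        deadline->tv_sec += 1;
        deadline->tv_nsec -= 1000000000L;
    }
    return 0;
}

/* Take the read lock, giving up with ETIMEDOUT after timeout_ms. */
static int demo_rdlock_timeout(pthread_rwlock_t *rw, long timeout_ms)
{
    struct timespec deadline;
    int rc = demo_deadline(timeout_ms, &deadline);
    if (rc != 0)
        return rc;
    return pthread_rwlock_timedrdlock(rw, &deadline);
}

/* Take the write lock, giving up with ETIMEDOUT after timeout_ms. */
static int demo_wrlock_timeout(pthread_rwlock_t *rw, long timeout_ms)
{
    struct timespec deadline;
    int rc = demo_deadline(timeout_ms, &deadline);
    if (rc != 0)
        return rc;
    return pthread_rwlock_timedwrlock(rw, &deadline);
}

/* Simulates a thread that dies while reading a global: it takes the read
 * lock and terminates without releasing it. */
static void *demo_leak_rdlock(void *arg)
{
    pthread_rwlock_t *rw = arg;
    int rc = pthread_rwlock_rdlock(rw);
    assert(rc == 0);
    (void)rc;
    return NULL;
}
```

A caller would treat ETIMEDOUT as an error for the probe handler rather than
waiting indefinitely; picking a sensible timeout value is left open here.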