This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



pthread_mutex_destroy returns EBUSY, but the mutex isn't locked


Hi all, I've got a problem with a mutex that's been bugging me for a
while. On one very specific platform (an Opteron running CentOS with
kernel 2.6.9-55.ELsmp and glibc version 2.3.4-2.36), fewer than 1% of
my pthread_mutex_destroy calls fail with the error EBUSY. The errors
are not reliably repeatable on that machine, and not repeatable at
all on another platform (a Core 2 Duo running the same OS but with
glibc version 2.3.4-2.25). I have not been able to reproduce the
error in a simple testcase. In the real program, the error affects
about four mutexes in completely separate parts of the program.

I am certain the mutexes are not actually locked, since I've started
printing out their contents just before destroying them. One example
is as follows:

      Mutex 0x920df0: lock=0, count=0, owner=0, nusers=2, kind=2

(The "kind=2" field means this is an error-checking mutex, which I use
to ensure that I'm not unlocking a mutex I don't own; the problem
occurs for normal mutexes as well.) As you can see, the futex ("lock")
is zero, as are both the count and the owning thread. Only nusers is
non-zero; by far the most common value I see is 1, though I've also
seen 2, 5, 6 and 10. I've never seen very large or negative numbers
that would strongly suggest memory corruption, though of course I
can't rule this out. The mutexes are allocated off the heap using
plain-vanilla malloc and free, which I believe should be legal.
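
For anyone trying to reproduce this, here is a minimal sketch of the
lifecycle described above: an error-checking mutex allocated with
malloc, a dump of its fields, and a checked destroy. The helper names
(mutex_new, mutex_dump, mutex_delete) are made up for illustration and
are not from the real program. The __data.__lock, __count, __owner,
__nusers and __kind members are glibc NPTL internals rather than a
public API, so the dump is guarded by __GLIBC__ and the exact layout
may differ between glibc versions.

```c
#define _XOPEN_SOURCE 700 /* request PTHREAD_MUTEX_ERRORCHECK under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate an error-checking mutex on the heap.
 * Returns NULL if allocation or initialization fails. */
pthread_mutex_t *mutex_new(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t *m = malloc(sizeof *m);
    int rc;

    if (m == NULL)
        return NULL;
    if (pthread_mutexattr_init(&attr) != 0) {
        free(m);
        return NULL;
    }
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        free(m);
        return NULL;
    }
    return m;
}

/* Hypothetical helper: print the mutex fields in the format quoted above.
 * Assumption: glibc NPTL layout; these members are not a stable interface. */
void mutex_dump(const pthread_mutex_t *m)
{
#ifdef __GLIBC__
    fprintf(stderr,
            "Mutex %p: lock=%d, count=%u, owner=%d, nusers=%u, kind=%d\n",
            (const void *)m, m->__data.__lock, m->__data.__count,
            m->__data.__owner, m->__data.__nusers, m->__data.__kind);
#else
    fprintf(stderr, "Mutex %p: internals not available on this libc\n",
            (const void *)m);
#endif
}

/* Hypothetical helper: destroy and free the mutex, reporting any error.
 * Returns whatever pthread_mutex_destroy returned (0 on success). */
int mutex_delete(pthread_mutex_t *m)
{
    int rc = pthread_mutex_destroy(m);

    if (rc != 0) {
        fprintf(stderr, "pthread_mutex_destroy: %s\n", strerror(rc));
        mutex_dump(m);
    }
    free(m);
    return rc;
}
```

With an error-checking mutex, relocking from the owning thread should
return EDEADLK and unlocking a mutex that isn't held should return
EPERM, so both kinds of misuse show up as return codes instead of
going unnoticed.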

I added an assertion that nusers must be zero every time we unlock a
mutex (holding a second mutex around the unlock to ensure that another
thread doesn't grab it in between). It always passed, but by the time
the same mutex was destroyed, nusers had mysteriously changed.
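
Roughly, the check looks like the sketch below. This is a
reconstruction under assumptions, not the actual code: the names
guard, checked_lock and checked_unlock are hypothetical, and it
assumes every acquisition also goes through the same guard (using
trylock, so that a waiter never blocks while holding the guard). The
__data.__nusers member is a glibc internal, hence the __GLIBC__ guard.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical guard that serializes every lock/unlock made via the wrappers. */
pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;

/* Acquire m without ever blocking while the guard is held.
 * Caveat: relocking from the owning thread would spin here forever,
 * because trylock reports EBUSY for that case as well. */
int checked_lock(pthread_mutex_t *m)
{
    for (;;) {
        int rc;

        pthread_mutex_lock(&guard);
        rc = pthread_mutex_trylock(m);
        pthread_mutex_unlock(&guard);
        if (rc != EBUSY)
            return rc;      /* 0 on success, or a genuine error */
        sched_yield();      /* held by someone else; let them run */
    }
}

/* Release m and, while the guard still keeps other wrapper users out,
 * assert that glibc's user count has dropped back to zero. */
int checked_unlock(pthread_mutex_t *m)
{
    int rc;

    pthread_mutex_lock(&guard);
    rc = pthread_mutex_unlock(m);
#ifdef __GLIBC__
    if (rc == 0)
        assert(m->__data.__nusers == 0); /* assumption: NPTL-internal field */
#endif
    pthread_mutex_unlock(&guard);
    return rc;
}
```

The guard only protects against threads that use these wrappers; a
direct pthread_mutex_lock call elsewhere, or a stray write to the
mutex memory, would not be caught by it.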

If I ignore the error, the program runs to its natural conclusion
(which can take several hours), and always operates correctly. This,
again, does not rule out memory corruption, but it does make it seem
less likely, since one would expect corruption to affect more than
just one aspect of the program.

Barring memory corruption (valgrind and helgrind haven't found
anything), has anyone ever heard of an issue like this before? Or
should I just be looking harder for corruption?

Many thanks, Adrian

