This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] New condvar implementation that provides stronger ordering guarantees.


On Sun, 2015-02-22 at 17:37 -0500, Rich Felker wrote:
> On Fri, Feb 20, 2015 at 07:18:27PM +0100, Torvald Riegel wrote:
> > +   Limitations:
> > +   * This condvar isn't designed to allow for more than
> > +     WSEQ_THRESHOLD * (1 << (sizeof(GENERATION) * 8 - 1)) calls to
> > +     __pthread_cond_wait.  It probably only suffers from potential ABA issues
> > +     afterwards, but this hasn't been checked nor tested.
> > +   * More than (1 << (sizeof(QUIESCENCE_WAITERS) * 8) -1 concurrent waiters
> > +     are not supported.
> > +   * Beyond what is allowed as errors by POSIX or documented, we can also
> > +     return the following errors:
> > +     * EPERM if MUTEX is a recursive mutex and the caller doesn't own it.
> 
> This is not beyond POSIX; it's explicitly specified as a "shall fail".

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_cond_wait.html

[EPERM]
        The mutex type is PTHREAD_MUTEX_ERRORCHECK or the mutex is a
        robust mutex, and the current thread does not own the mutex.

POSIX does not seem to allow EPERM for *recursive mutexes*.  Is there an
update that I'm missing?

> > +     * EOWNERDEAD or ENOTRECOVERABLE when using robust mutexes.  Unlike
> > +       for other errors, this can happen when we re-acquire the mutex; this
> > +       isn't allowed by POSIX (which requires all errors to virtually happen
> > +       before we release the mutex or change the condvar state), but there's
> > +       nothing we can do really.
> 
> Likewise these are "shall fail" errors specified by POSIX, and while
> it's not clearly written in the specification, it's clear that they
> only happen on re-locking.

Yes, they are "shall fail".  I also agree that POSIX *should* make it
clear that they can happen after releasing and when acquiring the mutex
again -- but that's not what the spec says:

"Except in the case of [ETIMEDOUT], all these error checks shall act as
if they were performed immediately at the beginning of processing for
the function and shall cause an error return, in effect, prior to
modifying the state of the mutex[...]"

Until these two get clarified in the spec, I consider the comments
correct.  We can certainly extend them and document why we thing this
behavior is The Right Thing To Do.  But we need to document where we
deviate from what the spec states literally.

> One ugly case I don't think you're prepared to handle is when
> EOWNERDEAD happens in the cancellation cleanup path. There's no way
> for the caller to know this error happened and thereby no way for it
> to recover the state and make it consistent before unlocking the
> mutex.

That's true.  There's "XXX" comments on that in the cancellation
handlers in the patch; we need to decide whether we ignore such errors
or abort.  I'm kind of undecided right now what's the most useful
behavior in practice.

> The new cancellation state I'm proposing (PTHREAD_CANCEL_MASKED; see
> http://www.openwall.com/lists/musl/2015/02/22/1 for details) makes it
> possible to solve this problem by refusing to act on cancellation
> (leaving it pending), and instead returning as if by a spurious wake
> occurring just before cancellation, so that the error can be reported.
> 
> > +     * EAGAIN if MUTEX is a recursive mutex and trying to lock it exceeded
> > +       the maximum number of recursive locks.  The caller cannot expect to own
> > +       MUTEX.
> 
> This can't happen since the lock count was already decremented once
> and is only incremented back to its original value. (And in practice
> the original value has to be exactly 1 or other threads could never
> modify the state on which the predicate depends.)

Good point.  I think I must have been thinking about rwlocks for some
reason... :)

> > +     * When using PTHREAD_MUTEX_PP_* mutexes, we can also return all errors
> > +       returned by __pthread_tpp_change_priority.  We will already have
> > +       released the mutex in such cases, so the caller cannot expect to own
> > +       MUTEX.
> 
> This seems correct. This is another case where you would want to be
> able to suppress acting upon cancellation and instead cause a spurious
> wake.
> 
> > +   Other notes:
> > +   * Instead of the normal mutex unlock / lock functions, we use
> > +     __pthread_mutex_unlock_usercnt(m, 0) / __pthread_mutex_cond_lock(m)
> > +     because those will not change the mutex-internal users count, so that it
> > +     can be detected when a condvar is still associated with a particular
> > +     mutex because there is a waiter blocked on this condvar using this mutex.
> 
> I don't follow. What is this users count? Is it the recursive lock
> count or something else? Why do you want to be able to detect that
> it's associated with a cv? Note that it could be associated with more
> than one cv.

I've maintained what the previous code did.  There's this check in
pthread_mutex_destroy.c:
  if ((mutex->__data.__kind & PTHREAD_MUTEX_ROBUST_NORMAL_NP) == 0
      && mutex->__data.__nusers != 0)
    return EBUSY;

That's not required (but undefined behavior is allowed by POSIX for
destroying a mutex that is referenced by a concurrent condvar
operation).

It might be better to remove this check, mostly to avoid unnecessary
writes to the mutex.



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]