This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/6] Fall back to locking if elision runs out of retries.


On Mon, 2013-09-16 at 14:39 +0200, Dominik Vogt wrote:
> On Wed, Sep 11, 2013 at 05:42:47PM +0200, Torvald Riegel wrote:
> > On Mon, 2013-09-02 at 09:59 +0200, Dominik Vogt wrote:
> > > With the old code, elision would never be disabled if every
> > > transaction aborts
> > > with a temporary failure.  This patch introduces a new configuration
> > > value that
> > > is used whenever pthread_mutex_lock runs out of retries.
> > 
> > I agree that this case can exist, but it's not quite as bad as you make
> > it sound like.  In particular, if the temporary / retryable aborts are
> > due to conflict with other threads acquiring the same lock, then in
> > practice we will disable, but via skip_lock_busy.
> 
> I do not think this is the case in general.  The code path that
> handles _ABORT_LOCK_BUSY is only executed in case of a race when
> one thread looks at *adapt_count and decides to try elision, but
> before it can start a transaction another thread using elision on
> the same lock is aborted and switches to using a real lock.  This
> needs to happen between
> 
>   if *adapt_count <= 0
> 
> and
> 
>   if (*futex == 0)
> 
> Otherwise the transaction will simply abort because of a
> read/write conflict on *futex, but it won't cause an
> _ABORT_LOCK_BUSY.

If no thread acquires the locks without elision, then there won't be any
conflicts on *futex.  There might be other conflicts on the same cache
line of course, but these are similar to ...

> > However, if we have
> > retryable aborts due to other threads' actions that don't acquire the
> > same lock, then you are right that we'll never disable.

... these aborts.

Note that if we're worried about livelocks, then it might be better to
try some kind of back-off instead of incrementing adapt_count because
the latter would probably lead to something similar to convoying: One
thread takes the slow path and acquires the lock without elision, which
leads subsequent threads observing the lock as busy, and forcing the
slow path as well.

The fact that the current adaptation algorithm does not change
adapt_count on conflicts could thus very well be actually better than
forcing the slow path as in what you proposed.  Can we get some
measurement data on this before considering this change?

> > Nonetheless, I'm not comfortable with this patch.  First, see the issues
> > below.
> 
> > Second, I think we need to take a step back and come up with a
> > full design and discussion of the adaptation first; right now we're
> > adding one hack after another, and this can't be good.
> 
> Yes, that _is_ important.  But please also keep in mind that my
> task at hand is to port lock elision to z, and thus my current
> focus on the discussion is to decide about the portions of code
> where z works differently than Tsx (e.g. handling of explicit
> abors).

Sure.  The explicit aborts might be different, altough that's still
under discussion I think.  But you haven't shown how the change you
suggest here is made necessary by any HTM differences.  Thus, I'd like
to see more discussion of it, and indication that it's the right
approach.

> > Dominik, it
> > would be great if you could work on this, and I'd like to see Andi to
> > help with that as much as possible.
> 
> Sure.
> 
> > > diff --git a/nptl/sysdeps/unix/sysv/linux/x86/elision-conf.c
> > > b/nptl/sysdeps/unix/sysv/linux/x86/elision-conf.c
> > > index 2fed32b..c6bd49f 100644
> > > --- a/nptl/sysdeps/unix/sysv/linux/x86/elision-conf.c
> > > +++ b/nptl/sysdeps/unix/sysv/linux/x86/elision-conf.c
> > > @@ -35,6 +35,9 @@ struct elision_config __elision_aconf =
> > >         to reasons other than other threads' memory accesses.
> > > Expressed in
> > >         number of lock acquisition attempts.  */
> > >      .skip_lock_internal_abort = 3,
> > > +    /* How often to not attempt to use elision if a lock used up all
> > > retries
> > > +       without success.  Expressed in number of lock acquisition
> > > attempts.  */
> > > +    .skip_lock_out_of_retries = 3,
> > 
> > IMHO, this needs another name.  What kind of retries?
> 
> It's not clear to me what you have in mind.  If you're fine with
> the name ".skip_lock_internal_abort";

I don't like this particular name, and it would be good if we had
something better :)  It's okay in the context of TSX terminology, but
would be good if we had something more descriptive, IMO.

> what additional information
> would you like to see in that name?  That it's about transaction
> retries, or what else?

For example, associate it with retry_try_xbegin.  Partially, I think the
problem here is that our adaptation is kind of fuzzy; if we had a crisp
understanding and description of how it works, a better name would
probably fall out.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]