This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 2/6] Fall back to locking if elision runs out of retries.


On Wed, 2013-09-18 at 09:43 +0200, Dominik Vogt wrote:
> On Mon, Sep 16, 2013 at 07:24:23PM +0200, Torvald Riegel wrote:
> > On Mon, 2013-09-16 at 14:39 +0200, Dominik Vogt wrote:
> > > On Wed, Sep 11, 2013 at 05:42:47PM +0200, Torvald Riegel wrote:
> > > > On Mon, 2013-09-02 at 09:59 +0200, Dominik Vogt wrote:
> > > > > With the old code, elision would never be disabled if every
> > > > > transaction aborts
> > > > > with a temporary failure.  This patch introduces a new configuration
> > > > > value that
> > > > > is used whenever pthread_mutex_lock runs out of retries.
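The fallback proposed in the patch description above might be sketched as follows. The field name skip_lock_out_of_tries, the config struct, and the stubbed transaction attempt are illustrative assumptions, not the actual patch:

```c
#include <assert.h>

/* Hypothetical adaptation parameters, modeled loosely on glibc's
   elision_aconf; skip_lock_out_of_tries is the proposed new value and
   its name here is a guess, not the real field name. */
struct elision_config {
  int retry_try_xbegin;       /* transactional attempts before giving up */
  int skip_lock_out_of_tries; /* lock uses to skip after exhausting them */
};

static struct elision_config aconf = { 3, 3 };

/* Stub for one transactional attempt: succeeds iff the lock is free.
   Real code would use the HTM begin/end instructions. */
static int try_transaction(int *futex) { return *futex == 0; }

static int elide_lock(int *futex, short *adapt_count)
{
  if (*adapt_count <= 0) {
    for (int i = aconf.retry_try_xbegin; i > 0; i--)
      if (try_transaction(futex))
        return 0;                /* elision succeeded */
    /* All retries aborted with temporary failures: without the new
       tunable, adapt_count stays at 0 and elision is tried (and fails)
       again on every subsequent lock acquisition. */
    *adapt_count = aconf.skip_lock_out_of_tries;
  }
  return 1;                      /* take the real lock */
}
```

The point of the last assignment is exactly the case described above: purely temporary aborts now disable elision for a while instead of never.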
> > > > 
> > > > I agree that this case can exist, but it's not quite as bad as you make
> > > > it sound.  In particular, if the temporary / retryable aborts are
> > > > due to conflict with other threads acquiring the same lock, then in
> > > > practice we will disable, but via skip_lock_busy.
> > > 
> > > I do not think this is the case in general.  The code path that
> > > handles _ABORT_LOCK_BUSY is only executed in case of a race when
> > > one thread looks at *adapt_count and decides to try elision, but
> > > before it can start a transaction another thread using elision on
> > > the same lock is aborted and switches to using a real lock.  This
> > > needs to happen between
> > > 
> > >   if *adapt_count <= 0
> > > 
> > > and
> > > 
> > >   if (*futex == 0)
> > > 
> > > Otherwise the transaction will simply abort because of a
> > > read/write conflict on *futex, but it won't cause an
> > > _ABORT_LOCK_BUSY.
> > 
> > If no thread acquires the locks without elision, then there won't be any
> > conflicts on *futex.
> 
> Yes, that's exactly what I described above.  The other thread
> needs to switch from elision to using the lock while the current
> thread is busy between the two lines quoted above.

I'm not quite sure which situation you're talking about.  In any case,
we'll see ABORT_LOCK_BUSY not just in the interleaving you describe
above, but also when a lock is held but contended, and thus some other
threads have decreased adapt_count.
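The window being discussed can be made concrete with a small sketch of the elision fast path. The structure follows the two quoted lines, but xbegin/xabort here only simulate HTM behavior (a real explicit abort would transfer control back to xbegin with the abort code), so the names and stub bodies are illustrative:

```c
#include <assert.h>

/* Illustrative constants standing in for the HTM primitives; names
   follow the x86 elision code but the bodies only simulate behavior. */
#define _XBEGIN_STARTED  (~0u)
#define _ABORT_LOCK_BUSY 0xff

static unsigned int last_abort_code;

/* Simulated transaction begin: always "starts" in this sketch. */
static unsigned int xbegin(void) { return _XBEGIN_STARTED; }

/* Simulated explicit abort: records the code; real HTM would resume
   execution at xbegin with this code instead of falling through. */
static void xabort(unsigned int code) { last_abort_code = code; }

/* The race window is between reading *adapt_count and reading *futex
   inside the transaction: if another thread acquires the real lock in
   that window, the explicit _ABORT_LOCK_BUSY abort fires.  If the
   other thread's acquisition happens after we read *futex == 0, we
   instead get a plain read/write conflict abort on *futex. */
static int try_elide(int *futex, short *adapt_count)
{
  if (*adapt_count <= 0) {        /* "if *adapt_count <= 0" */
    if (xbegin() == _XBEGIN_STARTED) {
      if (*futex == 0)            /* "if (*futex == 0)" */
        return 0;                 /* lock elided successfully */
      xabort(_ABORT_LOCK_BUSY);   /* lock held: abort so we can adapt */
    }
  }
  return 1;                       /* fall back to the real lock */
}
```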

> >  There might be other conflicts on the same cache
> > line of course, but these are similar to ...
> > 
> > > > However, if we have
> > > > retryable aborts due to other threads' actions that don't acquire the
> > > > same lock, then you are right that we'll never disable.
> > 
> > ... these aborts.
> > 
> > Note that if we're worried about livelocks,
> 
> That's not my worry here.
> 
> > then it might be better to
> > try some kind of back-off instead of incrementing adapt_count because
> > the latter would probably lead to something similar to convoying: One
> > thread takes the slow path and acquires the lock without elision, which
> > leads subsequent threads observing the lock as busy, and forcing the
> > slow path as well.
> 
> There is certainly a danger that highly contended locks never get
> out of non-elision mode: once threads start acquiring the lock
> directly, there may always be some thread holding it, forcing all
> other threads to wait on the lock too.  In this situation, the
> elision code cannot speed up the program, but it won't slow it
> down significantly either.
> 
> On the other hand, in a program with a highly contended mutex,
> all aborts may be retryable.  The current code does not adapt to
> that situation, and the software may run significantly slower
> than necessary because it frequently runs into the configured
> number of retries, which all abort because of high contention on
> the data touched by the transaction body or on *futex.
> 
> > The fact that the current adaptation algorithm does not change
> > adapt_count on conflicts could thus very well be actually better than
> > forcing the slow path as in what you proposed.
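The back-off alternative mentioned above could, roughly, look like delaying between transactional retries instead of raising adapt_count, so contending threads spread out in time rather than all being forced onto the real lock. The doubling delay and the cpu_relax stub are illustrative assumptions:

```c
#include <assert.h>

/* Instrumented stand-in for a CPU pause/yield hint, so the example can
   check how much delay was inserted. */
static unsigned long pause_calls;
static void cpu_relax(void) { pause_calls++; }

/* Delay grows exponentially with the attempt number: 1, 2, 4, ...
   spins.  The growth factor is an arbitrary illustrative choice. */
static void backoff_delay(unsigned int attempt)
{
  for (unsigned long spins = 1ul << attempt; spins > 0; spins--)
    cpu_relax();
}

/* Stub for one transactional attempt: succeeds iff the lock is free. */
static int try_transaction(int *futex) { return *futex == 0; }

static int elide_lock_with_backoff(int *futex, int max_retries)
{
  for (int attempt = 0; attempt < max_retries; attempt++) {
    if (try_transaction(futex))
      return 0;                  /* elided */
    backoff_delay(attempt);      /* back off instead of bumping adapt_count */
  }
  return 1;                      /* fall back to the real lock */
}
```

Unlike incrementing adapt_count, this keeps elision enabled across acquisitions, so a transient burst of conflicts does not convoy every subsequent thread onto the slow path.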
> 
> > Can we get some
> > measurement data on this before considering this change?
> 
> What do you have in mind?  It's fairly easy to write a test case
> that shows abysmal performance with the existing code, or a test
> case that performs nicely with the current code but does not get
> a speed boost with the proposed patch.

Right.  Thus, we don't want to have measurements of microbenchmarks
designed to trigger this or that particular case, but instead look at
performance of real-world applications, for example.  Any application
should do for a start.  At the current stage, I think it's rather likely
that we can hit pathological cases anyway, but so far I think we have
very little data on performance in workloads that might represent the
common case.

> > Sure.  The explicit aborts might be different, although that's still
> > under discussion I think.  But you haven't shown how the change you
> > suggest here is made necessary by any HTM differences.  Thus, I'd like
> > to see more discussion of it, and indication that it's the right
> > approach.
> 
> Maybe the change makes more sense to you if you consider that
> aborts on zEC12 are very expensive?

Are just explicit aborts very expensive, or is any transaction abort
expensive in general (e.g., aborts due to conflicts with other threads)?

> I foresee that three useless
> attempts are a big performance problem on the zEC12; the current
> Intel logic is out of the question for z.

Are you saying that z needs to be much more cautious/conservative in
when it tries to use elision, or that just an explicit abort is a
problem?  If the latter, is there another way to abort the transaction
that is less expensive?  We do need to abort because we can't continue
(e.g., when the lock is acquired, or on trylock), but in case of an
acquired lock we could rely on eventually getting a conflict when the
lock is released (IOW, spin on the lock inside of the transaction).

If spinning would help, we could add that to the general adaptation
algorithm and control it via a flag in elision_aconf (possibly a const
member of the struct).
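Spinning inside the transaction instead of aborting explicitly might be sketched as below. Whether a transactional read of *futex really ends up cheaper than an explicit abort on a given machine is exactly the open question here; the flag name and struct are assumptions:

```c
#include <assert.h>

#define _ABORT_LOCK_BUSY 0xff

/* Simulated explicit abort: records the code; real HTM would end the
   transaction and resume at the begin instruction. */
static unsigned int aborted_with;
static void xabort(unsigned int code) { aborted_with = code; }

/* Hypothetical per-arch flag, as suggested above: a (possibly const)
   member of elision_aconf selecting spin-inside-transaction over an
   explicit abort. */
struct elision_config { int spin_on_busy_lock; };
static struct elision_config aconf = { 1 };

/* Called inside a started transaction when the lock is observed busy:
   either abort explicitly, or keep reading *futex and rely on the
   conflict caused by the owner's store on unlock to abort us. */
static void handle_busy_lock(volatile int *futex)
{
  if (!aconf.spin_on_busy_lock) {
    xabort(_ABORT_LOCK_BUSY);    /* expensive on some machines (zEC12) */
    return;
  }
  /* The transactional read of *futex puts it in our read set, so the
     lock owner's release store aborts this transaction for us.  (In
     this single-threaded sketch the loop simply exits once *futex is
     observed as 0.) */
  while (*futex != 0)
    ;
}
```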

> And even if aborted
> transactions perform better on Haswell, I see that
> problem there too.

Are you referring to the time wasted on aborting after conflicts on the
futex itself that aren't detected as ABORT_LOCK_BUSY?

