This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Lock elision test results


On Wed, 2013-07-03 at 08:53 +0200, Dominik Vogt wrote: 
> On Tue, Jul 02, 2013 at 02:18:04PM +0200, Torvald Riegel wrote:
> > On Fri, 2013-06-14 at 12:26 +0200, Dominik Vogt wrote:
> > >   barrier
> > > 
> > > thread 2:
> > > 
> > >   barrier
> > >   get start timestamp
> > >   while thread 1 has not finished
> > >     lock m1
> > >     increment c3
> > >     unlock m1
> > >   get end timestamp
> > > 
> > > Performance is measured in loops of thread 2 divided by the time
> > > taken.
> > 
> > Why just thread 2's time?  Without looking at thread 1's performance too
> > we can't distinguish throughput from fairness effects.
> 
> As the threads are synchronized they take the same amount of time.

Right, I missed that.  Nonetheless, thread 2 can do more work depending
on whether it gets to acquire m1 more frequently than thread 1 or not.
If thread 2 acquires m1 only infrequently for some reason, this will
decrease thread 1's time, but it doesn't mean that we're necessarily
faster -- we might just do less work overall, so less throughput.  Or,
if thread 2 holds m1 most of the time (or acquires it frequently), then
thread 1's time will be larger, but we also do more work.

IOW, I think we need to report the iterations of thread 2 too, or let
thread 2 do a fixed number of iterations and measure the time it needs
for those.

> Speaking about the actual test result:
> 
> 1. In (4), thread 1 has a high abort ratio, i.e. it needs lots of
>    additional cpu cycles and the whole test runs much longer than
>    it would without aborts.
> 2. Thread 2 does not benefit much from the additional run time,
>    the number of iterations is roughly the same as without
>    elision.

Does it have a high abort rate too?  If not, it should benefit from the
additional runtime that thread 1 effectively gives to thread 2.
What's the ratio between the increase in thread 2's iterations and the
increase in thread 1's iterations?

> 
> > > Test execution
> > > --------------
> > > 
> > > The test is run ten times each with four different versions and
> > > setups of glibc:
> > > 
> > > (1) current glibc without elision patches (2506109403de)
> > > (2) glibc-2.15
> > > (3) current glibc (1) plus elision patches, GLIBC_PTHREAD_MUTEX=none
> > > (4) current glibc (1) plus elision patches, GLIBC_PTHREAD_MUTEX=elision
> > > 
> > > The best results of all runs for each glibc setup are compared.
> > 
> > What's the variance between the results from different test runs?  At
> > least having min/max/avg would be good, or perhaps the median, or
> > something like that.
> 
> It's a bit difficult to answer this without posting absolute
> numbers.  The range of measurements is large enough to be
> irritating in all test setups, even in the early morning when
> nobody but me works on the machine.

What do you mean by "irritating"?

> > >  and < 1% on thread 2.
> > 
> > Did thread 2 use elision often, or not?
> 
> That's difficult to count without disturbing the test, but 
> 
> > If it did, you almost never
> > abort, and you just measure thread 2's performance, then why the
> > overhead?
> 
> > Are aborts extremely costly on your architecture?
> 
> Yes.

That's interesting.  Can you give a rough range of how costly they are,
or describe what is happening internally?

If aborts are very costly, then this certainly affects any tuning
decisions regarding when to use elision or any kind of transaction-based
speculative execution.  We'd have to be more conservative.

Andi, do you have measurements of how costly aborts are on Haswell, or
can you give a rough estimate (i.e., not counting the wasted work, just
the rollback and the jump back to the xbegin)?  I'm trying to understand
the differences here, so that we can make a better guess at default
tuning parameters for different HTMs.


Torvald

