On 11-02-2015 11:29, Leonhard Holz wrote:
I did not get into the changes themselves, but at least for powerpc (POWER8/16c/128T)
I am not seeing improvements with the patch. In fact, it seems to increase
contention:
time per iteration
nths     master      patch
   1     51.422     75.046
   8     53.077     78.507
  16     57.430     89.385
  32     71.206    108.359
  64    114.370    172.115
 128    251.731    330.924
Thank you for testing! Maybe the cost of a mutex_lock is higher on PowerPC than on i686? Anyway, it looks like I have to take a different approach...
PowerPC now uses the default implementation in sysdeps/nptl/lowlevellock.h, which
basically translates to an acquire CAS followed by a futex operation in the
contended case. So I think that for powerpc (especially with high SMT),
busy-waiting like a spinlock yields better performance than possibly issuing
futex operations.
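
To make the difference concrete, here is a rough sketch of the two strategies.
This is not the code from sysdeps/nptl/lowlevellock.h; it is a simplified,
Linux-only illustration of the common three-state futex lock, and every
sketch_* name and the max_spins parameter are made up for this example.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical lock type: 0 = unlocked, 1 = locked,
   2 = locked and there may be waiters sleeping in the kernel.  */
typedef struct { atomic_int state; } sketch_lock;

/* Thin wrapper around the raw futex system call.  */
static long sketch_futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* CAS-then-futex: one acquire CAS on the fast path, and a
   futex wait as soon as the lock turns out to be contended.  */
static void sketch_lock_acquire(sketch_lock *l)
{
    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(&l->state, &expected, 1,
                                                memory_order_acquire,
                                                memory_order_relaxed))
        return;
    /* Contended: mark the lock as having waiters and sleep until
       the exchange finds it free (previous value 0).  */
    while (atomic_exchange_explicit(&l->state, 2, memory_order_acquire) != 0)
        sketch_futex(&l->state, FUTEX_WAIT_PRIVATE, 2);
}

/* Spin-first variant: retry the CAS up to max_spins times in user
   space, and only then fall back to the futex path above.  */
static void sketch_lock_acquire_spin(sketch_lock *l, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(&l->state, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
    sketch_lock_acquire(l);
}

/* Release; enter the kernel only if a waiter may be sleeping.  */
static void sketch_lock_release(sketch_lock *l)
{
    if (atomic_exchange_explicit(&l->state, 0, memory_order_release) == 2)
        sketch_futex(&l->state, FUTEX_WAKE_PRIVATE, 1);
}
```

With short critical sections, the spin-first variant may avoid the system call
altogether, which is the effect I suspect matters on a machine with many
hardware threads. What a good value for max_spins would be is an open question
here and would have to be measured.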