This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
[PATCH 2/3] Mutex: Only read while spinning
- From: Kemi Wang <kemi dot wang at intel dot com>
- To: Glibc alpha <libc-alpha at sourceware dot org>
- Cc: Dave Hansen <dave dot hansen at linux dot intel dot com>, Tim Chen <tim dot c dot chen at intel dot com>, Andi Kleen <andi dot kleen at intel dot com>, Ying Huang <ying dot huang at intel dot com>, Aaron Lu <aaron dot lu at intel dot com>, Lu Aubrey <aubrey dot li at intel dot com>, Kemi Wang <kemi dot wang at intel dot com>
- Date: Fri, 30 Mar 2018 15:14:52 +0800
- Subject: [PATCH 2/3] Mutex: Only read while spinning
- References: <1522394093-9835-1-git-send-email-kemi.wang@intel.com>
The pthread adaptive spin mutex spins on the lock for a while before going
to sleep. While the lock is contended and we need to wait, going straight
back to LLL_MUTEX_TRYLOCK (cmpxchg) is not a good idea on many targets, as
that forces expensive memory synchronization among processors and
penalizes other running threads. For example, it constantly floods the
system with "read for ownership" requests, which are much more expensive to
process than a single read. Thus, we use only a relaxed-MO read until we
observe that the lock is no longer acquired, as suggested by Andi Kleen.
Test machine:
2-socket Skylake platform, 112 cores with 62G RAM

Test case: Contended pthread adaptive spin mutex with global update.
Each thread of the workload does:
a) Lock the mutex (adaptive spin type)
b) Increment a global variable
c) Unlock the mutex
in a loop until timeout, and the main thread reports the total iteration
count of all the threads in one second.

This test case is the same as Will-it-scale.pthread_mutex3 except that the
mutex type is changed to PTHREAD_MUTEX_ADAPTIVE_NP.
github: https://github.com/antonblanchard/will-it-scale.git
nr_threads    base        head(SPIN_COUNT=10)    head(SPIN_COUNT=1000)
1             51644585    51307573 (-0.7%)       51323778 (-0.6%)
2              7914789    10011301 (+26.5%)       9867343 (+24.7%)
7              1687620     4224135 (+150.3%)      3430504 (+103.3%)
14             1026555     3784957 (+268.7%)      1843458 (+79.6%)
28              962001     2886885 (+200.1%)       681965 (-29.1%)
56              883770     2740755 (+210.1%)       364879 (-58.7%)
112            1150589     2707089 (+135.3%)       415261 (-63.9%)
Suggested-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Kemi Wang <kemi.wang@intel.com>
---
nptl/pthread_mutex_lock.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c
index 1519c14..c3aca93 100644
--- a/nptl/pthread_mutex_lock.c
+++ b/nptl/pthread_mutex_lock.c
@@ -26,6 +26,7 @@
#include <atomic.h>
#include <lowlevellock.h>
#include <stap-probe.h>
+#include <mutex-conf.h>
#ifndef lll_lock_elision
#define lll_lock_elision(lock, try_lock, private) ({ \
@@ -124,16 +125,22 @@ __pthread_mutex_lock (pthread_mutex_t *mutex)
if (LLL_MUTEX_TRYLOCK (mutex) != 0)
{
int cnt = 0;
- int max_cnt = MIN (MAX_ADAPTIVE_COUNT,
- mutex->__data.__spins * 2 + 10);
+ int max_cnt = MIN (__mutex_aconf.spin_count,
+ mutex->__data.__spins * 2 + 100);
do
{
- if (cnt++ >= max_cnt)
- {
- LLL_MUTEX_LOCK (mutex);
- break;
- }
- atomic_spin_nop ();
+ if (cnt >= max_cnt)
+ {
+ LLL_MUTEX_LOCK (mutex);
+ break;
+ }
+ /* MO read while spinning */
+ do
+ {
+ atomic_spin_nop ();
+ }
+ while (atomic_load_relaxed (&mutex->__data.__lock) != 0 &&
+ ++cnt < max_cnt);
}
while (LLL_MUTEX_TRYLOCK (mutex) != 0);
--
2.7.4