This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Adding systemtap probe points in pthread library


Purpose 
======= 
The changes here are systemtap probe points that would enable systemtap understand pthread behavior -- esp. mutex and read/write lock & unlock operations. 

Performance Impact 
================== 
1) Running sysbench as the benchmark with MySQL linked against the glibc with the systemtap pthread probe points compiled in *but without* systemtap enabled, the performance & throughput of MySQL was same as the one linked against the vanilla glibc (ie. without the probe points compile in). 

2) With a micro benchmark that does nothing but just lock & unlock in a single thread, there was no notable difference in performance -- in fact, the glibc with the systemtap pthread probe points compiled in was slightly faster in some cases (less than 0.3% -- likely to be noise). 

3) Solaris 10 plockstat (basically a DTrace application) with -A (full lock events) enabled, MySQL/sysbench was about 2.2X slower. With the systemtap pthread probe points enabled glibc, MySQL/sysbench was a bit over 3X slower when all pthread_unlock operations were monitored. Note that this is not a good comparison, as "full lock events" in DTrace are not the same as all lock/unlock tracing in systemtap. The reason I bring this up is just to show that plockstat & this code are within same order of magnitude in terms of overhead. 

4) This code is based on an earlier implementation: 

http://sourceware.org/ml/systemtap/2009-q1/msg00502.html 

However, there are with some different design philosophies added in -- the major one being that I now have a probe point after the blocking operation. The reason is that with the probe point at the beginning of the pthread lock related functions, the users can find the locks that are causing most of the time consuming blocks by calculating: 

time spent in acquiring the lock = time @ successful lock acquisition - 
time @ function entry 

5) Finally, there are still performance improvement that can be done. However, I would like to get the glibc maintainers' opinion on how we should implement this: 

- currently, even if the lock is not contented, systemtap & the code will still need to trap into the kernel to record the event. The major place is in __pthread_mutex_lock() : 


if (__builtin_expect (type, PTHREAD_MUTEX_TIMED_NP) 
== PTHREAD_MUTEX_TIMED_NP) 
{ 
simple: 
/* Normal mutex. */ 
LLL_MUTEX_LOCK (mutex); 
assert (mutex->__data.__owner == 0); 

PTHREAD_PROBE_MUTEX_BLOCK(mutex); 
} 

Here, the inline assembly routine LLL_MUTEX_LOCK() is always executed, whether it blocks or not. So for an uncontented lock, each time a thread executes pthread_mutex_lock(), it traps into the kernel when systemtap tracing is enabled. 

The better solution is to trap into the kernel only when the thread needs to block (when it calls the futex system call). 

However, as it is an inline assembly language routine, I am basically touching all the architectures that is supported by glibc. I was wondering if the maintainers are OK with this change, as it will be touching all the architectures. 

Lastly, please provide feedback on this patch in general. 

Thanks, 
Ray 


diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h 
index 43ca44c..2f1a4c1 100644 
--- a/nptl/pthreadP.h 
+++ b/nptl/pthreadP.h 
@@ -33,6 +33,7 @@ 
#include <atomic.h> 
#include <kernel-features.h> 

+#include "pthread_probe.h" 

/* Atomic operations on TLS memory. */ 
#ifndef THREAD_ATOMIC_CMPXCHG_VAL 
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c 
index 34d83f9..66f44cb 100644 
--- a/nptl/pthread_create.c 
+++ b/nptl/pthread_create.c 
@@ -296,6 +296,11 @@ start_thread (void *arg) 
CANCEL_RESET (oldtype); 
} 

+ /* SystemTap probe 
+ All of the normal thread creation work would 
+ be done after this point */ 
+ PTHREAD_PROBE_START(pd->arg); 
+ 
/* Run the code the user provided. */ 
#ifdef CALL_THREAD_FCT 
THREAD_SETMEM (pd, result, CALL_THREAD_FCT (pd)); 
@@ -552,6 +557,9 @@ __pthread_create_2_1 (newthread, attr, start_routine, arg) 
/* Pass the descriptor to the caller. */ 
*newthread = (pthread_t) pd; 

+ /* Systemtap probe */ 
+ PTHREAD_PROBE_CREATE(newthread, start_routine, arg); 
+ 
/* Start the thread. */ 
return create_thread (pd, iattr, STACK_VARIABLES_ARGS); 
} 
diff --git a/nptl/pthread_join.c b/nptl/pthread_join.c 
index 6a87a8b..58171a3 100644 
--- a/nptl/pthread_join.c 
+++ b/nptl/pthread_join.c 
@@ -55,6 +55,8 @@ pthread_join (threadid, thread_return) 
struct pthread *self = THREAD_SELF; 
int result = 0; 

+ PTHREAD_PROBE_JOIN(threadid); 
+ 
/* During the wait we change to asynchronous cancellation. If we 
are canceled the thread we are waiting for must be marked as 
un-wait-ed for again. */ 
@@ -110,5 +112,7 @@ pthread_join (threadid, thread_return) 
__free_tcb (pd); 
} 

+ PTHREAD_PROBE_JOIN_RET(threadid, result); 
+ 
return result; 
} 
diff --git a/nptl/pthread_mutex_destroy.c b/nptl/pthread_mutex_destroy.c 
index e2c9f8a..2217f58 100644 
--- a/nptl/pthread_mutex_destroy.c 
+++ b/nptl/pthread_mutex_destroy.c 
@@ -29,6 +29,8 @@ __pthread_mutex_destroy (mutex) 
&& mutex->__data.__nusers != 0) 
return EBUSY; 

+ PTHREAD_PROBE_MUTEX_DESTROY(mutex); 
+ 
/* Set to an invalid value. */ 
mutex->__data.__kind = -1; 

diff --git a/nptl/pthread_mutex_init.c b/nptl/pthread_mutex_init.c 
index d9b1ef0..bf395dd 100644 
--- a/nptl/pthread_mutex_init.c 
+++ b/nptl/pthread_mutex_init.c 
@@ -45,6 +45,8 @@ __pthread_mutex_init (mutex, mutexattr) 

assert (sizeof (pthread_mutex_t) <= __SIZEOF_PTHREAD_MUTEX_T); 

+ PTHREAD_PROBE_MUTEX_INIT(mutex); 
+ 
imutexattr = (const struct pthread_mutexattr *) mutexattr ?: &default_attr; 

/* Sanity checks. */ 
diff --git a/nptl/pthread_mutex_lock.c b/nptl/pthread_mutex_lock.c 
index 50dc188..a4ccefe 100644 
--- a/nptl/pthread_mutex_lock.c 
+++ b/nptl/pthread_mutex_lock.c 
@@ -48,6 +48,10 @@ __pthread_mutex_lock (mutex) 
assert (sizeof (mutex->__size) >= sizeof (mutex->__data)); 

unsigned int type = PTHREAD_MUTEX_TYPE (mutex); 
+ 
+ /* systemtap marker */ 
+ PTHREAD_PROBE_MUTEX_ENTRY(mutex); 
+ 
if (__builtin_expect (type & ~PTHREAD_MUTEX_KIND_MASK_NP, 0)) 
return __pthread_mutex_lock_full (mutex); 

@@ -60,6 +64,8 @@ __pthread_mutex_lock (mutex) 
/* Normal mutex. */ 
LLL_MUTEX_LOCK (mutex); 
assert (mutex->__data.__owner == 0); 
+ 
+ PTHREAD_PROBE_MUTEX_BLOCK(mutex); 
} 
else if (__builtin_expect (type == PTHREAD_MUTEX_RECURSIVE_NP, 1)) 
{ 
@@ -75,6 +81,11 @@ __pthread_mutex_lock (mutex) 

++mutex->__data.__count; 

+ /* currently, the systemtap pthread probe does not have a */ 
+ /* probe point here because the thread already owns this */ 
+ /* recursive lock before the call to this function. */ 
+ /* this might change in the future */ 
+ 
return 0; 
} 

@@ -83,6 +94,8 @@ __pthread_mutex_lock (mutex) 

assert (mutex->__data.__owner == 0); 
mutex->__data.__count = 1; 
+ 
+ PTHREAD_PROBE_MUTEX_BLOCK(mutex); 
} 
else if (__builtin_expect (type == PTHREAD_MUTEX_ADAPTIVE_NP, 1)) 
{ 
@@ -94,6 +107,7 @@ __pthread_mutex_lock (mutex) 
int cnt = 0; 
int max_cnt = MIN (MAX_ADAPTIVE_COUNT, 
mutex->__data.__spins * 2 + 10); 
+ 
do 
{ 
if (cnt++ >= max_cnt) 
@@ -108,6 +122,8 @@ __pthread_mutex_lock (mutex) 
} 
while (LLL_MUTEX_TRYLOCK (mutex) != 0); 

+ PTHREAD_PROBE_MUTEX_BLOCK(mutex); 
+ 
mutex->__data.__spins += (cnt - mutex->__data.__spins) / 8; 
} 
assert (mutex->__data.__owner == 0); 
@@ -127,6 +143,8 @@ __pthread_mutex_lock (mutex) 
++mutex->__data.__nusers; 
#endif 

+ PTHREAD_PROBE_MUTEX_ACQUIRE(mutex); 
+ 
return 0; 
} 

@@ -277,6 +295,10 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) 

++mutex->__data.__count; 

+ /* currently, the systemtap pthread probe does not have a */ 
+ /* probe point here because the thread already owns this */ 
+ /* recursive lock before the call to this function. */ 
+ /* this might change in the future */ 
return 0; 
} 
} 
@@ -393,6 +415,11 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) 
/* Overflow of the counter. */ 
return EAGAIN; 

+ /* currently, the systemtap pthread probe does not have a */ 
+ /* probe point here because the thread already owns this */ 
+ /* recursive lock before the call to this function. */ 
+ /* this might change in the future */ 
+ 
++mutex->__data.__count; 

return 0; 
@@ -442,8 +469,10 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) 
break; 

if (oldval != ceilval) 
+ { 
lll_futex_wait (&mutex->__data.__lock, ceilval | 2, 
PTHREAD_MUTEX_PSHARED (mutex)); 
+ } 
} 
while (atomic_compare_and_exchange_val_acq (&mutex->__data.__lock, 
ceilval | 2, ceilval) 
@@ -451,6 +480,8 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) 
} 
while ((oldval & PTHREAD_MUTEX_PRIO_CEILING_MASK) != ceilval); 

+ PTHREAD_PROBE_MUTEX_BLOCK(mutex); 
+ 
assert (mutex->__data.__owner == 0); 
mutex->__data.__count = 1; 
} 
@@ -467,6 +498,8 @@ __pthread_mutex_lock_full (pthread_mutex_t *mutex) 
++mutex->__data.__nusers; 
#endif 

+ PTHREAD_PROBE_MUTEX_ACQUIRE(mutex); 
+ 
return 0; 
} 
#ifndef __pthread_mutex_lock 
diff --git a/nptl/pthread_mutex_unlock.c b/nptl/pthread_mutex_unlock.c 
index f9fe10b..50cbc5c 100644 
--- a/nptl/pthread_mutex_unlock.c 
+++ b/nptl/pthread_mutex_unlock.c 
@@ -50,6 +50,7 @@ __pthread_mutex_unlock_usercnt (mutex, decr) 

/* Unlock. */ 
lll_unlock (mutex->__data.__lock, PTHREAD_MUTEX_PSHARED (mutex)); 
+ PTHREAD_PROBE_MUTEX_RELEASE(mutex); 
return 0; 
} 
else if (__builtin_expect (type == PTHREAD_MUTEX_RECURSIVE_NP, 1)) 
@@ -60,6 +61,10 @@ __pthread_mutex_unlock_usercnt (mutex, decr) 

if (--mutex->__data.__count != 0) 
/* We still hold the mutex. */ 
+ 
+ /* currently, the systemtap pthread probe does not have */ 
+ /* probe point here because the thread still owns the lock */ 
+ /* this might change in the future */ 
return 0; 
goto normal; 
} 
@@ -104,6 +109,10 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) 

if (--mutex->__data.__count != 0) 
/* We still hold the mutex. */ 
+ 
+ /* currently, the systemtap pthread probe does not have */ 
+ /* probe point here because the thread still owns the lock */ 
+ /* this might change in the future */ 
return 0; 

goto robust; 
@@ -149,6 +158,10 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) 

if (--mutex->__data.__count != 0) 
/* We still hold the mutex. */ 
+ 
+ /* currently, the systemtap pthread probe does not have */ 
+ /* probe point here because the thread still owns the lock */ 
+ /* this might change in the future */ 
return 0; 
goto continue_pi_non_robust; 

@@ -171,6 +184,10 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) 

if (--mutex->__data.__count != 0) 
/* We still hold the mutex. */ 
+ 
+ /* currently, the systemtap pthread probe does not have */ 
+ /* probe point here because the thread still owns the lock */ 
+ /* this might change in the future */ 
return 0; 

goto continue_pi_robust; 
@@ -237,6 +254,10 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) 

if (--mutex->__data.__count != 0) 
/* We still hold the mutex. */ 
+ 
+ /* currently, the systemtap pthread probe does not have */ 
+ /* probe point here because the thread still owns the lock */ 
+ /* this might change in the future */ 
return 0; 
goto pp; 

@@ -272,6 +293,9 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) 
PTHREAD_MUTEX_PSHARED (mutex)); 

int oldprio = newval >> PTHREAD_MUTEX_PRIO_CEILING_SHIFT; 
+ 
+ PTHREAD_PROBE_MUTEX_RELEASE(mutex); 
+ 
return __pthread_tpp_change_priority (oldprio, -1); 

default: 
@@ -279,6 +303,7 @@ __pthread_mutex_unlock_full (pthread_mutex_t *mutex, int decr) 
return EINVAL; 
} 

+ PTHREAD_PROBE_MUTEX_RELEASE(mutex); 
return 0; 
} 

diff --git a/nptl/pthread_rwlock_destroy.c b/nptl/pthread_rwlock_destroy.c 
index 28fd24b..b4cd7ab 100644 
--- a/nptl/pthread_rwlock_destroy.c 
+++ b/nptl/pthread_rwlock_destroy.c 
@@ -24,6 +24,8 @@ int 
__pthread_rwlock_destroy (rwlock) 
pthread_rwlock_t *rwlock; 
{ 
+ PTHREAD_PROBE_RWLOCK_DESTROY(rwlock); 
+ 
/* Nothing to be done. For now. */ 
return 0; 
} 
diff --git a/nptl/pthread_rwlock_rdlock.c b/nptl/pthread_rwlock_rdlock.c 
index 31eb508..954b414 100644 
--- a/nptl/pthread_rwlock_rdlock.c 
+++ b/nptl/pthread_rwlock_rdlock.c 
@@ -31,6 +31,8 @@ __pthread_rwlock_rdlock (rwlock) 
{ 
int result = 0; 

+ PTHREAD_PROBE_RLOCK_ENTRY(rwlock); 
+ 
/* Make sure we are along. */ 
lll_lock (rwlock->__data.__lock, rwlock->__data.__shared); 

@@ -49,6 +51,12 @@ __pthread_rwlock_rdlock (rwlock) 
--rwlock->__data.__nr_readers; 
result = EAGAIN; 
} 
+ else 
+ { 
+ /* systemtap pthread probe - this is the only place where */ 
+ /* we get this read-write lock */ 
+ PTHREAD_PROBE_RWLOCK_ACQUIRE_READ(rwlock); 
+ } 

break; 
} 
diff --git a/nptl/pthread_rwlock_unlock.c b/nptl/pthread_rwlock_unlock.c 
index a7ef71a..e7d1568 100644 
--- a/nptl/pthread_rwlock_unlock.c 
+++ b/nptl/pthread_rwlock_unlock.c 
@@ -27,6 +27,8 @@ 
int 
__pthread_rwlock_unlock (pthread_rwlock_t *rwlock) 
{ 
+ PTHREAD_PROBE_RWLOCK_UNLOCK(rwlock); 
+ 
lll_lock (rwlock->__data.__lock, rwlock->__data.__shared); 
if (rwlock->__data.__writer) 
rwlock->__data.__writer = 0; 
diff --git a/nptl/pthread_rwlock_wrlock.c b/nptl/pthread_rwlock_wrlock.c 
index 64fe970..abf3083 100644 
--- a/nptl/pthread_rwlock_wrlock.c 
+++ b/nptl/pthread_rwlock_wrlock.c 
@@ -31,6 +31,8 @@ __pthread_rwlock_wrlock (rwlock) 
{ 
int result = 0; 

+ PTHREAD_PROBE_WLOCK_ENTRY(rwlock); 
+ 
/* Make sure we are along. */ 
lll_lock (rwlock->__data.__lock, rwlock->__data.__shared); 

@@ -41,6 +43,11 @@ __pthread_rwlock_wrlock (rwlock) 
{ 
/* Mark self as writer. */ 
rwlock->__data.__writer = THREAD_GETMEM (THREAD_SELF, tid); 
+ 
+ /* systemtap pthread probe - this is the only place where we can */ 
+ /* get this read-write lock. */ 
+ PTHREAD_PROBE_RWLOCK_ACQUIRE_WRITE(rwlock); 
+ 
break; 
} 

pthread_probe.h 

#include <sys/sdt.h> 

#define PTHREAD_PROBE_CREATE(arg1,arg2,arg3) STAP_PROBE3(provider,create,arg1,arg2,arg3) 
#define PTHREAD_PROBE_CREATE_ENABLED() 1 
#define PTHREAD_PROBE_JOIN(arg1) STAP_PROBE1(provider,join,arg1) 
#define PTHREAD_PROBE_JOIN_ENABLED() 1 
#define PTHREAD_PROBE_JOIN_RET(arg1,arg2) STAP_PROBE2(provider,join_ret,arg1,arg2) 
#define PTHREAD_PROBE_JOIN_RET_ENABLED() 1 
#define PTHREAD_PROBE_START(arg1) STAP_PROBE1(provider,start,arg1) 
#define PTHREAD_PROBE_START_ENABLED() 1 
#define PTHREAD_PROBE_END(arg1) STAP_PROBE1(provider,end,arg1) 
#define PTHREAD_PROBE_END_ENABLED() 1 
#define PTHREAD_PROBE_MUTEX_INIT(arg1) STAP_PROBE1(provider,mutex_init,arg1) 
#define PTHREAD_PROBE_MUTEX_INIT_ENABLED() 1 
#define PTHREAD_PROBE_MUTEX_DESTROY(arg1) STAP_PROBE1(provider,mutex_destroy,arg1) 
#define PTHREAD_PROBE_MUTEX_DESTROY_ENABLED() 1 
#define PTHREAD_PROBE_MUTEX_ACQUIRE(arg1) STAP_PROBE1(provider,mutex_acquire,arg1) 
#define PTHREAD_PROBE_MUTEX_ACQUIRE_ENABLED() 1 
#define PTHREAD_PROBE_MUTEX_RELEASE(arg1) STAP_PROBE1(provider,mutex_release,arg1) 
#define PTHREAD_PROBE_MUTEX_RELEASE_ENABLED() 1 
#define PTHREAD_PROBE_MUTEX_BLOCK(arg1) STAP_PROBE1(provider,mutex_block,arg1) 
#define PTHREAD_PROBE_MUTEX_BLOCK_ENABLED() 1 
#define PTHREAD_PROBE_COND_INIT(arg1) STAP_PROBE1(provider,cond_init,arg1) 
#define PTHREAD_PROBE_COND_INIT_ENABLED() 1 
#define PTHREAD_PROBE_COND_DESTROY(arg1) STAP_PROBE1(provider,cond_destroy,arg1) 
#define PTHREAD_PROBE_COND_DESTROY_ENABLED() 1 
#define PTHREAD_PROBE_COND_WAIT(arg1,arg2) STAP_PROBE2(provider,cond_wait,arg1,arg2) 
#define PTHREAD_PROBE_COND_WAIT_ENABLED() 1 
#define PTHREAD_PROBE_COND_WAKE(arg1,arg2) STAP_PROBE2(provider,cond_wake,arg1,arg2) 
#define PTHREAD_PROBE_COND_WAKE_ENABLED() 1 
#define PTHREAD_PROBE_COND_SIGNAL(arg1) STAP_PROBE1(provider,cond_signal,arg1) 
#define PTHREAD_PROBE_COND_SIGNAL_ENABLED() 1 
#define PTHREAD_PROBE_COND_BROADCAST(arg1) STAP_PROBE1(provider,cond_broadcast,arg1) 
#define PTHREAD_PROBE_COND_BROADCAST_ENABLED() 1 
#define PTHREAD_PROBE_RWLOCK_ACQUIRE_WRITE(arg1) STAP_PROBE1(provider,rwlock_acquire_write,arg1) 
#define PTHREAD_PROBE_RWLOCK_ACQUIRE_WRITE_ENABLED() 1 
#define PTHREAD_PROBE_RWLOCK_ACQUIRE_READ(arg1) STAP_PROBE1(provider,rwlock_acquire_read,arg1) 
#define PTHREAD_PROBE_RWLOCK_ACQUIRE_READ_ENABLED() 1 
#define PTHREAD_PROBE_RWLOCK_DESTROY(arg1) STAP_PROBE1(provider,rwlock_destroy,arg1) 
#define PTHREAD_PROBE_RWLOCK_DESTROY_ENABLED() 1 
#define PTHREAD_PROBE_RWLOCK_UNLOCK(arg1) STAP_PROBE1(provider,rwlock_unlock,arg1) 
#define PTHREAD_PROBE_RWLOCK_UNLOCK_ENABLED() 1 
#define PTHREAD_PROBE_MUTEX_ENTRY(arg1) STAP_PROBE1(provider,mutex_entry,arg1) 
#define PTHREAD_PROBE_MUTEX_ENTRY_ENABLED() 1 
#define PTHREAD_PROBE_RLOCK_ENTRY(arg1) STAP_PROBE1(provider,rlock_entry,arg1) 
#define PTHREAD_PROBE_RLOCK_ENTRY_ENABLED() 1 
#define PTHREAD_PROBE_RLOCK_BLOCK(arg1) STAP_PROBE1(provider,rlock_block,arg1) 
#define PTHREAD_PROBE_RLOCK_BLOCK_ENABLED() 1 
#define PTHREAD_PROBE_WLOCK_ENTRY(arg1) STAP_PROBE1(provider,wlock_entry,arg1) 
#define PTHREAD_PROBE_WLOCK_ENTRY_ENABLED() 1 
#define PTHREAD_PROBE_WLOCK_BLOCK(arg1) STAP_PROBE1(provider,wlock_block,arg1) 
#define PTHREAD_PROBE_WLOCK_BLOCK_ENABLED() 1 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]