
Question about sparse arrays/atomic operations in systemtap


I have a question about systemtap. I have a script (attached) for
measuring mutex contention (not the futex trick; I am trying to
measure directly from the pthread_mutex_* calls). Aside from the
merits of the script, it has a problem that seems like it would be
quite common, so I'm wondering if there is a well-known solution.

To summarize: I am keeping a count of the number of threads that are
attempting to lock a mutex at any given time, but there is a race:

probe process("/lib64/libpthread.so.0").function("pthread_mutex_lock") {
    if (++mutex_locks[$mutex] > 1) // Contended
       ...
}

probe process("/lib64/libpthread.so.0").function("pthread_mutex_unlock") {
    // TODO: There is a race here:
    // - this thread does --mutex_locks[$mutex] and sees 0
    // - another thread hits the lock probe and does ++mutex_locks[$mutex]
    // - this thread does delete mutex_locks[$mutex] and nukes a non-0 value
    // Consequence: we could miss contentions. However we can't drop
    // the delete or we overrun the mutex_locks array.
    if (--mutex_locks[$mutex] <= 0)
      delete mutex_locks[$mutex];
}

I think what I want is an atomic (decrement and delete if 0) operation
on an array.
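
To make that concrete, here is a rough sketch of the semantics I am
after (invented syntax, not something stap accepts today):

    probe process("/lib64/libpthread.so.0").function("pthread_mutex_unlock") {
      atomic {   // hypothetical: both steps as one indivisible operation
        if (--mutex_locks[$mutex] <= 0)
          delete mutex_locks[$mutex]
      }
    }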

Another way to solve it (which might have more general use) would be
to have a new array type: a 'sparse array' that automatically deletes
entries when they are set to the array's default value (0, "", empty
statistic, etc.). Such an array type would be very useful for arrays
that are potentially huge (e.g. an entry for every mutex in a
process) but are mostly 0.
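
Hypothetically, a declaration might look something like this (again
invented syntax, just to make the proposal concrete):

    global sparse mutex_locks   // hypothetical 'sparse' qualifier

    probe ... {
      // Decrementing to 0 would drop the entry automatically, so no
      // explicit delete is needed and the array stays small.
      --mutex_locks[$mutex]
    }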

Is there already a solution to this problem in systemtap, or would the
sparse array be a useful addition?

Cheers,
Alan.


#!/usr/bin/stap
//
// This script identifies contended mutex locks by a stack trace.
// Run with -x and -d for a particular process, e.g. for qpidd:
//
//     sudo stap ~/stap/mutexes.stp -x $(pgrep qpidd) -d $(which qpidd) --ldd -v
//
// While the script is running, it prints the address of each contended mutex
// on the first contended lock to show progress.
//
// When the script is killed (Ctrl-C or kill -int) it prints statistics for
// contended lock wait times and a stack trace to one of the contended lock locations
// for the most contended locks.
//
// Filter the output with c++filt to get readable C++ function names.
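// For example:
//
//     sudo stap ~/stap/mutexes.stp -x $(pgrep qpidd) -d $(which qpidd) --ldd | c++filt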
//
// NOTE: You'll need to install debuginfo for the pthread libraries, e.g. on RHEL/Fedora:
//     sudo debuginfo-install glibc
//
// TODO:
//
// The trace gives an address, function and library. Would be nice to
// convert the address into a file:line to precisely identify the
// lock statement. addr2line should be able to do this but haven't figured it
// out yet.
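// (Untested sketch: for an address in the main executable, something
// like
//     addr2line -f -C -e $(which qpidd) 0xADDRESS
// should print the function and file:line; an address inside a shared
// library would first need converting to an offset within that
// library file.)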
//
// Might also be interesting to do statistics by contended lock
// location rather than by mutex.
//

global thread_contended   // time of latest contended lock call on thread.
global mutex_waits        // stats on contended wait times by mutex.
global mutex_locks        // number of threads currently attempting to lock mutex.
global mutex_stacks       // stack traces per mutex.
global mutex_missed       // possibly missed contention.

probe process("/lib64/libpthread.so.0").function("pthread_mutex_lock") {
  if (target() == pid()) {
    if (++mutex_locks[$mutex] > 1) // Contended
	thread_contended[tid()] = gettimeofday_us();
  }
}

probe process("/lib64/libpthread.so.0").function("pthread_mutex_lock").return {
  // Note: this is lock.return so at this point the current thread is holding
  // the lock. No other thread can modify mutex_waits[$mutex] till this thread
  // calls unlock.
  if (target() == pid()) {
    contended = thread_contended[tid()];
    if (contended) {
      delete thread_contended[tid()]
      mutex_waits[$mutex] <<< (gettimeofday_us() - contended);
      if (@count(mutex_waits[$mutex]) == 1) {   // First contention of this mutex
	printf ("mutex %p: contended\n", $mutex)
	// We need to print the stack trace here, not in end() to get a useful trace.
	mutex_stacks[$mutex] = sprint_ubacktrace()
      }
    }
  }
}

probe process("/lib64/libpthread.so.0").function("pthread_mutex_unlock") {
  if (target() == pid()) {
    // TODO: There is a race here:
    // - this thread does --mutex_locks[$mutex] and sees 0
    // - another thread hits the lock probe and does ++mutex_locks[$mutex]
    // - this thread does delete mutex_locks[$mutex] and nukes a non-0 value
    // Consequence: we could miss contentions.
    // We detect if this happens (eventually we'll get count < 0) and report it.
    //
    count = --mutex_locks[$mutex]
    if (count <= 0)
      delete mutex_locks[$mutex];
    if (count < 0) {
      if (++mutex_missed[$mutex] == 1) {
	printf("mutex %p: invalid lock count\n", $mutex)
	mutex_stacks[$mutex] = sprint_ubacktrace()
      }
    }
  }
}

probe end {
  printf("\n\nContended mutexes, most frequent first\n\n")
  foreach ([m] in mutex_waits- limit 100) {
    printf ("==== mutex %p contended %d times, %dus total, %dus avg\n",
	    m,
	    @count(mutex_waits[m]),
	    @sum(mutex_waits[m]),
	    @avg(mutex_waits[m]))
    printf("%s\n\n", mutex_stacks[m])
  }
  printf("\n\nInvalid lock counts, possibly missed contention\n\n")
  foreach ([m] in mutex_missed- limit 100) {
    printf ("==== mutex %p: detected invalid lock count %d times\n", m, mutex_missed[m])
    printf("%s\n\n", mutex_stacks[m])
  }
}
