This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.



Spin barrier


Not sure if this is the right place for this. I wrote a spin barrier
for x86_64 based on the NPTL code in
nptl/sysdeps/x86_64/pthread_spin_lock.c. Since I started from this
file I'm guessing my code is irrevocably LGPL, which means I can put
it here for anyone else who needs it. Maybe something like it is worth
having as another primitive in NPTL.

Why would you need a spin barrier? Try making Gantt charts of any kind
of tightly synchronized multi-core app based on pthread_barrier (or a
DIY barrier with pthread_mutex) under Linux. At present the thread
yield it generates causes terrible performance problems. The OS is
reluctant to bring your threads back in short order, and you can find
you lose more to scheduling inertia than you gain from (theoretical)
parallelism. Obviously forcing everything to sit in a spin is less
than ideal on so many levels, but for certain applications there is no
alternative until the scheduler offers better control, or has less
pathological behaviours.

Anyway, here's the code (since it's so short):

/* sense reversing barrier; crummey91 */
void barrier_wait(barrier_t *b, int index) {
  __asm __volatile
    ("\n\t"
     "movl %2,%%ecx\n\t"      /* ecx = sense[index] */
     "notl %%ecx\n\t"         /* flip this thread's local sense */
     "movl %%ecx, %2\n\t"     /* store it back; ecx holds the new sense */
     "lock; decl %0\n\t"      /* atomically decrement count */
     "jne 1f\n\t"             /* not the last thread -> go spin */
     "lock; notl %1\n\t"      /* last thread: flip the global sense */
     "lock; addl %6,%0\n"     /* and restore count for the next phase */
     "2:\t rep; nop\n\t"      /* common exit (a single pause, then fall out) */
     ".subsection 1\n\t"      /* spin loop placed out of line */
     ".align 16\n"
     "1:\trep; nop\n\t"       /* pause while waiting */
     "cmpl %%ecx, %1\n\t"     /* has the global sense flipped to match ours? */
     "jne 1b\n\t"             /* not yet: keep spinning */
     "jmp 2b\n\t"             /* released: jump back to the exit */
     ".previous"
     : "=m" (b->count),
       "=m" (b->gsense),
       "=m" (b->sense[index])
     : "m"  (b->count),
       "m"  (b->gsense),
       "m"  (b->sense[index]),
       "r"  (b->threads)
     : "%ecx" );
  return;
}

It's predicated on this data structure:

typedef struct barrier_s {
  int count;    /* threads still to arrive in the current phase */
  int threads;  /* total number of threads that sync here */
  int *sense;   /* per-thread local sense, one int per thread */
  int gsense;   /* global sense of the current phase */
} barrier_t;

which should be initialised with:
- sense is an array of ints, all initially 0, one per thread to sync
- gsense = 0
- count = threads = number of threads that will sync here
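
For concreteness, here is a minimal initialisation/teardown sketch
following those rules. barrier_init and barrier_destroy are just
illustrative helper names, not part of the code above:

#include <stdlib.h>

/* Hypothetical helpers, only illustrating the initialisation rules above. */
int barrier_init(barrier_t *b, int threads) {
  b->sense = calloc(threads, sizeof(int));  /* all local senses start at 0 */
  if (b->sense == NULL)
    return -1;
  b->gsense = 0;                            /* global sense starts at 0 */
  b->count = b->threads = threads;          /* count starts at the thread count */
  return 0;
}

void barrier_destroy(barrier_t *b) {
  free(b->sense);
  b->sense = NULL;
}

Each of the N participating threads then calls barrier_wait(b, index)
with a distinct index in 0..N-1.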

It's based on algorithm 7 (sense reversing barrier) of

John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable
synchronization on shared-memory multiprocessors. ACM Transactions on
Computer Systems, 9(1):21-65, February 1991

gsense is the sense the current phase has. sense[index] is the phase
each thread is in. count is the number of threads outstanding in this
phase. On entering the barrier each thread negates its own sense, and
decrements the counter. If it zeroed the counter, it negates the
global sense, restores the counter to the number of threads, and
returns. Otherwise it goes into a spin waiting for the global sense to
match its own. As soon as it does, it also returns.
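
For anyone who would rather read C than asm, the same logic looks
roughly like this using GCC's __atomic builtins. This is only an
illustration of the algorithm, not a drop-in replacement for the asm
above, and it hasn't had the same stress testing:

/* Illustrative only: the sense-reversing barrier expressed with GCC
   __atomic builtins rather than inline asm. */
void barrier_wait_c(barrier_t *b, int index) {
  int my_sense = ~b->sense[index];   /* flip this thread's local sense */
  b->sense[index] = my_sense;        /* only this thread touches sense[index] */

  if (__atomic_sub_fetch(&b->count, 1, __ATOMIC_ACQ_REL) == 0) {
    /* Last arrival: restore the counter for the next phase, then release
       the waiters by flipping the global sense.  (The asm does these two
       steps in the other order, which also works because its counter
       reset is an atomic add.)  */
    __atomic_store_n(&b->count, b->threads, __ATOMIC_RELAXED);
    __atomic_store_n(&b->gsense, my_sense, __ATOMIC_RELEASE);
  } else {
    /* Spin until the global sense catches up with our local sense. */
    while (__atomic_load_n(&b->gsense, __ATOMIC_ACQUIRE) != my_sense)
      __asm __volatile ("rep; nop");   /* pause */
  }
}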

This isn't the most efficient algorithm in terms of memory conflicts
and so forth, but it's probably pretty good for the low numbers of
processors in current machines. Doing something fancier with trees (as
that paper goes on to develop) isn't worth it at this stage I suspect.

I don't think there are any correctness issues here; I might be
overcautious with the lock prefixes (there are three), but I think they're
all necessary. If you spot any correctness failings, you're doing me a huge
favour by pointing them out, of course :) I have stressed this fairly
heavily in my own applications.

Ed

