This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Unify pthread_once (bug 15215)


On Thu, 2014-04-10 at 22:34 +0200, Ondřej Bílka wrote:
> On Thu, Apr 10, 2014 at 09:57:40PM +0200, Torvald Riegel wrote:
> > On Mon, 2014-04-07 at 17:17 -0300, Adhemerval Zanella wrote:
> > > On 28-03-2014 20:29, Carlos O'Donell wrote:
> > > > David, Marcus, Joseph, Mike, Andreas, Steve, Chris,
> > > >
> > > > We would like to unify all C-based pthread_once implementations
> > > > per the plan in bug 15215 for glibc 2.20.
> > > >
> > > > Your machines are on the list of C-based pthread_once implementations.
> > > >
> > > > See this for the initial discussions on the unified pthread_once:
> > > > https://sourceware.org/ml/libc-alpha/2013-05/msg00210.html
> > > >
> > > > The goal is to provide a single and correct C implementation of 
> > > > pthread_once. Architectures can then build on that if they need more 
> > > > optimal implementations, but I don't encourage that and I'd rather
> > > > see deep discussions on how to make one unified solution where
> > > > possible.
> > > >
> > > > I've also just reviewed Torvald's new pthread_once microbenchmark which
> > > > you can use to compare your previous C implementation with the new
> > > > standard C implementation (measures pthread_once latency). The primary
> > > > use of this test is to help provide objective proof for or against the
> > > > i386 and x86_64 assembly implementations.
> > > >
> > > > We are not presently converting any of the machines with custom
> > > > implementations, but that will be a next step after testing with the
> > > > help of the maintainers for sh, i386, x86_64, powerpc, s390 and alpha.
> > > >
> > > > If we don't hear any objections we will go forward with this change
> > > > in one week and unify ia64, hppa, mips, tile, sparc, m68k, arm
> > > > and aarch64 on a single pthread_once implementation based on sparc's C
> > > > implementation.
> > > >
> > > > Any objections to this cleanup for 2.20?
> > > >
> > > I tested the patch and benchmark on PowerPC (POWER7) and the results look good:
> > > 
> > > * Current code:
> > > "duration": 5.08322e+09, "iterations": 2.2037e+08, "max": 244.863, "min": 22.08, "mean": 23.0668
> > > "duration": 5.08316e+09, "iterations": 2.20479e+08, "max": 771.08, "min": 21.984, "mean": 23.0551
> > > "duration": 5.08306e+09, "iterations": 2.21093e+08, "max": 53.966, "min": 22.052, "mean": 22.9906
> > > "duration": 5.0833e+09, "iterations": 2.20062e+08, "max": 347.895, "min": 22.15, "mean": 23.0994
> > > "duration": 5.08277e+09, "iterations": 2.20699e+08, "max": 632.479, "min": 21.997, "mean": 23.0303
> > > 
> > > * Optimization:
> > > "duration": 4.97694e+09, "iterations": 8.42834e+08, "max": 134.181, "min": 5.405, "mean": 5.90501
> > > "duration": 4.9758e+09, "iterations": 8.66952e+08, "max": 29.163, "min": 5.405, "mean": 5.73941
> > > "duration": 4.9778e+09, "iterations": 8.51788e+08, "max": 40.819, "min": 5.405, "mean": 5.84394
> > > "duration": 4.97413e+09, "iterations": 8.52432e+08, "max": 37.089, "min": 5.488, "mean": 5.83523
> > > "duration": 4.97795e+09, "iterations": 8.43376e+08, "max": 163.703, "min": 5.405, "mean": 5.90241
> > > 
> > > I am running on an 18-core machine, so I guess the 'max' is due to a timing issue from OS jitter.
> > 
> > There's no spinning in the algorithm currently (or most of glibc AFAIK,
> > except the rather simplistic attempt in adaptive mutexes), so even small
> > initialization routines may go straight to blocking via futexes.  (And
> > AFAIK, futexes don't spin before actually trying to block.)
> > 
> In the pthread_once context spinning is unlikely to help: you would need to
> hit the opportunity window when another thread does a brief initialization,
> and this could happen only once per core.

Yes, this would be for brief initialization (similar to spinning for
brief critical sections).  And it won't help except when pthread_once
is contended.  However, it won't cost anything in all other cases
(except the usual spinning vs. blocking trade-off one has to make), so
there's also not a lot to lose here by adding spinning.

The example use case that comes to mind would be threads using
pthread_once to decide which of them is the first to do something, but
then just using this value once and not calling pthread_once again and
again.  E.g., for something like electing a leader or the like.  That
seems like a reasonable use, but I *guess* a less likely one.


