This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Unify pthread_once (bug 15215)
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, libc-alpha at sourceware dot org
- Date: Thu, 10 Apr 2014 22:34:42 +0200
- Subject: Re: [PATCH] Unify pthread_once (bug 15215)
- Authentication-results: sourceware.org; auth=none
- References: <1368024237 dot 7774 dot 794 dot camel at triegel dot csb> <519D97E4 dot 4030808 at redhat dot com> <1381018836 dot 8757 dot 3598 dot camel at triegel dot csb> <Pine dot LNX dot 4 dot 64 dot 1310071604090 dot 31470 at digraph dot polyomino dot org dot uk> <1381182784 dot 18547 dot 138 dot camel at triegel dot csb> <533605BF dot 9000005 at redhat dot com> <534307E1 dot 5050904 at linux dot vnet dot ibm dot com> <1397159860 dot 10643 dot 16419 dot camel at triegel dot csb>
On Thu, Apr 10, 2014 at 09:57:40PM +0200, Torvald Riegel wrote:
> On Mon, 2014-04-07 at 17:17 -0300, Adhemerval Zanella wrote:
> > On 28-03-2014 20:29, Carlos O'Donell wrote:
> > > David, Marcus, Joseph, Mike, Andreas, Steve, Chris,
> > >
> > > We would like to unify all C-based pthread_once implementations
> > > per the plan in bug 15215 for glibc 2.20.
> > >
> > > Your machines are on the list of C-based pthread_once implementations.
> > >
> > > See this for the initial discussions on the unified pthread_once:
> > > https://sourceware.org/ml/libc-alpha/2013-05/msg00210.html
> > >
> > > The goal is to provide a single and correct C implementation of
> > > pthread_once. Architectures can then build on that if they need more
> > > optimized implementations, but I don't encourage that and I'd rather
> > > see deep discussions on how to make one unified solution where
> > > possible.
> > >
> > > I've also just reviewed Torvald's new pthread_once microbenchmark which
> > > you can use to compare your previous C implementation with the new
> > > standard C implementation (measures pthread_once latency). The primary
> > > use of this test is to help provide objective proof for or against the
> > > i386 and x86_64 assembly implementations.
> > >
> > > We are not presently converting any of the machines with custom
> > > implementations, but that will be a next step after testing with the
> > > help of the maintainers for sh, i386, x86_64, powerpc, s390 and alpha.
> > >
> > > If we don't hear any objections we will go forward with this change
> > > in one week and unify ia64, hppa, mips, tile, sparc, m68k, arm
> > > and aarch64 on a single pthread_once implementation based on sparc's C
> > > implementation.
> > >
> > > Any objections to this cleanup for 2.20?
> > >
> > I tested the patch and benchmark on PowerPC (POWER7) and the results look good:
> >
> > * Current code:
> > "duration": 5.08322e+09, "iterations": 2.2037e+08, "max": 244.863, "min": 22.08, "mean": 23.0668
> > "duration": 5.08316e+09, "iterations": 2.20479e+08, "max": 771.08, "min": 21.984, "mean": 23.0551
> > "duration": 5.08306e+09, "iterations": 2.21093e+08, "max": 53.966, "min": 22.052, "mean": 22.9906
> > "duration": 5.0833e+09, "iterations": 2.20062e+08, "max": 347.895, "min": 22.15, "mean": 23.0994
> > "duration": 5.08277e+09, "iterations": 2.20699e+08, "max": 632.479, "min": 21.997, "mean": 23.0303
> >
> > * Optimization:
> > "duration": 4.97694e+09, "iterations": 8.42834e+08, "max": 134.181, "min": 5.405, "mean": 5.90501
> > "duration": 4.9758e+09, "iterations": 8.66952e+08, "max": 29.163, "min": 5.405, "mean": 5.73941
> > "duration": 4.9778e+09, "iterations": 8.51788e+08, "max": 40.819, "min": 5.405, "mean": 5.84394
> > "duration": 4.97413e+09, "iterations": 8.52432e+08, "max": 37.089, "min": 5.488, "mean": 5.83523
> > "duration": 4.97795e+09, "iterations": 8.43376e+08, "max": 163.703, "min": 5.405, "mean": 5.90241
> >
> > I am running on an 18-core machine, so I guess the 'max' is due to timing noise from OS jitter.
>
> There's no spinning in the algorithm currently (or most of glibc AFAIK,
> except the rather simplistic attempt in adaptive mutexes), so even small
> initialization routines may go straight to blocking via futexes. (And
> AFAIK, futexes don't spin before actually trying to block.)
>
In the pthread_once context spinning is unlikely to help: you would need
to hit the opportunity window while another thread is doing a brief
initialization, and that can happen only once per core.