This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Future atomic operation cleanup

From: Torvald Riegel <triegel at redhat dot com>
To: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>
Cc: Richard Henderson <rth at twiddle dot net>, "GNU C. Library" <libc-alpha at sourceware dot org>
Date: Thu, 28 Aug 2014 17:37:46 +0200
Subject: Re: Future atomic operation cleanup
Authentication-results: sourceware.org; auth=none
References: <53F74A93 dot 30508 at linux dot vnet dot ibm dot com> <53F74CE4 dot 5070809 at twiddle dot net> <53FF3551 dot 8020503 at linux dot vnet dot ibm dot com>

On Thu, 2014-08-28 at 10:57 -0300, Adhemerval Zanella wrote:
> On 22-08-2014 11:00, Richard Henderson wrote:
> > On 08/22/2014 06:50 AM, Adhemerval Zanella wrote:
> >> Hi,
> >>
> >> Following comments from my first patch to optimize single-thread internal
> >> glibc locking/atomics [1], I have changed the implementation to now use
> >> relaxed atomics instead.  Addressing the concerns raised in the last discussion,
> >> the primitives are still signal-safe (although not thread-safe), so if a future
> >> malloc implementation is changed to be async-safe, it won't require adjusting
> >> the powerpc atomics.
> >>
> >> For catomic_and and catomic_or I followed the definition in 'include/atomic.h'
> >> (which powerpc is currently using) and implemented the atomics with acquire
> >> semantics.  The new implementation is also simpler.
> >>
> >> On synthetic benchmarks it shows an improvement of 5-10% for malloc
> >> calls and a performance increase of 7-8% in 483.xalancbmk from
> >> speccpu2006 (numbers from a POWER8 machine).
> >>
> >> Checked on powerpc64, powerpc32 and powerpc64le.
> > Wow, that's a lot of boiler-plate.
> >
> > When development opens again, can we simplify all of these atomic operations by
> > assuming compiler primitives?  That is, use the __atomic builtins for gcc 4.8
> > and later, fall back to the __sync builtins for earlier gcc, and completely
> > drop support for truly ancient compilers that support neither.
> >
> > As a bonus we'd get to unify the implementations across all of the targets.
> >
> >
> > r~
> >
> I also agree we should move to a more unified implementation (in fact, I
> plan to get rid of powerpc lowlevellock.h when devel opens again).  However,
> I really don't want to either wait or reimplement all the custom atomics to push
> this optimization...

I believe that, unless the caveat Joseph mentioned actually applies to
the archs you're concerned about, using the compiler builtins and doing
the unification would simplify your patch considerably.  It would also
avoid having to iterate over the changes of your patch again once we do
all of the unification.
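
To make the builtin-based approach concrete, here is a minimal sketch,
assuming a GCC-compatible compiler.  The names sketch_fetch_add_acq,
sketch_cas_acq and SKETCH_USE_ATOMIC_BUILTINS are made up for
illustration and are not glibc's actual interfaces; the 4.8 cutoff
simply follows Richard's suggestion quoted above.

```c
/* Hypothetical selection macro: use the __atomic builtins on gcc 4.8 and
   later, otherwise fall back to the older __sync builtins.  */
#if defined (__GNUC__) \
    && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
# define SKETCH_USE_ATOMIC_BUILTINS 1
#else
# define SKETCH_USE_ATOMIC_BUILTINS 0
#endif

/* Atomically add VALUE to *MEM with acquire semantics; return the old
   value.  */
static inline int
sketch_fetch_add_acq (int *mem, int value)
{
#if SKETCH_USE_ATOMIC_BUILTINS
  return __atomic_fetch_add (mem, value, __ATOMIC_ACQUIRE);
#else
  /* The __sync builtins act as full barriers, which is stronger than
     acquire but still correct.  */
  return __sync_fetch_and_add (mem, value);
#endif
}

/* Atomically replace *MEM with DESIRED if it currently equals EXPECTED.
   Return nonzero on success.  */
static inline int
sketch_cas_acq (int *mem, int expected, int desired)
{
#if SKETCH_USE_ATOMIC_BUILTINS
  return __atomic_compare_exchange_n (mem, &expected, desired,
                                      0 /* strong CAS */,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
#else
  return __sync_bool_compare_and_swap (mem, expected, desired);
#endif
}
```

With wrappers along these lines in a generic header, an architecture
would only need custom code where the builtins fall short.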

> I think such a change will require a lot of iteration and testing, which is not
> the intent of this patch.

If we move to C11-like atomics, then those will certainly go in
incrementally, and exist in parallel with the current atomics for a
while (until we have reviewed all existing code using the old atomics
and moved it over to the new atomics).
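
As a rough sketch of what "in parallel" could look like, assuming a
compiler that provides both the __sync and __atomic builtins: an
old-style operation with implicit ordering sits next to C11-like
operations that name their memory order, and both work on the same plain
int.  All identifiers below (legacy_atomic_increment, the
sketch_atomic_* macros, and the two bump functions) are hypothetical and
only illustrate the idea.

```c
/* Old style: the memory order is implicit (here a full barrier).  */
#define legacy_atomic_increment(mem) \
  ((void) __sync_fetch_and_add ((mem), 1))

/* New, C11-like style: each operation states its memory order.  */
#define sketch_atomic_load_relaxed(mem) \
  __atomic_load_n ((mem), __ATOMIC_RELAXED)
#define sketch_atomic_load_acquire(mem) \
  __atomic_load_n ((mem), __ATOMIC_ACQUIRE)
#define sketch_atomic_store_release(mem, val) \
  __atomic_store_n ((mem), (val), __ATOMIC_RELEASE)
#define sketch_atomic_fetch_add_relaxed(mem, val) \
  __atomic_fetch_add ((mem), (val), __ATOMIC_RELAXED)

/* A call site that has not been reviewed yet keeps using the old macro.  */
static int
legacy_bump (int *counter)
{
  legacy_atomic_increment (counter);
  return sketch_atomic_load_relaxed (counter);
}

/* A reviewed call site uses the new operations on the same variable and
   picks the weakest order that is still sufficient for it.  */
static int
migrated_bump (int *counter)
{
  return sketch_atomic_fetch_add_relaxed (counter, 1) + 1;
}
```

Because both families operate on the same objects, call sites could be
converted one at a time, and the old macros dropped once no users remain.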

If we use the compiler builtins, we'd also have less to test because
there would be no additional custom atomics implementation to maintain.

Follow-Ups:
- Re: Future atomic operation cleanup
  - From: Adhemerval Zanella

References:
- [PATCH v2] PowerPC: libc single-thread lock optimization
  - From: Adhemerval Zanella
- Future atomic operation cleanup
  - From: Richard Henderson
- Re: Future atomic operation cleanup
  - From: Adhemerval Zanella
