This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 2/4] Add atomic operations similar to those provided by C11.


On Thu, 2014-10-30 at 20:12 +0000, Joseph S. Myers wrote:
> On Thu, 30 Oct 2014, Torvald Riegel wrote:
> 
> > > See for example elf/dl-lookup.c:do_lookup_x, using an asm to ensure reads 
> > > happen in a particular order:
> > > 
> > >   size_t n = scope->r_nlist;
> > >   /* Make sure we read the value before proceeding.  Otherwise we
> > >      might use r_list pointing to the initial scope and r_nlist being
> > >      the value after a resize.  That is the only path in dl-open.c not
> > >      protected by GSCOPE.  A read barrier here might be to expensive.  */
> > >   __asm volatile ("" : "+r" (n), "+m" (scope->r_list));
> > >   struct link_map **list = scope->r_list;
> > 
> > That ensures that the compiler won't reorder the reads -- but it does
> > *not* ensure that reads happen in a particular order when executed on an
> > arch with a weak memory model.  If we don't care about HW reordering, we
> > won't need the asm (unless perhaps we communicate with external stuff
> > through volatile, or do something behind the compiler's back), because
> > the language semantics for sequential code are taking care of that.
> > 
> > Thus, my guess is that this is a bug.
> 
> My guess is that on the system where a problem reordering was observed, it 
> was sufficient to stop the compiler from reordering the loads (but there 
> might be issues for some other system).  And this sort of code, where 
> there aren't other atomic operations nearby, is a case where we'd need to 
> be particularly careful about checking the performance impact of inserting 
> barriers.

But if there are issues for other archs, we need to insert a barrier
anyway, agreed?  If we want to write generic, non-arch-specific code,
this would be a generic barrier too -- which, in turn, would boil down
to a normal compiler barrier (ie, "memory").

We could think about adding special variants of the acquire barrier
that, like the asm above, restrict the compiler-barrier part to certain
memory locations -- but I doubt that's really worth it.  If we want to
avoid the barrier, we can try other ways to do that.
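
For reference, a plain compiler barrier and a generic acquire barrier
would look roughly like this when written with the GCC builtins (just a
sketch; the names are illustrative, not the ones from the patch):

  /* Compiler-only barrier: stops the compiler from moving memory
     accesses across it, but emits no instruction, so it does not
     constrain hardware reordering on weakly ordered archs.  */
  #define compiler_barrier() __asm volatile ("" ::: "memory")

  /* A generic acquire barrier additionally emits whatever fence the
     arch needs (nothing extra on x86, lwsync on Power, a dmb on ARM);
     with the builtins this would be:  */
  #define generic_acquire_barrier() \
    __atomic_thread_fence (__ATOMIC_ACQUIRE)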

The dl-lookup.c snippet above is also similar to the pthread_once fast
path, in that it is an acquire barrier surrounded by loads.
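
To make that concrete: assuming we end up with an atomic_load_acquire
along the lines of the relaxed ops mentioned in the quoted text below,
the dl-lookup.c code could then be written as (a sketch only, not a
proposed change):

  size_t n = atomic_load_acquire (&scope->r_nlist);
  /* The acquire load orders the read of r_nlist before the subsequent
     read of r_list, against both the compiler and the hardware, so the
     separate asm is no longer needed.  */
  struct link_map **list = scope->r_list;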

> > > > Can we at least agree on having all glibc code use our own
> > > > atomic_store_relaxed / atomic_load_relaxed (see the patch)?  Then we can
> > > > still argue whether to use __atomic* to implement these.  But at least
> > > > we can easily switch to __atomic in the future.  Any objections?
> > > 
> > > I think explicit atomic loads / stores are fine, subject to checking code 
> > > generation before making them use __atomic_*.
> > 
> > How do you envision to do this?
> > 
> > Compare generated code on all archs before and after a change, for each
> > algorithm on its own?
> 
> Comparing code for one architecture, at the time of each conversion, seems 
> appropriate (with additional benchmarking for cases that don't otherwise 
> involve atomics).

How do we handle bug fixes?  When fixing a bug, do you want performance
parity too?

By "involve atomics", do you mean specifically atomic read-modify-write
ops, so atomic ops other than just loads and stores?
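
To make concrete what switching the loads/stores to __atomic would
mean: the generic definitions could be as simple as the following (a
sketch; the actual definitions in the patch may well differ, and archs
could still override them):

  /* Generic relaxed load/store on top of the GCC builtins.  Keeping
     glibc-specific names lets us compare the generated code against
     the old atomics before actually switching an algorithm over.  */
  #define atomic_load_relaxed(mem) \
    __atomic_load_n ((mem), __ATOMIC_RELAXED)
  #define atomic_store_relaxed(mem, val) \
    __atomic_store_n ((mem), (val), __ATOMIC_RELAXED)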


