This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 2/4] Add atomic operations similar to those provided by C11.


On Thu, 2014-10-30 at 20:12 +0000, Joseph S. Myers wrote:
> On Thu, 30 Oct 2014, Torvald Riegel wrote:
> 
> > > See for example elf/dl-lookup.c:do_lookup_x, using an asm to ensure reads 
> > > happen in a particular order:
> > > 
> > >   size_t n = scope->r_nlist;
> > >   /* Make sure we read the value before proceeding.  Otherwise we
> > >      might use r_list pointing to the initial scope and r_nlist being
> > >      the value after a resize.  That is the only path in dl-open.c not
> > >      protected by GSCOPE.  A read barrier here might be to expensive.  */
> > >   __asm volatile ("" : "+r" (n), "+m" (scope->r_list));
> > >   struct link_map **list = scope->r_list;
> > 
> > That ensures that the compiler won't reorder the reads -- but it does
> > *not* ensure that reads happen in a particular order when executed on an
> > arch with a weak memory model.  If we don't care about HW reordering, we
> > won't need the asm (unless perhaps we communicate with external stuff
> > through volatile, or do something behind the compiler's back), because
> > the language semantics for sequential code are taking care of that.
> > 
> > Thus, my guess is that this is a bug.
> 
> My guess is that on the system where a problem reordering was observed, it 
> was sufficient to stop the compiler from reordering the loads (but there 
> might be issues for some other system).  And this sort of code, where 
> there aren't other atomic operations nearby, is a case where we'd need to 
> be particularly careful about checking the performance impact of inserting 
> barriers.

But if there are issues for other archs, we need to insert a barrier
anyway, agreed?  If we want to write generic, non-arch-specific code,
this would be a generic barrier too -- which, in turn, would boil down
to a normal compiler barrier (ie, "memory").

We could think about adding special variants of the acquire barrier
that, like the asm above, restrict the compiler-barrier part to certain
memory locations -- but I doubt that's really worth it.  If we want to
avoid the barrier, we can try other ways to do that.
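
For reference, a plain compiler barrier and a generic acquire barrier
would look roughly like this when written with the GCC builtins (just a
sketch; the names are illustrative, not the ones from the patch):

  /* Compiler-only barrier: stops the compiler from moving memory
     accesses across it, but emits no instruction, so it does not
     constrain hardware reordering on weakly ordered archs.  */
  #define compiler_barrier() __asm volatile ("" ::: "memory")

  /* A generic acquire barrier additionally emits whatever fence the
     arch needs (nothing extra on x86, lwsync on Power, a dmb on ARM);
     with the builtins this would be:  */
  #define generic_acquire_barrier() \
    __atomic_thread_fence (__ATOMIC_ACQUIRE)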

The dl-lookup.c snippet above is also similar to the pthread_once fast
path, in that it is an acquire barrier surrounded by loads.
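
To make that concrete: assuming we end up with an atomic_load_acquire
along the lines of the relaxed ops mentioned in the quoted text below,
the dl-lookup.c code could then be written as (a sketch only, not a
proposed change):

  size_t n = atomic_load_acquire (&scope->r_nlist);
  /* The acquire load orders the read of r_nlist before the subsequent
     read of r_list, against both the compiler and the hardware, so the
     separate asm is no longer needed.  */
  struct link_map **list = scope->r_list;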

> > > > Can we at least agree on having all glibc code use our own
> > > > atomic_store_relaxed / atomic_load_relaxed (see the patch)?  Then we can
> > > > still argue whether to use __atomic* to implement these.  But at least
> > > > we can easily switch to __atomic in the future.  Any objections?
> > > 
> > > I think explicit atomic loads / stores are fine, subject to checking code 
> > > generation before making them use __atomic_*.
> > 
> > How do you envision to do this?
> > 
> > Compare generated code on all archs before and after a change, for each
> > algorithm on its own?
> 
> Comparing code for one architecture, at the time of each conversion, seems 
> appropriate (with additional benchmarking for cases that don't otherwise 
> involve atomics).

How do we handle bug fixes?  When fixing a bug, do you want performance
parity too?

By "involve atomics", do you mean specifically atomic read-modify-write
ops, so atomic ops other than just loads and stores?
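
To make concrete what switching the loads/stores to __atomic would
mean: the generic definitions could be as simple as the following (a
sketch; the actual definitions in the patch may well differ, and archs
could still override them):

  /* Generic relaxed load/store on top of the GCC builtins.  Keeping
     glibc-specific names lets us compare the generated code against
     the old atomics before actually switching an algorithm over.  */
  #define atomic_load_relaxed(mem) \
    __atomic_load_n ((mem), __ATOMIC_RELAXED)
  #define atomic_store_relaxed(mem, val) \
    __atomic_store_n ((mem), (val), __ATOMIC_RELAXED)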


