This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: PowerPC: libc single-thread lock optimization


On Fri, 2014-05-02 at 11:15 -0300, Adhemerval Zanella wrote:
> On 02-05-2014 11:04, Torvald Riegel wrote:
> > On Tue, 2014-04-29 at 15:05 -0300, Adhemerval Zanella wrote:
> >> On 29-04-2014 14:53, Torvald Riegel wrote:
> >>> On Tue, 2014-04-29 at 13:49 -0300, Adhemerval Zanella wrote:
> >>>> On 29-04-2014 13:22, Torvald Riegel wrote:
> >>>>> On Mon, 2014-04-28 at 19:33 -0300, Adhemerval Zanella wrote:
> >>>>>> I bring up x86 because it is usually the reference implementation, and it sometimes
> >>>>>> puzzles me that copying the same idea to another platform raises architectural questions.  But I
> >>>>>> concede that the reference itself may not have opted for the best solution in the first place.
> >>>>>>
> >>>>>> So if I have understood correctly, the optimization to check for single-thread and avoid
> >>>>>> locks is meant to be focused on lowlevellock only?  If so, how do you suggest other archs
> >>>>>> mimic the x86 optimization in the atomic.h primitives?  Should other archs follow x86_64 and
> >>>>>> check the __libc_multiple_threads value instead?  That could be a way, however it is mostly
> >>>>>> redundant: the TCB definition already contains the required information, so there is no
> >>>>>> need to keep track of it in another memory location.  Also, following the x86_64 approach,
> >>>>>> it checks the TCB header information in sysdeps/CPU/bits/atomic.h, but __libc_multiple_threads
> >>>>>> in lowlevellock.h.  Which is the correct guideline for other archs?
> >>>>> From a synchronization perspective, I think any single-thread
> >>>>> optimizations belong in the specific concurrent algorithms (e.g.,
> >>>>> mutexes, condvars, ...):
> >>>>> * Doing the optimization at the lowest level (i.e., the atomics) might be
> >>>>> insufficient because, if there's indeed just one thread, lots of
> >>>>> synchronization code can be made a lot simpler than by just avoiding
> >>>>> atomics (e.g., avoiding loops, checks, ...).
> >>>>> * The mutexes, condvars, etc. are what's exposed to the user, so
> >>>>> assumptions about whether there really is concurrency or not only make
> >>>>> sense there.  For example, a single-thread program can still have a
> >>>>> process-shared condvar, so the condvar would need to use
> >>>>> synchronization.
> >>>> Following the x86_64 idea, this optimization is only for internal atomic usage in
> >>>> libc itself: for a process-shared condvar, one will use the pthread code, which
> >>>> is *not* built with this optimization.
> >>> pthread code uses the same atomics we use for libc internally.
> >>> Currently, the x86_64 condvar, for example, doesn't use the atomics --
> >>> but this is what we'd need it to do if we ever want to use unified
> >>> implementations of condvars (e.g., like we did for pthread_once
> >>> recently).
> >> If you check my patch, the SINGLE_THREAD_P is defined as:
> >>
> >> #ifndef NOT_IN_libc
> >> # define SINGLE_THREAD_P \
> >>   (THREAD_GETMEM (THREAD_SELF, header.multiple_threads) == 0)
> >> #else
> >> # define SINGLE_THREAD_P   0
> >> #endif
> >>
> >> So for libpthread, the non-atomic code path is eliminated.  x86_64 is
> >> not that careful in some atomic primitives, though.
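
[A minimal sketch of how such a SINGLE_THREAD_P guard might be used at the lowlevellock level.  The names `lll_lock_sketch` and the plain `multiple_threads` flag are hypothetical stand-ins for illustration; in glibc the flag lives in the TCB and is read via THREAD_GETMEM (THREAD_SELF, header.multiple_threads).]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TCB's multiple_threads flag; in glibc
   this is read via THREAD_GETMEM, not a global.  */
static bool multiple_threads = false;

#define SINGLE_THREAD_P (!multiple_threads)

/* Sketch of a low-level lock acquire that skips the atomic
   compare-and-swap when the process is known to be single-threaded.
   This is illustrative only, not glibc's actual lll_lock.  */
static void lll_lock_sketch (atomic_int *futex)
{
  if (SINGLE_THREAD_P)
    {
      /* No other thread can race with us: a plain store suffices.  */
      atomic_store_explicit (futex, 1, memory_order_relaxed);
      return;
    }
  int expected = 0;
  while (!atomic_compare_exchange_weak (futex, &expected, 1))
    expected = 0;  /* A real implementation would futex-wait here.  */
}
```

Since SINGLE_THREAD_P is defined to 0 outside libc, the compiler removes the non-atomic branch entirely in libpthread builds, which is the elimination described above.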
> > I think that's not sufficient, nor are the low-level atomics the right
> > place for this kind of optimization.
> >
> > First, there are several sources of concurrency affecting shared-memory
> > synchronization:
> > * Threads created by nptl.
> > * Other processes we're interacting with via shared memory.
> > * Reentrancy.
> > * The kernel, if we should synchronize with it via shared memory (e.g.,
> > recent perf does so, IIRC).
> >
> > We control the first.  The second case is, I suppose, only reachable by
> > using pthreads pshared sync operations (or not?).
> >
> > In the case of reentrancy, there is concurrency between a signal handler and
> > a process consisting of a single thread, so we might want to use atomics
> > to synchronize.  I haven't checked whether we actually do (Alex might
> > know after doing the MT-Safety documentation) -- but I would not want to
> > prevent us from using atomics for that, so a check on just
> > multiple_threads is not sufficient, IMO.
> > Something similar applies to the kernel case.  Or if, in the future, we
> > should want to sync with any accelerators or similar.
> 
> As I stated previously, I have dropped the atomic.h modification in favor of
> changing just lowlevellock.h.
> 
> And I think we then need to reevaluate the x86_64 code, which does exactly what
> you think is wrong (adding the single-thread optimization in the atomics).

Note that those are different: they drop the "lock" prefix, but they are
not sequential code like what you add.
I agree that it's worth documenting them, but those should still work in
the case of reentrancy.
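
[A hedged sketch of the distinction being drawn.  The function names are illustrative, not glibc's, and the plain `multiple_threads` flag stands in for the TCB field.  glibc's x86_64 catomic_* macros check the TCB flag and, when single-threaded, emit the *same* read-modify-write instruction without the "lock" prefix; since it is still a single instruction, it remains atomic with respect to a signal handler running on the same CPU.  C cannot express "unlocked xadd" portably, so a relaxed atomic is the closest approximation here.]

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the TCB's multiple_threads flag.  */
static bool multiple_threads = false;

/* Illustrative analogue of x86_64's catomic_*: when single-threaded,
   still a single RMW operation (on x86, the same xadd instruction
   minus the "lock" prefix), so reentrancy-safe.  */
static int catomic_fetch_add_sketch (atomic_int *mem, int val)
{
  if (!multiple_threads)
    /* Stand-in for the unlocked single-instruction path.  */
    return atomic_fetch_add_explicit (mem, val, memory_order_relaxed);
  return atomic_fetch_add (mem, val);
}

/* By contrast, a fully sequential fast path compiles to separate load,
   add, and store instructions: a signal handler arriving between the
   load and the store would observe or clobber an intermediate state.  */
static int sequential_fetch_add_sketch (int *mem, int val)
{
  int old = *mem;    /* load  */
  *mem = old + val;  /* store -- a signal in between breaks atomicity */
  return old;
}
```

Both return the old value, but only the first stays safe against reentrancy in the single-thread case, which is why the two optimizations are not equivalent.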

