This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 2/2] PowerPC: libc single-thread lock optimization


On Fri, 2014-05-02 at 14:40 -0300, Adhemerval Zanella wrote:
> On 02-05-2014 12:42, Torvald Riegel wrote:
> > On Wed, 2014-04-30 at 10:57 -0300, Adhemerval Zanella wrote:
> >> +#else
> >> +# define __lll_is_single_thread (0)
> >> +#endif
> >> +
> >> +
> >>  #define lll_futex_wait(futexp, val, private) \
> >>    lll_futex_timed_wait (futexp, val, NULL, private)
> >>  
> >> @@ -205,7 +215,9 @@
> >>  /* Set *futex to ID if it is 0, atomically.  Returns the old value */
> >>  #define __lll_robust_trylock(futex, id) \
> >>    ({ int __val;                                                                      \
> >> -     __asm __volatile ("1:     lwarx   %0,0,%2" MUTEX_HINT_ACQ "\n"          \
> >> +     if (!__lll_is_single_thread)                                            \
> >> +       __asm __volatile (                                                    \
> >> +                      "1:      lwarx   %0,0,%2" MUTEX_HINT_ACQ "\n"          \
> >>                        "        cmpwi   0,%0,0\n"                             \
> >>                        "        bne     2f\n"                                 \
> >>                        "        stwcx.  %3,0,%2\n"                            \
> >> @@ -214,6 +226,12 @@
> >>                        : "=&r" (__val), "=m" (*futex)                         \
> >>                        : "r" (futex), "r" (id), "m" (*futex)                  \
> >>                        : "cr0", "memory");                                    \
> >> +     else                                                                    \
> >> +       {                                                                     \
> >> +        __val = *futex;                                                      \
> >> +        if (__val == 0)                                                      \
> >> +          *futex = id;                                                       \
> >> +       }                                                                     \
> >>       __val;                                                                  \
> >>    })
> > Conceptually, you can safely use trylock in signal handlers, because
> > it's not a blocking operation.  And this is used for normal and robust
> > locks.  I haven't checked all the trylock code and all uses to see
> > whether it is indeed AS-Safe, but I'm pretty sure this change is wrong.
> > At the very least, it doesn't document that trylock is now AS-Unsafe.
> 
> If there is the case I concede to you, however I couldn't find any proper
> documentation (either POSIX or from our manual) saying trylocks should be
> AS-Safe.

We haven't documented it AFAIK, but these are our internal locks, and
unlike lock() which blocks, trylock() won't block.  So, trylock's
semantics can be useful in signal handler contexts, and your are
removing this possibility.  At the very least, if we have no current
internal users of trylock in AS-Safe functions, then we could consider
this change.  But I think this needs wider consensus in the community.

Regarding POSIX, I'm not aware of any explicit mentioning of what's safe
in signal handlers.  However, trylock is synchronizing, and documented
as non-blocking, so one could consider it being AS-Safe.

> And answering your previous question, I don't know if anyone is
> tracking the remaining documentation about pthread.

Nor do I really, but if you make a change to the semantics, it's worth
documenting it even if there's no existing documentation.

> >
> > Finally, for a change as this whose effect on performance is
> > non-obvious, I think we really need performance measurements.  We need
> > to have microbenchmarks so that we can track the change of performance.
> >
> Although micro-benchmarks can provided (and they will prob be in that loop)
> 'this is not a real case'),

What do you mean?  Will you provide microbenchmarks later on, just some
for the atomics, or some for a specific use of the modified locks (e.g.,
malloc)?

> for PowerPC avoid LL/SC instructions where they
> are not required is always a gain.

I think we had consensus in the community that we want to track
performance, and base performance-focused changes on data that we can
reassess later on (e.g., for a future PowerPC generation).  If we want
to do that, then I do think this needs microbenchmarks.  Two options
come to mind:
* Plain RMWs or CASs to cached and uncached memory.  The best case for
the optimization, I suppose.
* Real use of the optimization in, say, malloc.  That's a more realistic
benchmark.

If the second doesn't show any significant performance differences
(e.g., because there's so much other stuff going on that this doesn't
matter), then maybe it's not worth it to maintain it.  We can only say
with some confidence what's the case after having numbers.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]