This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Unify pthread_once (bug 15215)


On Mon, 2014-03-31 at 12:44 +0100, Will Newton wrote:
> On 7 October 2013 22:53, Torvald Riegel <triegel@redhat.com> wrote:
> > On Mon, 2013-10-07 at 16:04 +0000, Joseph S. Myers wrote:
> >> I have no comments on the substance of this patch, but note that ports/
> >> has a separate ChangeLog file for each architecture.
> >
> > Sorry. The attached patch now has separate ChangeLog entries for each of
> > the affected archs.
> 
> There seems to be a significant performance delta on aarch64:
> 
> Old code:
> 
> "pthread_once": {
> "": {
> "duration": 9.29471e+09, "iterations": 1.10667e+09, "max": 24.54,
> "min": 8.38, "mean": 8.39882
> 
> New code:
> 
> "pthread_once": {
> "": {
> "duration": 9.72366e+09, "iterations": 4.33843e+08, "max": 30.86,
> "min": 22.38, "mean": 22.4128
> 
> And also ARM:
> 
> Old code:
> 
> "pthread_once": {
> "": {
> "duration": 8.38662e+09, "iterations": 6.6695e+08, "max": 35.292,
> "min": 12.416, "mean": 12.5746
> 
> New code:
> 
> "pthread_once": {
> "": {
> "duration": 9.26424e+09, "iterations": 3.07574e+08, "max": 86.125,
> "min": 28.875, "mean": 30.1204
> 
> It would be nice to understand the source of this variation. I can put
> it on my todo list but I can't promise I will be able to look at it
> any time soon.
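For scale, here is the arithmetic on the quoted numbers. In each run, mean ≈ duration / iterations, and the ratio of the means gives the per-call slowdown. Units are whatever the benchmark reports; they are not stated in the quoted output.

```latex
\begin{aligned}
\text{aarch64:}\quad & \frac{9.29471\times10^{9}}{1.10667\times10^{9}} \approx 8.399, & \frac{22.4128}{8.39882} &\approx 2.67\\
\text{ARM:}\quad & \frac{8.38662\times10^{9}}{6.6695\times10^{8}} \approx 12.57, & \frac{30.1204}{12.5746} &\approx 2.40
\end{aligned}
```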

The old ARM code (and the old code in general) was missing a memory
barrier, which the patch adds.  Here's what I wrote in the email that
first sent the patch:

> > Both I1 and I2 were missing acquire MO on the very first load of
> > once_control.  This needs to synchronize with the release MO on setting
> > the state to init-finished, so without it it's not guaranteed to work
> > either.
> > Note that this will make a call to pthread_once that doesn't need to
> > actually run the init routine slightly slower due to the additional
> > acquire barrier.  If you're really concerned about this overhead, speak
> > up.  There are ways to avoid it, but it comes with additional complexity
> > and bookkeeping.
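To make the quoted point concrete, here is a minimal C11 sketch of a once-style fast path. It is not glibc's implementation. The names (`my_once`, `my_once_t`, the three state values) are hypothetical. Waiting uses `sched_yield` instead of a futex, and cancellation of the init routine is not handled. The key detail is that the very first load of the state is an acquire load, which pairs with the release store that marks initialization as finished:

```c
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical state encoding; glibc's real once_control encoding differs.  */
enum { ONCE_INIT = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

typedef struct { atomic_int state; } my_once_t;
#define MY_ONCE_INIT { ONCE_INIT }

void my_once(my_once_t *once, void (*init_routine)(void))
{
  /* Very first load: acquire MO, so that observing ONCE_DONE also makes
     all writes done by init_routine visible to this thread.  */
  if (atomic_load_explicit(&once->state, memory_order_acquire) == ONCE_DONE)
    return;

  for (;;)
    {
      int expected = ONCE_INIT;
      if (atomic_compare_exchange_strong_explicit(&once->state, &expected,
                                                  ONCE_RUNNING,
                                                  memory_order_acquire,
                                                  memory_order_acquire))
        {
          init_routine();
          /* Release MO: pairs with the acquire loads in other threads.  */
          atomic_store_explicit(&once->state, ONCE_DONE, memory_order_release);
          return;
        }
      if (expected == ONCE_DONE)
        return;   /* The failed CAS read ONCE_DONE with acquire MO.  */
      /* Another thread is running init_routine.  glibc would block on a
         futex here; this sketch just yields and retries.  */
      sched_yield();
    }
}

/* Usage example: counts how often the init routine runs.  */
static int init_calls;
static void count_init(void) { init_calls++; }
```

If the first load were relaxed instead, a thread could observe `ONCE_DONE` without being guaranteed to see the writes made by `init_routine`. The acquire ordering is what costs extra on every call on ARM and aarch64 (for example, a load followed by a `dmb` barrier on 32-bit ARM).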

One way to try to work around the overhead is to keep thread-local state
(e.g., a counter) that records whether a particular thread has already
done an acquire load on this pthread_once instance before.  This will
only help if the same pthread_once is called several times from the
same thread; it won't help if a couple of threads each call a
particular pthread_once just a few times.
Also, because we can't keep thread-local state for each pthread_once
instance, we'd need to group them all together.  As a consequence, this
would introduce some synchronization between the initialization phases
of unrelated pthread_once instances.
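The thread-local idea above could look something like the following hypothetical epoch scheme. It is a sketch under stated assumptions, and all names are illustrative rather than a proposed glibc API. A single global epoch counter, protected by one global lock, is shared by all once instances. Each instance records the epoch at which it was initialized, and each thread remembers the newest epoch it has observed while holding the lock. If an instance's epoch is no newer than the thread's, the thread has already synchronized with that initialization through the lock, so the fast path needs only a relaxed load and no acquire barrier:

```c
#include <pthread.h>
#include <stdatomic.h>

/* One epoch counter and one lock shared by ALL instances; this grouping is
   what couples unrelated once instances.  Only accessed under once_lock.  */
static unsigned long global_epoch;
static pthread_mutex_t once_lock = PTHREAD_MUTEX_INITIALIZER;

/* Newest epoch this thread has observed while holding once_lock.  */
static _Thread_local unsigned long thread_epoch;

/* done_epoch == 0 means "not yet initialized".  */
typedef struct { atomic_ulong done_epoch; } epoch_once_t;
#define EPOCH_ONCE_INIT { 0 }

void epoch_once(epoch_once_t *once, void (*init)(void))
{
  /* Fast path: relaxed load, no barrier.  If the instance was initialized
     at an epoch this thread has already synchronized with (via the lock),
     init's writes are already visible to it.  */
  unsigned long e = atomic_load_explicit(&once->done_epoch,
                                         memory_order_relaxed);
  if (e != 0 && e <= thread_epoch)
    return;

  /* Slow path: the mutex provides the needed synchronization.  */
  pthread_mutex_lock(&once_lock);
  if (atomic_load_explicit(&once->done_epoch, memory_order_relaxed) == 0)
    {
      init();
      global_epoch++;
      atomic_store_explicit(&once->done_epoch, global_epoch,
                            memory_order_relaxed);
    }
  thread_epoch = global_epoch;
  pthread_mutex_unlock(&once_lock);
}

/* Usage example: counts how often an init routine runs.  */
static int demo_calls;
static void demo_init(void) { demo_calls++; }
```

The single global lock is the grouping described above: initialization of unrelated instances is serialized, and a thread takes the locked slow path at least once for every instance initialized after it last synchronized. This sketch also holds the lock while running `init`, so an init routine that itself calls `epoch_once` would deadlock; a real implementation would have to avoid that.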

