This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

PowerPC: Sync pthread_once with default implementation


This patch removes the arch specific powerpc implementation and instead
uses the linux default one.  Although the current powerpc implementation
already contains the required memory barriers for correct
initialization, the default implementation shows a better performance on
newer chips.

I checked the default implementations with some other tests for powerpc
and everything looks ok.  The only nit that was puzzling me was code
difference I saw checking if a load acquire instruction sequence would
yield better performance than the relaxed load plus acquire memory fence.
>From Torvald Riegel comment:

> Okay.  If lwsync is preferable for an acquire load, it might be best to
> check the GCC atomics implementation because I believe it's following
> http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html, which suggests
> isync for acquire loads.

I rechecked ISA documentation and some literature and the only inhibitor
about using an 'lwsync' as an acquire barrier is because it should not
be used on 'Write Through Required and Caching Inhibited' storage location,
which AFAIK are used only in restricted kernel areas.  In fact, load/store
with reservations does not work on such kind of memory attributes,  so for
general userland lock mechanism 'lwsync' is safe to be used as acquire
memory fence.

GCC it self translate C11 atomic_thread_fence (memory_order_acquire) to
'lwsync', so a relaxed load plus memory acquire fence will be translated
to 'ld; lwsync'.  Two advantage by using 'ld; cmp; bc; isync' for
acquire loads is 1) it is more strict, thus safer (although still not
really required) 2) is it performs much better some chips (POWER6), while
showing very similar performance on other POWER platforms.

I will see if it is worth to change the current core for acquire load/
release store, at least for POWER.  The only chip that I saw a big improvement
of doing it is on POWER6.

Also, this testcase itself is single-thread and think it would be worth to
extend it to check contention on multithread cases as well.

--

* nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c: Remove file.

---

diff --git a/nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c b/nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c
deleted file mode 100644
index e925299..0000000
--- a/nptl/sysdeps/unix/sysv/linux/powerpc/pthread_once.c
+++ /dev/null
@@ -1,110 +0,0 @@
-/* Copyright (C) 2003-2014 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Paul Mackerras <paulus@au.ibm.com>, 2003.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	 See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, see
-   <http://www.gnu.org/licenses/>.  */
-
-#include "pthreadP.h"
-#include <lowlevellock.h>
-
-
-unsigned long int __fork_generation attribute_hidden;
-
-
-static void
-clear_once_control (void *arg)
-{
-  pthread_once_t *once_control = (pthread_once_t *) arg;
-
-  __asm __volatile (__lll_rel_instr);
-  *once_control = 0;
-  lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
-}
-
-
-int
-__pthread_once (pthread_once_t *once_control, void (*init_routine) (void))
-{
-  for (;;)
-    {
-      int oldval;
-      int newval;
-      int tmp;
-
-      /* Pseudo code:
-	 newval = __fork_generation | 1;
-	 oldval = *once_control;
-	 if ((oldval & 2) == 0)
-	   *once_control = newval;
-	 Do this atomically with an acquire barrier.
-      */
-      newval = __fork_generation | 1;
-      __asm __volatile ("1:	lwarx	%0,0,%3" MUTEX_HINT_ACQ "\n"
-			"	andi.	%1,%0,2\n"
-			"	bne	2f\n"
-			"	stwcx.	%4,0,%3\n"
-			"	bne	1b\n"
-			"2:	" __lll_acq_instr
-			: "=&r" (oldval), "=&r" (tmp), "=m" (*once_control)
-			: "r" (once_control), "r" (newval), "m" (*once_control)
-			: "cr0");
-
-      /* Check if the initializer has already been done.  */
-      if ((oldval & 2) != 0)
-	return 0;
-
-      /* Check if another thread already runs the initializer.	*/
-      if ((oldval & 1) == 0)
-	break;
-
-      /* Check whether the initializer execution was interrupted by a fork.  */
-      if (oldval != newval)
-	break;
-
-      /* Same generation, some other thread was faster. Wait.  */
-      lll_futex_wait (once_control, oldval, LLL_PRIVATE);
-    }
-
-
-  /* This thread is the first here.  Do the initialization.
-     Register a cleanup handler so that in case the thread gets
-     interrupted the initialization can be restarted.  */
-  pthread_cleanup_push (clear_once_control, once_control);
-
-  init_routine ();
-
-  pthread_cleanup_pop (0);
-
-
-  /* Add one to *once_control to take the bottom 2 bits from 01 to 10.
-     A release barrier is needed to ensure memory written by init_routine
-     is seen in other threads before *once_control changes.  */
-  int tmp;
-  __asm __volatile (__lll_rel_instr "\n"
-		    "1:	lwarx	%0,0,%2" MUTEX_HINT_REL "\n"
-		    "	addi	%0,%0,1\n"
-		    "	stwcx.	%0,0,%2\n"
-		    "	bne-	1b"
-		    : "=&b" (tmp), "=m" (*once_control)
-		    : "r" (once_control), "m" (*once_control)
-		    : "cr0");
-
-  /* Wake up all other threads.  */
-  lll_futex_wake (once_control, INT_MAX, LLL_PRIVATE);
-
-  return 0;
-}
-weak_alias (__pthread_once, pthread_once)
-hidden_def (__pthread_once)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]