This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[RFC] Fixing strcmp performance on power7 for unaligned loads.


Hi,

As I said before, benchmarks are useless unless somebody actually reads
them, so I looked at the powerpc ones. I noticed that the power7 strcmp and
strncmp are about five times slower than memcmp in the unaligned case.

That's too much, so I could easily improve performance by 50% in that case by
implementing strcmp as a strnlen+memcmp loop, despite the overhead of strnlen.
Because of that overhead the loop is a lot slower on aligned data, so the real
fix should be made in assembly, by changing the unaligned case to follow the
pattern in the following C code.
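
To make the pattern concrete, here is a minimal portable sketch of the
strnlen+memcmp loop. It is not the code proposed for glibc: the name
strcmp_blocks and the BLOCK constant are made up for illustration, it calls
the standard strnlen and memcmp instead of the power7 variants, and it scans
both strings in every iteration instead of aligning one of them, so it never
reads past a terminator.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen is POSIX, not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size, matching the 64 bytes used in the patch. */
enum { BLOCK = 64 };

/* Illustrative name: strcmp expressed as a strnlen+memcmp loop. */
int strcmp_blocks(const char *a, const char *b)
{
  assert(a != NULL && b != NULL);
  for (;;)
    {
      /* len is the distance to the first NUL in either string,
         capped at BLOCK.  */
      size_t len = strnlen(a, BLOCK);
      len = strnlen(b, len);

      /* A terminator lies within this block: compare up to and
         including it, which decides the result.  */
      if (len != BLOCK)
        return memcmp(a, b, len + 1);

      /* No terminator in the next BLOCK bytes of either string.  */
      int ret = memcmp(a, b, BLOCK);
      if (ret != 0)
        return ret;

      a += BLOCK;
      b += BLOCK;
    }
}
```

Scanning both strings each round costs one more strnlen call per block than
the aligned variant in the patch below; the sketch pays that to stay within
what standard C guarantees about reads.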

strncmp should be the same case once I handle the corner cases correctly;
the benchmark results that I have now are the same up until a segfault.
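
One possible way to handle the length limit, offered as a sketch only and not
as the pending strncmp patch: cap each block at the number of bytes that n
still allows, and report equality once n is used up. The names strncmp_blocks
and BLOCK are hypothetical, and the standard strnlen and memcmp stand in for
the power7 variants.

```c
#define _POSIX_C_SOURCE 200809L  /* strnlen is POSIX, not ISO C */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size, matching the 64 bytes used for strcmp. */
enum { BLOCK = 64 };

/* Illustrative name: strncmp expressed as a strnlen+memcmp loop. */
int strncmp_blocks(const char *a, const char *b, size_t n)
{
  assert(a != NULL && b != NULL);
  while (n > 0)
    {
      /* Never look at more bytes than the caller allowed.  */
      size_t chunk = n < BLOCK ? n : BLOCK;

      /* Distance to the first NUL in either string, capped at chunk.  */
      size_t len = strnlen(a, chunk);
      len = strnlen(b, len);

      /* A terminator lies within this chunk: compare up to and
         including it, which decides the result.  */
      if (len != chunk)
        return memcmp(a, b, len + 1);

      int ret = memcmp(a, b, chunk);
      if (ret != 0)
        return ret;

      a += chunk;
      b += chunk;
      n -= chunk;
    }
  /* All n bytes matched without reaching a terminator.  */
  return 0;
}
```

The corner cases this covers are n equal to zero, n ending in the middle of a
block, and strings that are shorter than n.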

The same optimization would probably also work for older machines, but I
don't have one to test it on.

Part of the benchtest output for large inputs is here:

                               simple_strcmp  stupid_strcmp  __strcmp_power7  __strcmp_power7b  __strcmp_ppc
Length   32, alignment  0/ 0:  22.6719        31.8438        3.40625          14.875            5.39062
Length   32, alignment  0/ 4:  22.75          31.7969        18.9062          19.1094           19.2344
Length   32, alignment  4/ 5:  22.75          31.75          18.1875          20.1719           22.6562
Length   64, alignment  0/ 0:  40.3906        51.2031        5.03125          15.0156           8
Length   64, alignment  0/ 5:  40.5312        51.6094        24.6562          18.0781           32.5156
Length   64, alignment  5/ 6:  40.7969        51.0781        23.9531          19.3281           32.7188
Length  128, alignment  0/ 0:  76.5           91.5312        8                32.3281           17.4219
Length  128, alignment  0/ 6:  76.5           90.7969        45.25            41.25             60.5
Length  128, alignment  6/ 7:  76.25          91.1562        43.5             40.3906           61.7031
Length  256, alignment  0/ 0:  148.156        168.656        18.3281          57.7188           27.7656
Length  256, alignment  0/ 7:  148.422        168.969        83.0469          65.6406           115.828
Length  256, alignment  7/ 8:  146.25         169.391        83.5938          67.9219           115.75
Length  512, alignment  0/ 0:  291.953        333.031        30.25            90.9219           48.5
Length  512, alignment  0/ 8:  291.516        339.516        30.2656          93.0469           48.7188
Length  512, alignment  8/ 9:  291.578        333.984        161.75           109.109           226.281
Length 1024, alignment  0/ 0:  587.406        656.406        55.1562          159.688           89.7812
Length 1024, alignment  0/ 0:  578.688        649.219        55.2812          160.188           90.2188
Length 1024, alignment  0/ 9:  588.781        653.062        318.406          203.547           447.328
Length 1024, alignment  9/10:  589.406        650.5          320.688          196.375           447.484



diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 364385b..bbf6ee6 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -308,6 +318,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, strcmp,
 			      hwcap & PPC_FEATURE_HAS_VSX,
 			      __strcmp_power7)
+	      IFUNC_IMPL_ADD (array, i, strcmp,
+			      hwcap & PPC_FEATURE_HAS_VSX,
+			      __strcmp_power7b)
+
 	      IFUNC_IMPL_ADD (array, i, strcmp, 1,
 			     __strcmp_ppc))
 
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strcmp.c b/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
index b45ba1f..fd7a1b9 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
@@ -20,10 +20,47 @@
 # include <string.h>
 # include <shlib-compat.h>
 # include "init-arch.h"
-
 extern __typeof (strcmp) __strcmp_ppc attribute_hidden;
 extern __typeof (strcmp) __strcmp_power7 attribute_hidden;
 extern __typeof (strcmp) __strcmp_power8 attribute_hidden;
+extern __typeof (strnlen) __strnlen_power7 attribute_hidden;
+extern __typeof (memcmp) __memcmp_power7 attribute_hidden;
+extern __typeof (strcmp) __strcmp_power7 attribute_hidden;
+
+# include "libc-internal.h"
+int __strcmp_power7b(const char *a, const char *b)
+{
+  size_t len;
+  int ret;
+  len = __strnlen_power7 (a, 64);
+  len = __strnlen_power7 (b, len);
+  if (len != 64)
+    {
+      return __memcmp_power7 (a, b, len + 1);
+    }
+  ret = __memcmp_power7 (a, b, 64);
+  if (ret)
+    return ret;
+
+  const char *a_old = a;
+  a = PTR_ALIGN_DOWN (a + 64, 64);
+  b += a - a_old;
+
+  while (1)
+    {
+       len = __strnlen_power7 (b, 64);
+       if (len != 64)
+         {
+           return __memcmp_power7 (a, b, len + 1);
+         }
+
+       ret = __memcmp_power7 (a, b, 64);
+       if (ret)
+         return ret;
+       a+=64;
+       b+=64;      
+    }
+}
 
 libc_ifunc (strcmp,
             (hwcap2 & PPC_FEATURE2_ARCH_2_07)

