This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] Fixing strcmp performance on power7 for unaligned loads.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: libc-alpha at sourceware dot org
- Date: Tue, 18 Aug 2015 23:18:26 +0200
- Subject: [RFC] Fixing strcmp performance on power7 for unaligned loads.
Hi,
As I said before, benchmarks are useless unless somebody reads them, so I
looked at the powerpc ones. I noticed that the power7 strcmp and strncmp are
about five times slower than memcmp in the unaligned case.
That is too much, so I could easily improve performance by 50% in that case by
implementing strcmp as a strnlen+memcmp loop, despite the overhead of strnlen.
Because of that overhead the loop is still a lot slower than the aligned-data
path, so it should be fixed in assembly by changing the unaligned case to
follow the pattern of the C code in the patch below.
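To illustrate the strnlen+memcmp idea separately from the patch, here is a minimal portable sketch. It is not the code from the patch: it uses the standard strnlen and memcmp rather than the power7 routines, it does not align either pointer, and the names chunked_strcmp and CHUNK are made up for this example. The chunk size of 64 simply mirrors the constant used in the patch.

```c
#define _POSIX_C_SOURCE 200809L /* strnlen is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size; the patch uses the same constant.  */
enum { CHUNK = 64 };

/* Illustrative strcmp built from strnlen and memcmp.
   Hypothetical name, not part of glibc.  */
int
chunked_strcmp (const char *a, const char *b)
{
  for (;;)
    {
      /* Find how many bytes before the first NUL in either string,
         looking at no more than CHUNK bytes.  */
      size_t len = strnlen (a, CHUNK);
      len = strnlen (b, len);

      /* A terminator is within reach: compare up to and including it.
         Both strings are readable for len + 1 bytes at this point.  */
      if (len != CHUNK)
        return memcmp (a, b, len + 1);

      /* No terminator in this chunk: compare all of it and move on.  */
      int ret = memcmp (a, b, CHUNK);
      if (ret != 0)
        return ret;
      a += CHUNK;
      b += CHUNK;
    }
}
```

The sketch never reads past the terminator of either string, so it stays within what the C standard allows for memcmp. The patch instead aligns one pointer to 64 bytes so that it only needs one strnlen call per iteration.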
strncmp should be the same case once I handle the corner cases correctly;
the benchmark results that I have now are the same up to the point where it
segfaults.
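For reference, one way the length limit could be folded into a strnlen+memcmp loop is sketched below. This is only an illustration under my own assumptions, not the strncmp variant mentioned above: it uses the standard strnlen and memcmp, does no pointer alignment, and the names chunked_strncmp and CHUNK are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L /* strnlen is POSIX, not ISO C */
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk size, mirroring the constant 64 in the patch.  */
enum { CHUNK = 64 };

/* Illustrative strncmp built from strnlen and memcmp.
   Hypothetical name, not part of glibc.  */
int
chunked_strncmp (const char *a, const char *b, size_t n)
{
  while (n > 0)
    {
      /* Never look at more than the remaining n bytes.  */
      size_t lim = n < CHUNK ? n : CHUNK;
      size_t len = strnlen (a, lim);
      len = strnlen (b, len);

      /* A terminator lies inside the limit: compare up to and
         including it.  len + 1 <= lim, so n is respected.  */
      if (len != lim)
        return memcmp (a, b, len + 1);

      /* No terminator within lim bytes: compare them all.  */
      int ret = memcmp (a, b, lim);
      if (ret != 0)
        return ret;
      a += lim;
      b += lim;
      n -= lim;
    }
  /* All n bytes compared equal.  */
  return 0;
}
```

The corner case handled here is the last partial chunk, where the limit is smaller than 64 and a full-chunk comparison would read or compare too many bytes.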
The same optimization would probably also work for older machines, but I
don't have one to test it on.
Part of the benchtest results for large inputs is here:
                              simple_strcmp  stupid_strcmp  __strcmp_power7  __strcmp_power7b  __strcmp_ppc
Length   32, alignment 0/ 0:        22.6719        31.8438          3.40625            14.875       5.39062
Length   32, alignment 0/ 4:          22.75        31.7969          18.9062           19.1094       19.2344
Length   32, alignment 4/ 5:          22.75          31.75          18.1875           20.1719       22.6562
Length   64, alignment 0/ 0:        40.3906        51.2031          5.03125           15.0156             8
Length   64, alignment 0/ 5:        40.5312        51.6094          24.6562           18.0781       32.5156
Length   64, alignment 5/ 6:        40.7969        51.0781          23.9531           19.3281       32.7188
Length  128, alignment 0/ 0:           76.5        91.5312                8           32.3281       17.4219
Length  128, alignment 0/ 6:           76.5        90.7969            45.25             41.25          60.5
Length  128, alignment 6/ 7:          76.25        91.1562             43.5           40.3906       61.7031
Length  256, alignment 0/ 0:        148.156        168.656          18.3281           57.7188       27.7656
Length  256, alignment 0/ 7:        148.422        168.969          83.0469           65.6406       115.828
Length  256, alignment 7/ 8:         146.25        169.391          83.5938           67.9219        115.75
Length  512, alignment 0/ 0:        291.953        333.031            30.25           90.9219          48.5
Length  512, alignment 0/ 8:        291.516        339.516          30.2656           93.0469       48.7188
Length  512, alignment 8/ 9:        291.578        333.984           161.75           109.109       226.281
Length 1024, alignment 0/ 0:        587.406        656.406          55.1562           159.688       89.7812
Length 1024, alignment 0/ 0:        578.688        649.219          55.2812           160.188       90.2188
Length 1024, alignment 0/ 9:        588.781        653.062          318.406           203.547       447.328
Length 1024, alignment 9/10:        589.406          650.5          320.688           196.375       447.484
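To make the numbers easier to compare, the ratio between the __strcmp_power7 and __strcmp_power7b columns can be computed for a few rows. The sketch below assumes the columns are timings where lower is better; the helper names speedup and print_speedups are made up for this example, and the values are copied from the table above.

```c
#include <stddef.h>
#include <stdio.h>

/* Ratio of old time to new time; a value above 1 means the new
   routine is faster, assuming lower timings are better.  */
double
speedup (double old_time, double new_time)
{
  return old_time / new_time;
}

/* A few rows copied from the table: power7 vs. power7b.  */
struct row
{
  const char *label;
  double power7;
  double power7b;
};

static const struct row rows[] = {
  { "Length   64, alignment 0/ 5", 24.6562, 18.0781 },
  { "Length  256, alignment 7/ 8", 83.5938, 67.9219 },
  { "Length 1024, alignment 9/10", 320.688, 196.375 },
  { "Length 1024, alignment 0/ 0", 55.1562, 159.688 },
};

/* Print the power7/power7b ratio for each selected row.  */
void
print_speedups (void)
{
  for (size_t i = 0; i < sizeof rows / sizeof rows[0]; i++)
    printf ("%s: %.2fx\n", rows[i].label,
            speedup (rows[i].power7, rows[i].power7b));
}
```

Under that assumption the unaligned rows come out above 1 and the aligned row comes out well below 1, which is the trade-off described earlier: the C version helps the unaligned case but pays the strnlen overhead on aligned data.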
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index 364385b..bbf6ee6 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -308,6 +318,10 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
IFUNC_IMPL_ADD (array, i, strcmp,
hwcap & PPC_FEATURE_HAS_VSX,
__strcmp_power7)
+ IFUNC_IMPL_ADD (array, i, strcmp,
+ hwcap & PPC_FEATURE_HAS_VSX,
+ __strcmp_power7b)
+
IFUNC_IMPL_ADD (array, i, strcmp, 1,
__strcmp_ppc))
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strcmp.c b/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
index b45ba1f..fd7a1b9 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/strcmp.c
@@ -20,10 +20,47 @@
# include <string.h>
# include <shlib-compat.h>
# include "init-arch.h"
-
extern __typeof (strcmp) __strcmp_ppc attribute_hidden;
extern __typeof (strcmp) __strcmp_power7 attribute_hidden;
extern __typeof (strcmp) __strcmp_power8 attribute_hidden;
+extern __typeof (strnlen) __strnlen_power7 attribute_hidden;
+extern __typeof (memcmp) __memcmp_power7 attribute_hidden;
+extern __typeof (strcmp) __strcmp_power7 attribute_hidden;
+
+# include "libc-internal.h"
+int __strcmp_power7b(const char *a, const char *b)
+{
+ size_t len;
+ int ret;
+ len = __strnlen_power7 (a, 64);
+ len = __strnlen_power7 (b, len);
+ if (len != 64)
+ {
+ return __memcmp_power7 (a, b, len + 1);
+ }
+ ret = __memcmp_power7 (a, b, 64);
+ if (ret)
+ return ret;
+
+ const char *a_old = a;
+ a = PTR_ALIGN_DOWN (a + 64, 64);
+ b += a - a_old;
+
+ while (1)
+ {
+ len = __strnlen_power7 (b, 64);
+ if (len != 64)
+ {
+ return __memcmp_power7 (a, b, len + 1);
+ }
+
+ ret = __memcmp_power7 (a, b, 64);
+ if (ret)
+ return ret;
+ a+=64;
+ b+=64;
+ }
+}
libc_ifunc (strcmp,
(hwcap2 & PPC_FEATURE2_ARCH_2_07)