This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] powerpc: strstr optimization


On Thu, Jul 16, 2015 at 06:05:10PM -0300, Tulio Magno Quites Machado Filho wrote:
> "Carlos O'Donell" <carlos@redhat.com> writes:
> 
> > On 07/16/2015 03:55 PM, OndÅej BÃlka wrote:
> >> Sorry Carlos, your trust is misplaced. This patch wasn't reviewed at
> >> all. I did that as test how much we could test IBM to verify patches. 
> 
> No.  That isn't true.
> I reviewed this patch myself.  All the 4 versions.
> So did Steven.
>
I did see only three. Also I didn't see you mails with comments on
libc-alpha. Please post these with comments so everybody could see
these.

If I were you then I would deny that I reviewed it as that cast bad
light on you. There are numerous flaws on these patches and that both of
you didn't catch bug shows it.
 
> >> I pointed out that it could have possibly quadratic behaviour which
> >> still does. So please don't accept unreviewed patches next time.
> 
> Yes, this patch does have a quadratic worst case as does quicksort.
> But for both algorithms, they're clearly an improvement over what was
> available before them.
> Is this algorithm perfect?  No.  I don't believe such a thing exist.  I
> believe in iterative improvements.
> 
That isn't clear improvement. As patch history versions 1 and 2 were bad
algorithm which took me lot of time to convince change. Then this is
better but its just my patch for that poorly rewriten. I clearly see
problems there, for example calling strnlen is unnecessary overhead.

One thing is iterative change, second is wasted effort. If code needs to
be completely replaced to improve performance then its better not to
send a patch at all. If a assembly is slower than equivalent c
impementation then its pointless to use assembly. So why should you keep
custom assembly when generic implementation is better?

> > They showed cases for which the code does go faster and objectively
> > so using the microbenchmark, and that's a win for now. Please continue 
> > to work with IBM to remove the quadratic worst case.
> 
> That's our intention.
> An improvement to this implementation is already in our backlog (i.e. we
> haven't started yet) and I'd be glad if someone is interested to help us with
> this.
>
That is simple as my generic implementation is both simpler and faster
and handles this. If you didn't waste time with assembly then you
wouldn't have backlog.

 
> > Tulio, You will need to work out why you have quadratic worst case.
> > It's certainly something we try to avoid. Did you make a particular
> > decision not to avoid it?
> 
> IMHO, the overall improvement brought by this patch is more important than
> the worst case scenario OndÅej is showing.
> Which doesn't mean we're neglecting the quadratic worst case.
> 
As this was also negatively received on x64, see bug 12100 pointing out
to fix that should every reviewer do.

And as generic case I many times asked if you ran my generic strstr
patch. It was lot faster than first two iteration and in practice looks
twice faster than this assembly. Thats because strings are generaly
short. Its bit slower than assembly or large size. Thats just caused by 
using ppc strchr instead power7 one which is slower. So I just used two
versions.

Results with modified benchmark attached, where I did hack to rename my
proposal as strstr_power7b while strstr_power7 is current to clearly see
difference.

So if you want better performance and fix quadratic behaviour what about
following?

	* benchtests/bench-strstr.c: Modify benchmark.
	* sysdeps/powerpc/powerpc64/multiarch/Makefile (routines): Add strstr-ppc64.
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
	(strstr_ppc, strstr_power7: Add.
	* sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c: New file.
	* sysdeps/powerpc/powerpc64/multiarch/strstr.c: Likewise.

diff --git a/benchtests/bench-strstr.c b/benchtests/bench-strstr.c
index 74f3ee8..b0904e7 100644
--- a/benchtests/bench-strstr.c
+++ b/benchtests/bench-strstr.c
@@ -91,24 +91,15 @@ do_test (size_t align1, size_t align2, size_t len1, size_t len2,
     }
   s2[len2] = '\0';
 
-  if (fail)
-    {
-      char *ss1 = s1;
-      for (size_t l = len1; l > 0; l = l > dl ? l - dl : 0)
-	{
-	  size_t t = l > dl ? dl : l;
-	  memcpy (ss1, d, t);
-	  ++ss1[len2 > 7 ? 7 : len2 - 1];
-	  ss1 += t;
-	}
-    }
-  else
-    {
-      memset (s1, '0', len1);
-      memcpy (s1 + len1 - len2, s2, len2);
-    }
+  int i;
+  for (i=0; i < len1; i++)
+    s1[i] = (((unsigned)rand()) % 128) + 17;
   s1[len1] = '\0';
 
+  for (i=0; i < len2; i++)
+    s2[i]= (((unsigned) rand()) % 128) + 17;
+  s2[len2] = '\0';
+
   printf ("Length %4zd/%zd, alignment %2zd/%2zd, %s:",
 	  len1, len2, align1, align2, fail ? "fail" : "found");
 
@@ -128,6 +119,23 @@ test_main (void)
     printf ("\t%s", impl->name);
   putchar ('\n');
 
+
+  char s1[1000000],s2[2000];
+
+  memset (s1, 'a', 1000000);
+  memset (s2, 'a', 2000);
+  s1[9999] = '\0';
+  s2[1998] = 'b';
+  s2[1999] = '\0';
+
+  {
+    printf ("strstr(\"aaa...aaa\",\"aaa...aaab\")\n");
+    FOR_EACH_IMPL (impl, 0)
+      do_one_test (impl, s1, s2, NULL);
+    putchar('\n');
+  }
+
+
   for (size_t klen = 2; klen < 32; ++klen)
     for (size_t hlen = 2 * klen; hlen < 16 * klen; hlen += klen)
       {
diff --git a/sysdeps/powerpc/powerpc64/multiarch/Makefile b/sysdeps/powerpc/powerpc64/multiarch/Makefile
index 17265bd..b92edbf 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/Makefile
+++ b/sysdeps/powerpc/powerpc64/multiarch/Makefile
@@ -19,7 +19,7 @@ sysdep_routines += memcpy-power7 memcpy-a2 memcpy-power6 memcpy-cell \
 		   strcmp-power8 strcmp-power7 strcmp-ppc64 \
 		   strcat-power8 strcat-power7 strcat-ppc64 \
 		   memmove-power7 memmove-ppc64 wordcopy-ppc64 bcopy-ppc64 \
-		   strncpy-power8
+		   strncpy-power8 strstr-ppc64
 
 CFLAGS-strncase-power7.c += -mcpu=power7 -funroll-loops
 CFLAGS-strncase_l-power7.c += -mcpu=power7 -funroll-loops
diff --git a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
index f5fdea5..364385b 100644
--- a/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
+++ b/sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
@@ -322,5 +322,14 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array,
 	      IFUNC_IMPL_ADD (array, i, strcat, 1,
 			     __strcat_ppc))
 
+  /* Support sysdeps/powerpc/powerpc64/multiarch/strstr.c.  */
+  IFUNC_IMPL (i, name, strstr,
+             IFUNC_IMPL_ADD (array, i, strstr,
+                             hwcap & PPC_FEATURE_HAS_VSX,
+                             __strstr_power7)
+             IFUNC_IMPL_ADD (array, i, strstr, 1,
+                             __strstr_ppc))
+
+
   return i;
 }
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c b/sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c
new file mode 100644
index 0000000..0ef9712
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c
@@ -0,0 +1,72 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+#define STRSTR __strstr_string
+#if IS_IN (libc) && defined(SHARED)
+# undef libc_hidden_builtin_def
+# define libc_hidden_builtin_def(name) \
+  __hidden_ver1(__strstr_ppc, __GI_strstr, __strstr_ppc);
+#endif
+
+extern __typeof (strstr) __strstr_string attribute_hidden;
+extern __typeof (strstr) __strstr_ppc attribute_hidden;
+
+extern char *__strchr_power7 (const char *x, int c);
+
+char *
+__strstr_power7 (char *s, char *n)
+{
+  if (!n[0])
+    return s;
+  while (1)
+    {
+      s = __strchr_power7 (s, n[0]);
+      if (!s)
+        return NULL;
+      size_t i;
+      for (i=1; s[i] == n[i] && n[i];i++);
+      if (!n[i])
+        return s;
+      if (i > 4)
+        return __strstr_string (s + 1, n);
+      s++;
+    }
+}
+
+char *
+__strstr_ppc (const char *s, const char *n)
+{
+  if (!n[0])
+    return (char *) s;
+  while (1)
+    {
+      s = strchr (s, n[0]);
+      if (!s)
+        return NULL;
+      size_t i;
+      for (i=1; s[i] == n[i] && n[i];i++);
+      if (!n[i])
+        return (char *) s;
+      if (i > 4)
+        return __strstr_string (s + 1, n);
+      s++;
+    }
+}
+
+#include <string/strstr.c>
diff --git a/sysdeps/powerpc/powerpc64/multiarch/strstr.c b/sysdeps/powerpc/powerpc64/multiarch/strstr.c
new file mode 100644
index 0000000..2be7646
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/multiarch/strstr.c
@@ -0,0 +1,34 @@
+/* Multiple versions of strstr. PowerPC64 version.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Define multiple versions only for definition in libc.  */
+#if IS_IN (libc)
+# include <string.h>
+# include <shlib-compat.h>
+# include "init-arch.h"
+
+extern __typeof (strstr) __strstr_ppc attribute_hidden;
+extern __typeof (strstr) __strstr_power7 attribute_hidden;
+
+/* Avoid DWARF definition DIE on ifunc symbol so that GDB can handle
+   ifunc symbol properly.  */
+libc_ifunc (strstr,
+            (hwcap & PPC_FEATURE_HAS_VSX)
+            ? __strstr_power7
+            : __strstr_ppc);
+#endif



Attachment: bench-strstr.out
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]