This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] powerpc: Optimize memrchr for power8
- From: "Gabriel F. T. Gomes" <gabriel at inconstante dot eti dot br>
- To: Rajalakshmi Srinivasaraghavan <raji at linux dot vnet dot ibm dot com>
- Cc: <libc-alpha at sourceware dot org>
- Date: Fri, 29 Sep 2017 11:56:33 -0300
- Subject: Re: [PATCH] powerpc: Optimize memrchr for power8
- Authentication-results: sourceware.org; auth=none
- References: <1505810086-18337-1-git-send-email-raji@linux.vnet.ibm.com>
On 19 Sep 2017, Rajalakshmi Srinivasaraghavan wrote:
>--- a/sysdeps/powerpc/powerpc64/multiarch/memrchr-ppc64.c
>+++ b/sysdeps/powerpc/powerpc64/multiarch/memrchr-ppc64.c
>@@ -16,4 +16,16 @@
> License along with the GNU C Library; if not, see
> <http://www.gnu.org/licenses/>. */
>
>-#include <sysdeps/powerpc/powerpc32/power4/multiarch/memrchr-ppc32.c>
>+#include <string.h>
>+
>+#define MEMRCHR __memrchr_ppc
>+
>+#undef weak_alias
>+#define weak_alias(a, b)
>+
>+# undef libc_hidden_builtin_def
>+# define libc_hidden_builtin_def(name)
^
This space is not needed.
>--- /dev/null
>+++ b/sysdeps/powerpc/powerpc64/power8/memrchr.S
>@@ -0,0 +1,321 @@
>+/* Optimized memrchr implementation for PowerPC64/POWER8.
>+ Copyright (C) 2017 Free Software Foundation, Inc.
>+ Contributed by Luis Machado <luisgpm@br.ibm.com>.
No "Contributed by" statements in new files.
>+/* TODO: change these to the actual instructions when the minimum required
>+ binutils allows it. */
>+#define MTVRD(v, r) .long (0x7c000167 | ((v)<<(32-11)) | ((r)<<(32-16)))
>+#define MFVRD(r, v) .long (0x7c000067 | ((v)<<(32-11)) | ((r)<<(32-16)))
OK
I focused the review on the new code blocks (as compared to the power7
implementation). They look good, I only have minor comments.
>+ .align 4
>+ /* At this point, r8 is 16B aligned. */
>+L(align_qw):
> [...]
OK.
>+ /* Handle r5 > 64. Loop over the bytes in strides of 64B. */
>+ .align 4
>+L(loop):
> [...]
OK.
>+ /* Handle remainder of 64B loop or r5 > 64. */
>+ .align 4
>+L(tail64):
>[...]
OK.
>+ /* Found a match in 64B loop. */
>+ .align 4
>+L(found):
>+ /* Permute the first bit of each byte into bits 48-63. */
>+ VBPERMQ(v6, v6, v10)
>+ VBPERMQ(v7, v7, v10)
>+ VBPERMQ(v8, v8, v10)
>+ VBPERMQ(v9, v9, v10)
>+ /* Shift each component into its correct position for merging. */
>+#ifdef __LITTLE_ENDIAN__
>+ vsldoi v7, v7, v7, 2
>+ vsldoi v8, v8, v8, 4
>+ vsldoi v9, v9, v9, 6
>+#else
>+ vsldoi v6, v6, v6, 6
>+ vsldoi v7, v7, v7, 4
>+ vsldoi v8, v8, v8, 2
>+#endif
>+ /* Merge the results and move to a GPR. */
>+ vor v11, v6, v7
>+ vor v4, v9, v8
>+ vor v4, v11, v4
>+ MFVRD(r5, v4)
>+#ifdef __LITTLE_ENDIAN__
>+ cntlzd r6, r5 /* Count leading zeros before the match. */
>+#else
>+ addi r6, r5, -1
>+ andc r6, r6, r5
>+ popcntd r6, r6
>+#endif
>+ addi r8, r8, 63
>+ sub r3, r8, r6 /* Compute final length. */
~~~~~~~~~~~~~~~~~~~~
This comment is a bit misleading.
Maybe replace 'length' with 'address' or 'pointer'?
>+ blr
>+
>+ /* Found a match in last 16 bytes. */
>+ .align 4
>+L(found_16B):
>+ /* Permute the first bit of each byte into bits 48-63. */
>+ VBPERMQ(v6, v6, v10)
>+ /* Shift each component into its correct position for merging. */
>+#ifdef __LITTLE_ENDIAN__
>+ vsldoi v6, v6, v6, 6
>+ MFVRD(r7, v6)
>+ cntlzd r6, r7 /* Count leading zeros before the match. */
>+#else
>+ MFVRD(r7, v6)
>+ addi r6, r7, -1
>+ andc r6, r6, r7
>+ popcntd r6, r6
>+#endif
>+ addi r8, r8, 15
>+ sub r3, r8, r6 /* Compute final length. */
~~~~~~~~~~~~~~~~~~~~
Likewise.
Your patch looks good to me with these small changes.
Reviewed-by: Gabriel F. T. Gomes <gabriel@inconstante.eti.br>
PS: It's the first time I use the reviewed-by tag in this mailing list.
This and further uses imply what was discussed in libc-alpha [1],
particularly, it implies that I understand and agree with the "Reviewer's
statement of oversight" [2] used in the linux kernel community. Feel free
to add this Reviewed-by statement to your commit message if you want to.
[1] https://sourceware.org/ml/libc-alpha/2017-09/msg00816.html
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst