This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
On Fri, Jun 23, 2017 at 9:42 AM, Florian Weimer <fweimer@redhat.com> wrote:
> On 06/23/2017 06:38 PM, Carlos O'Donell wrote:
>
>> I assume that this catches the regression by ensuring that high input
>> values make the subtraction underflow, which yields a positive result
>> and therefore a wrong answer?
>
> Yes, I thought I said so in the commit message.
>
>> Was this comment ever accurate? movzwl is not a BE load.
>
> We used bswap, so the register contents before the comparison are in
> big-endian format.
>
>>> + orl %edi, %eax
>>> + orl %esi, %ecx
>>> + /* Subtraction is okay because the upper 8 bits a zero. */
>>
>> s/a zero/are zero/g
>
> Okay, I'll fix this typo in a follow-up commit.
How about this patch to turn
movzbl -1(%rdi, %rdx), %edi
movzbl -1(%rsi, %rdx), %esi
orl %edi, %eax
orl %esi, %ecx
into
movb -1(%rdi, %rdx), %al
movb -1(%rsi, %rdx), %cl
H.J.
From acecb3f7de4892b68ec1b464a576ee84b3f97527 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Fri, 23 Jun 2017 11:29:38 -0700
Subject: [PATCH] x86-64: Optimize L(between_2_3) in memcmp-avx2-movbe.S
Turn
movzbl -1(%rdi, %rdx), %edi
movzbl -1(%rsi, %rdx), %esi
orl %edi, %eax
orl %esi, %ecx
into
movb -1(%rdi, %rdx), %al
movb -1(%rsi, %rdx), %cl
* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
Replace movzbl and orl with movb.
---
sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
index 9d19210..abcc61c 100644
--- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
+++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
@@ -144,11 +144,9 @@ L(between_2_3):
shll $8, %ecx
bswap %eax
bswap %ecx
- movzbl -1(%rdi, %rdx), %edi
- movzbl -1(%rsi, %rdx), %esi
- orl %edi, %eax
- orl %esi, %ecx
- /* Subtraction is okay because the upper 8 bits a zero. */
+ movb -1(%rdi, %rdx), %al
+ movb -1(%rsi, %rdx), %cl
+ /* Subtraction is okay because the upper 8 bits are zero. */
subl %ecx, %eax
ret
--
2.9.4