This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
On 06/23/2017 09:51 AM, Florian Weimer wrote:
> I think this revised patch is an improvement because it avoids adding a
> branch.
OK on the following conditions:
- Split the printf changes out and do that in another pass.
- Clarify if the new testsuite change did catch the regression.
- Adjust comment for L(between_2_3):.
- Fix grammar in assembly comment.
> Thanks,
> Florian
>
>
> memcmp.patch
>
>
> x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
>
> This code:
>
> L(between_2_3):
> /* Load as big endian with overlapping loads and bswap to avoid
> branches. */
> movzwl -2(%rdi, %rdx), %eax
> movzwl -2(%rsi, %rdx), %ecx
> shll $16, %eax
> shll $16, %ecx
> movzwl (%rdi), %edi
> movzwl (%rsi), %esi
> orl %edi, %eax
> orl %esi, %ecx
> bswap %eax
> bswap %ecx
> subl %ecx, %eax
> ret
>
> needs a saturating subtract because the full register is used.
> With this commit, only the lower 24 bits of the register are used,
> so a regular subtraction suffices.
>
> The test case change adds coverage for these kinds of bugs.
Does the change to test-memcmp.c actually catch the bug reported by
the users?
> 2017-06-23 Florian Weimer <fweimer@redhat.com>
>
> [BZ #21662]
> * sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
> Peform saturating subtraction.
> * string/test-memcmp.c (check1): Check with different lengths.
> (check_result, do_random_tests): Write error messages to standard
> output.
>
> diff --git a/string/test-memcmp.c b/string/test-memcmp.c
> index a7969ed..93bc972 100644
> --- a/string/test-memcmp.c
> +++ b/string/test-memcmp.c
> @@ -85,8 +85,8 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
> || (exp_result < 0 && result >= 0)
> || (exp_result > 0 && result <= 0))
> {
> - error (0, 0, "Wrong result in function %s %d %d", impl->name,
> - result, exp_result);
> + printf ("error: wrong result in function %s %d %d", impl->name,
> + result, exp_result);
> ret = 1;
> return -1;
> }
> @@ -192,8 +192,10 @@ do_random_tests (void)
> || (r < 0 && result >= 0)
> || (r > 0 && result <= 0))
> {
> - error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
> - n, impl->name, align1 * CHARBYTES & 63, align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
> + printf ("error: Iteration %zd - wrong result in function %s"
> + " (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
> + n, impl->name, align1 * CHARBYTES & 63,
> + align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
Outputting informative messages should be done to stdout not stderr.
However, in the future please don't mix changes like this with a bug fix.
Please commit the above two hunks right away since I can't see anyone
objecting.
> ret = 1;
> }
> }
> @@ -441,11 +443,12 @@ check1 (void)
>
> n = 116;
> for (size_t i = 0; i < n; i++)
> - {
> - exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, n - i);
> - FOR_EACH_IMPL (impl, 0)
> - check_result (impl, s1 + i, s2 + i, n - i, exp_result);
> - }
> + for (size_t len = 0; len <= n - i; ++len)
> + {
> + exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, len);
> + FOR_EACH_IMPL (impl, 0)
> + check_result (impl, s1 + i, s2 + i, len, exp_result);
> + }
OK, so you've added more iterations here over the length of the remaining
substring to compare.
I assume that this catches the regression by ensuring the high values of
the subtraction result in an underflow which results in a positive value
of the subtraction and a wrong answer?
> }
>
> /* This test checks that memcmp doesn't overrun buffers. */
> diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> index 47630dd..4cc9386 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> @@ -139,16 +139,17 @@ L(exit):
> L(between_2_3):
> /* Load as big endian with overlapping loads and bswap to avoid
> branches. */
Was this comment ever accurate? mobzwl is not a BE load.
Given that you changed the code, please adjust the comment.
> - movzwl -2(%rdi, %rdx), %eax
> - movzwl -2(%rsi, %rdx), %ecx
> - shll $16, %eax
> - shll $16, %ecx
> - movzwl (%rdi), %edi
> - movzwl (%rsi), %esi
> - orl %edi, %eax
> - orl %esi, %ecx
> + movzwl (%rdi), %eax
> + movzwl (%rsi), %ecx
OK, move 32-bit with zero-extend to full register.
1,2,0,0
> + shll $8, %eax
> + shll $8, %ecx
OK, shift out the part we don't care about.
0,1,2,0
> bswap %eax
> bswap %ecx
OK, swap, leaving either:
0,2,1,0
> + movzbl -1(%rdi, %rdx), %edi
> + movzbl -1(%rsi, %rdx), %esi
OK, byte 3 or 2 loaded with everything else set to 0 (24 bits).
3,0,0,0
or
2,0,0,0
> + orl %edi, %eax
> + orl %esi, %ecx
> + /* Subtraction is okay because the upper 8 bits a zero. */
s/a zero/are zero/g
We either have:
3,2,1,0
or
2,2,1,0
> subl %ecx, %eax
OK, must use 'l' to subtract full DWORD.
> ret
>
--
Cheers,
Carlos.