This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]


On 06/23/2017 09:51 AM, Florian Weimer wrote:
> I think this revised patch is an improvement because it avoids adding a
> branch.

OK on the following conditions:

- Split the printf changes out and do that in another pass.
- Clarify if the new testsuite change did catch the regression.
- Adjust comment for L(between_2_3):.
- Fix grammar in assembly comment.
 
> Thanks,
> Florian
> 
> 
> memcmp.patch
> 
> 
> x86-64: memcmp-avx2-movbe.S needs saturating subtraction [BZ #21662]
> 
> This code:
> 
> L(between_2_3):
> 	/* Load as big endian with overlapping loads and bswap to avoid
> 	   branches.  */
> 	movzwl	-2(%rdi, %rdx), %eax
> 	movzwl	-2(%rsi, %rdx), %ecx
> 	shll	$16, %eax
> 	shll	$16, %ecx
> 	movzwl	(%rdi), %edi
> 	movzwl	(%rsi), %esi
> 	orl	%edi, %eax
> 	orl	%esi, %ecx
> 	bswap	%eax
> 	bswap	%ecx
> 	subl	%ecx, %eax
> 	ret
> 
> needs a saturating subtract because the full register is used.
> With this commit, only the lower 24 bits of the register are used,
> so a regular subtraction suffices.
> 
> The test case change adds coverage for these kinds of bugs.

Does the change to test-memcmp.c actually catch the bug reported by
the users?

> 2017-06-23  Florian Weimer  <fweimer@redhat.com>
> 
> 	[BZ #21662]
> 	* sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S (between_2_3):
> 	Peform saturating subtraction.
> 	* string/test-memcmp.c (check1): Check with different lengths.
> 	(check_result, do_random_tests): Write error messages to standard
> 	output.
> 
> diff --git a/string/test-memcmp.c b/string/test-memcmp.c
> index a7969ed..93bc972 100644
> --- a/string/test-memcmp.c
> +++ b/string/test-memcmp.c
> @@ -85,8 +85,8 @@ check_result (impl_t *impl, const CHAR *s1, const CHAR *s2, size_t len,
>        || (exp_result < 0 && result >= 0)
>        || (exp_result > 0 && result <= 0))
>      {
> -      error (0, 0, "Wrong result in function %s %d %d", impl->name,
> -	     result, exp_result);
> +      printf ("error: wrong result in function %s %d %d", impl->name,
> +	      result, exp_result);
>        ret = 1;
>        return -1;
>      }
> @@ -192,8 +192,10 @@ do_random_tests (void)
>  	      || (r < 0 && result >= 0)
>  	      || (r > 0 && result <= 0))
>  	    {
> -	      error (0, 0, "Iteration %zd - wrong result in function %s (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
> -		     n, impl->name, align1 * CHARBYTES & 63,  align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);
> +	      printf ("error: Iteration %zd - wrong result in function %s"
> +		      " (%zd, %zd, %zd, %zd) %ld != %d, p1 %p p2 %p",
> +		      n, impl->name, align1 * CHARBYTES & 63,
> +		      align2 * CHARBYTES & 63, len, pos, r, result, p1, p2);

Outputting informative messages should be done to stdout not stderr.

However, in the future please don't mix changes like this with a bug fix.

Please commit the above two hunks right away since I can't see anyone
objecting.

>  	      ret = 1;
>  	    }
>  	}
> @@ -441,11 +443,12 @@ check1 (void)
>  
>    n = 116;
>    for (size_t i = 0; i < n; i++)
> -    {
> -      exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, n - i);
> -      FOR_EACH_IMPL (impl, 0)
> -	check_result (impl, s1 + i, s2 + i, n - i, exp_result);
> -    }
> +    for (size_t len = 0; len <= n - i; ++len)
> +      {
> +	exp_result = SIMPLE_MEMCMP (s1 + i, s2 + i, len);
> +	FOR_EACH_IMPL (impl, 0)
> +	  check_result (impl, s1 + i, s2 + i, len, exp_result);
> +      }

OK, so you've added more iterations here over the length of the remaining
substring to compare.

I assume that this catches the regression by ensuring the high values of
the subtraction result in an underflow which results in a positive value
of the subtraction and a wrong answer?

>  }
>  
>  /* This test checks that memcmp doesn't overrun buffers.  */
> diff --git a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> index 47630dd..4cc9386 100644
> --- a/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> +++ b/sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S
> @@ -139,16 +139,17 @@ L(exit):
>  L(between_2_3):
>  	/* Load as big endian with overlapping loads and bswap to avoid
>  	   branches.  */

Was this comment ever accurate? mobzwl is not a BE load.

Given that you changed the code, please adjust the comment.

> -	movzwl	-2(%rdi, %rdx), %eax
> -	movzwl	-2(%rsi, %rdx), %ecx
> -	shll	$16, %eax
> -	shll	$16, %ecx
> -	movzwl	(%rdi), %edi
> -	movzwl	(%rsi), %esi
> -	orl	%edi, %eax
> -	orl	%esi, %ecx
> +	movzwl	(%rdi), %eax
> +	movzwl	(%rsi), %ecx

OK, move 32-bit with zero-extend to full register.

1,2,0,0

> +	shll	$8, %eax
> +	shll	$8, %ecx

OK, shift out the part we don't care about.

0,1,2,0

>  	bswap	%eax
>  	bswap	%ecx

OK, swap, leaving either:

0,2,1,0

> +	movzbl	-1(%rdi, %rdx), %edi
> +	movzbl	-1(%rsi, %rdx), %esi

OK, byte 3 or 2 loaded with everything else set to 0 (24 bits).

3,0,0,0

or

2,0,0,0

> +	orl	%edi, %eax
> +	orl	%esi, %ecx
> +	/* Subtraction is okay because the upper 8 bits a zero.  */

s/a zero/are zero/g

We either have:

3,2,1,0

or

2,2,1,0

>  	subl	%ecx, %eax

OK, must use 'l' to subtract full DWORD.

>  	ret
>  

-- 
Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]