This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: Add memcmp/wmemcmp optimized with AVX2
On 06/01/2017 11:29 PM, H.J. Lu wrote:
> L(between_4_7):
> movl (%rdi), %r8d
> movl (%rsi), %ecx
> shlq $32, %r8
> shlq $32, %rcx
> movl -4(%rdi, %rdx), %edi
> movl -4(%rsi, %rdx), %esi
> orq %rdi, %r8
> orq %rsi, %rcx
> bswap %r8
> bswap %rcx
> cmpq %rcx, %r8
> je L(zero)
> sbbl %eax, %eax
> orl $1, %eax
> ret
>
> and got
>
> Iteration 70485 - wrong result in function __memcmp_avx2 (18, 26, 5,
> 0) -1 != 1, p1 0x7ffff7ff0e00 p2 0x7ffff7fece00
>
> What did I do wrong?
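I think you created a PDP-endian (mixed-endian) value there. Shifting the
first 4 bytes into the upper half of %r8 means that, after the bswap, the
bytes loaded from -4(%rdi, %rdx) are compared before the bytes loaded from
(%rdi). The 4 bytes at (%rdi) need to remain in the lower half of %r8 up
until the bswap. In other words, the value you need to shift is the 4 bytes
loaded from -4(%rdi, %rdx), not the 4 bytes loaded from (%rdi).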
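
To make the byte order easier to see, here is a minimal C model of the two
packings. It is only a sketch, not the patch itself: the helper names
(load32_le, swap_bytes64, cmp_quoted, cmp_fixed, check_example) are made up
for illustration, the loads are written out byte by byte so the model does
not depend on the host, and the length n is assumed to be between 4 and 7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit load (what movl does on x86-64), zero-extended. */
static uint64_t load32_le(const unsigned char *p)
{
    return (uint64_t)p[0] | (uint64_t)p[1] << 8 |
           (uint64_t)p[2] << 16 | (uint64_t)p[3] << 24;
}

/* Portable stand-in for bswap on a 64-bit register. */
static uint64_t swap_bytes64(uint64_t v)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xff);
        v >>= 8;
    }
    return r;
}

/* Mirrors cmpq / je L(zero) / sbbl %eax,%eax / orl $1,%eax. */
static int sign_of(uint64_t a, uint64_t b)
{
    return a == b ? 0 : (a < b ? -1 : 1);
}

/* Quoted sequence: the FIRST 4 bytes are shifted into the upper half. */
int cmp_quoted(const unsigned char *s1, const unsigned char *s2, size_t n)
{
    uint64_t a = load32_le(s1) << 32 | load32_le(s1 + n - 4);
    uint64_t b = load32_le(s2) << 32 | load32_le(s2 + n - 4);
    return sign_of(swap_bytes64(a), swap_bytes64(b));
}

/* Suggested change: the LAST 4 bytes are shifted, the first stay low. */
int cmp_fixed(const unsigned char *s1, const unsigned char *s2, size_t n)
{
    uint64_t a = load32_le(s1) | load32_le(s1 + n - 4) << 32;
    uint64_t b = load32_le(s2) | load32_le(s2 + n - 4) << 32;
    return sign_of(swap_bytes64(a), swap_bytes64(b));
}

/* Worked example: two 5-byte buffers that differ at byte 0 and byte 1. */
void check_example(void)
{
    const unsigned char p1[5] = {1, 0, 0, 0, 0};
    const unsigned char p2[5] = {0, 1, 0, 0, 0};
    assert(cmp_quoted(p1, p2, 5) == -1); /* byte 1 decides: wrong sign */
    assert(cmp_fixed(p1, p2, 5) == 1);   /* byte 0 decides: same as memcmp */
}
```

With the quoted packing, the most significant byte after the swap is byte 1
of the buffer rather than byte 0, which is how a -1 can come out where 1 is
expected. With the first load kept in the low half, byte 0 ends up most
significant and the unsigned 64-bit compare follows memory order.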
Thanks,
Florian