This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [RFC] faster strcmp by avoiding sse42.


On 08/07/2013 02:28 AM, Ondřej Bílka wrote:
> .L17:
>         addq    $64, %rdi
>         addq    $64, %rsi
> .L12:
>         movdqu  (%rsi), %xmm4
>         pcmpeqb (%rdi), %xmm4
>         pminub  (%rdi), %xmm4
>         movdqu  16(%rsi), %xmm3
>         pcmpeqb 16(%rdi), %xmm3
>         pminub  16(%rdi), %xmm3
>         movdqu  32(%rsi), %xmm2
>         pcmpeqb 32(%rdi), %xmm2
>         pminub  32(%rdi), %xmm2
>         movdqu  48(%rsi), %xmm0
>         pcmpeqb 48(%rdi), %xmm0
>         pminub  48(%rdi), %xmm0
>         pminub  %xmm4, %xmm0
>         pminub  %xmm3, %xmm0
>         pminub  %xmm2, %xmm0
>         pcmpeqb %xmm6, %xmm0
>         pmovmskb        %xmm0, %eax
>         testl   %eax, %eax
>         je      .L17
>         jmp     .L15

Surely you can do better by dropping the movdqu from the loop
entirely, and instead re-reading from memory on the cleanup path.


r~
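
For readers following the quoted loop: each pcmpeqb/pminub pair leaves a zero byte wherever the two strings differ or the (%rdi) string has a NUL, and the final pcmpeqb against %xmm6 (assumed here to hold zero) plus pmovmskb turns that into a bit mask. Below is a minimal C sketch of that check for a single 16-byte block, roughly what a cleanup path could do after re-reading from memory. The name first_stop_index is hypothetical, __builtin_ctz assumes GCC or Clang, and unaligned loads are used for both inputs, which differs from the aligned memory operands in the assembly.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper, not taken from the patch: returns the index of the
   first byte in a 16-byte block where a[i] != b[i] or a[i] == 0, or 16 if
   no such byte exists. Both pointers must reference 16 readable bytes. */
static int first_stop_index(const unsigned char *a, const unsigned char *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* pcmpeqb: 0xFF in every byte where the inputs are equal, else 0 */
    __m128i eq = _mm_cmpeq_epi8(va, vb);

    /* pminub: min(0xFF, a[i]) = a[i] where equal, 0 where different, so a
       byte is zero exactly on a mismatch or on a NUL in a */
    __m128i m = _mm_min_epu8(eq, va);

    /* pcmpeqb against zero, then pmovmskb: one bit per stopping byte */
    __m128i z = _mm_cmpeq_epi8(m, _mm_setzero_si128());
    unsigned mask = (unsigned)_mm_movemask_epi8(z);

    /* The lowest set bit is the first stopping position
       (__builtin_ctz is a GCC/Clang builtin, assumed available). */
    return mask ? __builtin_ctz(mask) : 16;
}
```

The main loop only needs to know that some byte in the 64-byte window stopped the scan, which is why the four per-block results can be merged with pminub before a single test; locating the exact byte can wait until the cleanup path.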

