This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Rename __memcmp_sse4_2 to __memcmp_sse4_1.


On Fri, Jul 12, 2013 at 06:20:50PM +0200, Ondřej Bílka wrote:
> On Fri, Jul 12, 2013 at 10:12:34AM +0400, Liubov Dmitrieva wrote:
> > Do you mean AMD? For Intel there is no machine without SSE4_1 where the
> > sse2 unaligned version is faster than the ssse3 one.
> >
> Good to know. 
> 
> I looked at the sources and found that memcmp is horribly misoptimized, as usual.
> 
> As in 70% of cases the difference is found within the first 16 bytes, and
> in 99% within the first 64, the loop case is cold.
> 
> This is not much of a problem when n > 48, as the initial unaligned
> comparison handles differences in the first 16 bytes effectively.
> 
> Otherwise, however, there are a lot of jumps to dispatch on size, which is
> inefficient.
> 
> The code also answered what I thought was a roadblock, and why I did not
> try to optimize memcmp before: n is authoritative, so we may segfault when
> there is unallocated memory after the first difference but still inside
> the range specified by n (see the sketch below).
> 
> I will prepare a patch with a faster memcmp.
>  
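To spell out the consequence (the example below is mine, not part of any
patch): since n is authoritative, the caller must keep all n bytes of both
buffers readable, so an implementation may load 16 bytes at a time anywhere
inside that range, even past the first differing byte.

#include <string.h>

/* The buffers differ already at byte 0, but with n == 32 the caller
   guarantees 32 readable bytes each, so a vectorized memcmp may keep
   issuing 16-byte loads past the difference without risking a fault.  */
int
example (void)
{
  char a[32] = "Xbcdefghijklmnopqrstuvwxyz01234";
  char b[32] = "Ybcdefghijklmnopqrstuvwxyz01234";
  return memcmp (a, b, 32);
}
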
For the first 16 bytes, the best I can come up with is the following:

#define LT  _mm_cmplt_epi8
#define get_mask(x) ((uint64_t) _mm_movemask_epi8 (x))
#define first_bit(x) ((x) ^ ((x) - 1))

      tp_vector va = LOADU (a);
      tp_vector vb = LOADU (b);
      /* Bit 16 is a sentinel so first_bit stays defined when no byte
         satisfies the comparison.  */
      uint64_t lt = first_bit (get_mask (LT (va, vb)) | (1 << 16));
      uint64_t gt = first_bit (get_mask (LT (vb, va)) | (1 << 16));
      if (get_mask (LT (va, vb)) | get_mask (LT (vb, va)))
        return lt - gt; /* maybe swapped.  */
    
It finds the first byte that is smaller and the first byte that is bigger.
Then it turns those into bit masks whose difference comes out positive or
negative depending on which of the two bytes occurs first.
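
To make the mask arithmetic concrete, here is a toy run with made-up
pmovmskb values (a standalone program, not part of the patch):

#include <stdint.h>
#include <stdio.h>

#define first_bit(x) ((x) ^ ((x) - 1))   /* ones through the lowest set bit */

int
main (void)
{
  /* Made-up pmovmskb results: the first a < b byte is at position 2,
     the first a > b byte at position 5.  */
  uint32_t lt_mask = 1u << 2;
  uint32_t gt_mask = 1u << 5;

  /* Bit 16 is a sentinel so first_bit stays defined for an empty mask.  */
  uint32_t lt = first_bit (lt_mask | (1u << 16));   /* 0x00007 */
  uint32_t gt = first_bit (gt_mask | (1u << 16));   /* 0x0003f */

  /* 0x7 - 0x3f < 0: the earlier difference (a < b) determines the sign,
     exactly as memcmp requires.  */
  printf ("%d\n", (int) (lt - gt));
  return 0;
}

Compiled, the snippet above turns into: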

        movdqu  (%rsi), %xmm0           # vb = 16 bytes of b
        movdqu  (%rdi), %xmm1           # va = 16 bytes of a
        movdqa  %xmm0, %xmm2
        pcmpgtb %xmm1, %xmm2            # bytewise va < vb
        pcmpgtb %xmm0, %xmm1            # bytewise va > vb
        pmovmskb        %xmm2, %edx     # lt mask
        pmovmskb        %xmm1, %eax     # gt mask
        movl    %eax, %ecx
        orl     %edx, %ecx              # any difference at all?
        je      .L3                     # no: handle the equal case
        orl     $65536, %edx            # sentinel bit 16
        movl    %eax, %ecx
        leal    -1(%rdx), %eax
        orl     $65536, %ecx            # sentinel for the other mask
        xorl    %edx, %eax              # first_bit of the lt mask
        leal    -1(%rcx), %edx
        xorl    %ecx, %edx              # first_bit of the gt mask
        subl    %edx, %eax              # lt - gt
        ret
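
For anyone who wants to try it, here is the snippet wrapped into a
self-contained test; the wrapper name memcmp_first16 is mine, and note
that pcmpgtb compares signed bytes, so a real memcmp (which compares
unsigned) would also need to handle bytes >= 0x80, on top of the possibly
swapped sign:

#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h>

#define LT  _mm_cmplt_epi8
#define get_mask(x) ((uint64_t) _mm_movemask_epi8 (x))
#define first_bit(x) ((x) ^ ((x) - 1))

/* Compare the first 16 bytes of a and b.  Only valid when both
   buffers have at least 16 readable bytes.  */
static int
memcmp_first16 (const void *a, const void *b)
{
  __m128i va = _mm_loadu_si128 ((const __m128i *) a);
  __m128i vb = _mm_loadu_si128 ((const __m128i *) b);
  uint64_t lt = first_bit (get_mask (LT (va, vb)) | (1UL << 16));
  uint64_t gt = first_bit (get_mask (LT (vb, va)) | (1UL << 16));
  if (get_mask (LT (va, vb)) | get_mask (LT (vb, va)))
    return (int) (lt - gt);     /* maybe swapped */
  return 0;
}

int
main (void)
{
  char x[16] = "abcdefghijklmno";
  char y[16] = "abcdefghijklmnz";
  /* 'o' < 'z' at byte 14, so the result should be negative.  */
  printf ("%d\n", memcmp_first16 (x, y));
  return 0;
}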


Comments?

