This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 4 Feb 2014 15:12:52 +0100
- Subject: Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp
- Authentication-results: sourceware.org; auth=none
- References: <52EBBCC2 dot 7090807 at redhat dot com> <20140131171911 dot GA25609 at domone dot podge> <52EF931A dot 3000508 at redhat dot com> <20140203144305 dot GA14697 at domone dot podge> <52EFC0CD dot 6030201 at redhat dot com>
On Mon, Feb 03, 2014 at 05:16:13PM +0100, Florian Weimer wrote:
> On 02/03/2014 03:43 PM, Ondřej Bílka wrote:
>
> >And there is third factor that memcmp with small constant arguments
> >could be inlined. This is not case now but a patch would be welcome.
>
> Inlining memcmp in GCC has historically been a bad decision.
> Perhaps we could make an exception for memcmp calls with known
> alignment and really small sizes. In terms of GCC optimizations,
> dispatching to a few versions specialized for certain lengths, and a
> version that only delivers an unordered, boolean result promises
> significant wins as well.
>
The problem in gcc is that builtins are often badly optimized. A second
problem is that the expansion needs to be small, or you will lose when
you inline cold code.
Also, making that a builtin adds unnecessary complexity; adding these
conditions to a header is simpler.
In addition to constant sizes, when you know that the size is always
larger than 8 and an early mismatch is likely, you could use the inlined
version below.
There is no need for a specialized unordered case when you do a comparison;
gcc is smart enough to optimize these, as well as the memcmp(x,y,n) > 0
case. The following:
int foo (int x)
{
  if (x > 0) return 1;
  if (x < 0) return -1;
  return 0;
}

int bar (int x)
{
  if (foo (x))
    return 4;
  else
    return 2;
}
gets optimized to
bar:
.LFB1:
.cfi_startproc
cmpl $1, %edi
sbbl %eax, %eax
andl $-2, %eax
addl $4, %eax
ret
And the expansion that I talked about is below. I could make it
cross-platform with a check that unaligned loads are ok and that bswap
is reasonably fast.
#include <stdint.h>
#include <string.h>
#undef memcmp
#define memcmp(x, y, n) \
({ \
void *__x = x, *__y = y; \
size_t __n = n; \
int __ret; \
    if (__builtin_constant_p (__n) && __n >= 8)                   \
{ \
uint64_t __a = __builtin_bswap64(*((uint64_t *) __x)); \
uint64_t __b = __builtin_bswap64(*((uint64_t *) __y)); \
if (__a > __b) \
__ret = 1; \
else if (__a < __b) \
__ret = -1; \
else \
__ret = __memcmp (__x + 8, __y + 8, __n - 8); \
} \
else \
__ret = __memcmp (__x, __y, __n); \
__ret;\
})
int foo (char *x, char *y)
{
  if (memcmp (x, y, 10) > 0)
    return 15;
  else
    return 42;
}