This is the mail archive of the mailing list for the glibc project.

Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp

On Mon, Feb 03, 2014 at 05:16:13PM +0100, Florian Weimer wrote:
> On 02/03/2014 03:43 PM, Ondřej Bílka wrote:
> >And there is third factor that memcmp with small constant arguments
> >could be inlined. This is not case now but a patch would be welcome.
> Inlining memcmp in GCC has historically been a bad decision.
> Perhaps we could make an exception for memcmp calls with known
> alignment and really small sizes.  In terms of GCC optimizations,
> dispatching to a few versions specialized for certain lengths, and a
> version that only delivers an unordered, boolean result promises
> significant wins as well.
The problem there is that gcc builtins are often badly optimized. A second
problem is that the expansion needs to be small, or you will lose when you
inline cold code.

Also, making that a builtin adds unnecessary complexity; adding these
conditions to a header is simpler.

In addition to constant sizes, when you know that the size is always at
least 8 and a mismatch is likely in the first 8 bytes, you could use the
inlined version below.

There is no need for a specialized unordered case when you only do a
comparison; gcc is smart enough to optimize these, as well as the
memcmp(x,y,n) > 0 case. The following:

int foo (int x)
{
  if (x > 0) return 1;
  if (x < 0) return -1;
  return 0;
}

int bar (int x)
{
  if (foo (x))
    return 4;
  return 2;
}

gets optimized to

        cmpl    $1, %edi
        sbbl    %eax, %eax
        andl    $-2, %eax
        addl    $4, %eax

And the expansion that I talked about is here. I could make it cross
platform with a check that unaligned loads are ok and bswap is reasonably
fast.

#include <stdint.h>
#include <string.h>
#undef memcmp

/* Assumes little-endian with cheap unaligned loads (x86_64).  __memcmp
   stands for the out-of-line memcmp.  */
#define memcmp(x, y, n) \
({ \
  const void *__x = (x), *__y = (y); \
  size_t __n = (n); \
  int __ret; \
  /* Take the inline path only when n >= 8 is provably true.  */ \
  if (__builtin_constant_p (__n >= 8) && __n >= 8) \
    { \
      uint64_t __a = __builtin_bswap64 (*((const uint64_t *) __x)); \
      uint64_t __b = __builtin_bswap64 (*((const uint64_t *) __y)); \
      if (__a > __b) \
        __ret = 1; \
      else if (__a < __b) \
        __ret = -1; \
      else \
        __ret = __memcmp (__x + 8, __y + 8, __n - 8); \
    } \
  else \
    __ret = __memcmp (__x, __y, __n); \
  __ret; \
})

int foo (char *x, char *y)
{
  if (memcmp (x, y, 10) > 0)
    return 15;
  return 42;
}
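
For trying the idea outside of glibc, a minimal sketch as plain inline functions might look like the following. It is not the macro above: the names load_be64 and memcmp_head8 are made up for illustration, memcpy replaces the pointer cast so the unaligned load is well defined, the standard memcmp handles the tail, and the byte-order check assumes GCC or Clang (which define __BYTE_ORDER__ and provide __builtin_bswap64). The caller must guarantee n >= 8.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 bytes from a possibly unaligned address
   and return them as a big-endian integer, so that unsigned integer
   order matches memcmp's byte-by-byte order.  */
static inline uint64_t
load_be64 (const void *p)
{
  uint64_t v;
  memcpy (&v, p, sizeof v);        /* well defined for any alignment */
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap64 (v);       /* first byte becomes most significant */
#endif
  return v;
}

/* Hypothetical helper: compare the first 8 bytes inline and fall back to
   the library memcmp only when they are equal.  Requires n >= 8.  */
static inline int
memcmp_head8 (const void *x, const void *y, size_t n)
{
  uint64_t a = load_be64 (x);
  uint64_t b = load_be64 (y);
  if (a != b)
    return a > b ? 1 : -1;
  return memcmp ((const char *) x + 8, (const char *) y + 8, n - 8);
}
```

With this, memcmp_head8 (x, y, 10) > 0 can stand in for the memcmp call in foo above; when a mismatch falls in the first 8 bytes, no library call is made at all.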
