This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp


On 02/03/2014 03:43 PM, Ondřej Bílka wrote:

> And there is a third factor: memcmp with small constant arguments
> could be inlined. This is not the case now, but a patch would be welcome.

Inlining memcmp in GCC has historically been a bad decision. Perhaps we could make an exception for memcmp calls with known alignment and really small sizes. On the GCC side, dispatching to a few versions specialized for certain lengths, plus a version that only delivers an unordered boolean result, also promises significant wins.
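
As a rough illustration (my sketch only, not anything in the patch or in GCC today), a boolean-only specialization for a known 8-byte length could collapse to a single word compare; the name memcmp_eq8 and the use of memcpy for the unaligned loads are assumptions made for the example:

#include <stdint.h>
#include <string.h>

/* Hypothetical specialization: equality-only memcmp for a known
   8-byte length.  memcpy lets the compiler emit unaligned word
   loads where those are cheap.  */
static inline int
memcmp_eq8 (const void *a, const void *b)
{
  uint64_t wa, wb;
  memcpy (&wa, a, sizeof wa);
  memcpy (&wb, b, sizeof wb);
  return wa != wb;   /* unordered, boolean result only */
}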

>> I didn't try to exercise the page-crossing code path, although its
>> byte-wise comparisons would be problematic from a timing oracle
>> point of view.

> I did not optimize that, as it is a cold path and I favoured
> simplicity. It could be done with more complicated code that would be
> harder to review and probably slower due to additional branch
> mispredictions.

I looked at the page-crossing logic and noticed that you trigger it based on the addresses alone. Would it not be worth checking whether only the over-reading part would actually cross the page boundary?
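
As a concrete form of that check (a sketch under my own assumptions: a 16-byte vector load, a 4096-byte page, and a made-up helper name):

#include <stdint.h>

/* Hypothetical helper: a 16-byte load starting at p spills into the
   next 4096-byte page only when p's offset within the page exceeds
   4096 - 16.  */
static inline int
load16_crosses_page (const void *p)
{
  return ((uintptr_t) p & 4095) > 4096 - 16;
}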

> As for SSE you need to emulate variable shifts

Is this the mask you compute in %rcx following the handle_end label?
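
My reading of that mask, sketched in intrinsics (a guess at the idea, not the patch's actual code): rather than shifting bytes inside an SSE register, convert the comparison result to a scalar bit mask with pmovmskb and discard the out-of-range bytes with an ordinary general-purpose-register shift.

#include <emmintrin.h>

/* Sketch: bit i of the result is set iff byte i of a and b differ and
   i < n, for n <= 16.  The variable shift happens in a GPR, not in the
   vector unit.  */
static inline unsigned int
diff_mask_first_n (__m128i a, __m128i b, unsigned int n)
{
  unsigned int eq = (unsigned int) _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, b));
  unsigned int diff = ~eq & 0xffffu;       /* bit i set => byte i differs */
  return diff & ((1u << n) - 1);           /* keep only the first n bytes */
}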

> int
> cmp (uint64_t a, uint64_t b)
> {
>    uint64_t diff = a ^ b;
>    uint64_t bit = a ^ (a - 1);
>    int64_t ret = (a & bit) - (b & bit);
>    return ret >> 32;
> }

This looks broken to me. diff is dead, and after a little-endian load, the bit order in the word is mixed, so additional fiddling is needed to derive an accurate ordering from a bit difference.
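
For reference, one way to derive a memcmp-style ordering from a 64-bit difference after little-endian loads (my own sketch, not something from the patch) is to locate the lowest differing byte with a trailing-zero count and compare only that byte:

#include <stdint.h>

/* Sketch: a and b hold 8 bytes each, loaded little-endian, so the
   first differing byte in memory order is the lowest differing byte
   in the word.  Isolate it via the trailing zero count of the XOR.  */
static int
cmp_words (uint64_t a, uint64_t b)
{
  uint64_t diff = a ^ b;
  if (diff == 0)
    return 0;
  unsigned int shift = __builtin_ctzll (diff) & ~7u;
  unsigned int ba = (a >> shift) & 0xff;
  unsigned int bb = (b >> shift) & 0xff;
  return ba < bb ? -1 : 1;
}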

--
Florian Weimer / Red Hat Product Security Team

