This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp
- From: Florian Weimer <fweimer at redhat dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 03 Feb 2014 17:16:13 +0100
- Subject: Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp
- Authentication-results: sourceware.org; auth=none
- References: <52EBBCC2 dot 7090807 at redhat dot com> <20140131171911 dot GA25609 at domone dot podge> <52EF931A dot 3000508 at redhat dot com> <20140203144305 dot GA14697 at domone dot podge>
On 02/03/2014 03:43 PM, Ondřej Bílka wrote:
> And there is a third factor: memcmp with small constant arguments
> could be inlined. This is not the case now, but a patch would be welcome.
Inlining memcmp in GCC has historically been a bad decision. Perhaps we
could make an exception for memcmp calls with known alignment and really
small sizes. In terms of GCC optimizations, dispatching to a few
versions specialized for certain lengths, and a version that only
delivers an unordered, boolean result, promise significant wins as well.
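As a hedged illustration of the boolean-only variant mentioned above (the function name and fixed length are my assumptions, not anything GCC or the patch provides):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: an equality-only comparison for a known 8-byte
   length with no alignment assumptions.  memcpy lets the compiler emit
   a single unaligned load on x86_64 instead of a call to memcmp.  */
static int
memeq8 (const void *a, const void *b)
{
  uint64_t x, y;
  memcpy (&x, a, 8);
  memcpy (&y, b, 8);
  return x == y;
}
```

Because no ordering is required, this version needs no byte-order fixups at all, which is part of why the unordered variant is attractive.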
>> I didn't try to exercise the page-crossing code path, although its
>> byte-wise comparisons would be problematic from a timing oracle
>> point of view.
> I did not optimize that, as it's a cold path and I favoured simplicity.
> It could be done with more complicated code that would be harder to
> review and probably slower due to additional branch misprediction.
I looked at the page-crossing logic and noticed that you trigger it
based on the addresses alone. Is it not worth checking whether only the
over-reading part would cross the page boundary?
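For illustration, the narrower condition could be sketched like this (my sketch, not the patch's code; it assumes 4096-byte pages and 16-byte vector loads). A page-crossing load can only fault when the bytes we actually need stay within the first page, since otherwise the second page must be mapped anyway:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: decide whether a 16-byte load starting at p could fault
   even though the n bytes we actually need are all accessible.  That is
   the only case where the careful byte-wise path is required.  */
static int
load16_would_fault (const void *p, size_t n)
{
  uintptr_t off = (uintptr_t) p & 4095;   /* offset within a 4 KiB page */
  int load_crosses = off > 4096 - 16;     /* the full 16-byte load crosses */
  int needed_fits  = off + n <= 4096;     /* the needed bytes do not */
  return load_crosses && needed_fits;
}
```

With this test, a call where the needed bytes themselves extend into the next page would stay on the fast path, since that page is known to be mapped.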
> As for SSE you need to emulate variable shifts
Is this the mask you compute in %rcx following the handle_end label?
> int
> cmp (uint64_t a, uint64_t b)
> {
>   uint64_t diff = a ^ b;
>   uint64_t bit = a ^ (a - 1);
>   int64_t ret = (a & bit) - (b & bit);
>   return ret >> 32;
> }
This looks broken to me. diff is dead, and after a little-endian load,
the bit order in the word is mixed, so additional fiddling is needed to
derive an accurate ordering from a bit difference.
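The "additional fiddling" could look like this (my sketch, not the patch's code; the function name and the use of GCC's `__builtin_bswap64` are my assumptions): byte-swapping both little-endian-loaded words makes memory order agree with numeric order, after which an overflow-safe comparison yields the memcmp-style result.

```c
#include <stdint.h>

/* Hypothetical sketch: ordered comparison of two 8-byte chunks that were
   loaded little-endian.  Byte-swapping makes lexicographic (memcmp)
   order match unsigned numeric order; the subtraction of two boolean
   comparisons avoids any overflow.  Uses the GCC/Clang builtin.  */
static int
cmp_le64 (uint64_t a, uint64_t b)
{
  uint64_t x = __builtin_bswap64 (a);
  uint64_t y = __builtin_bswap64 (b);
  return (x > y) - (x < y);
}
```

Note that this compares whole words rather than isolating the lowest differing bit, which sidesteps the dead-`diff` problem entirely.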
--
Florian Weimer / Red Hat Product Security Team