This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH] Optimize SSE 4.1 x86_64 memcmp


On 02/03/2014 03:43 PM, Ondřej Bílka wrote:

> And there is a third factor: memcmp with small constant arguments
> could be inlined. This is not the case now, but a patch would be welcome.

Inlining memcmp in GCC has historically been a bad decision. Perhaps we could make an exception for memcmp calls with known alignment and really small sizes. On the GCC side, dispatching to a few versions specialized for certain lengths, plus a version that only delivers an unordered boolean result, also promises significant wins.
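
As a rough illustration (my sketch only, not anything in the patch or in GCC today), a boolean-only specialization for a known 8-byte length could collapse to a single word compare; the name memcmp_eq8 and the use of memcpy for the unaligned loads are assumptions made for the example:

#include <stdint.h>
#include <string.h>

/* Hypothetical specialization: equality-only memcmp for a known
   8-byte length.  memcpy lets the compiler emit unaligned word
   loads where those are cheap.  */
static inline int
memcmp_eq8 (const void *a, const void *b)
{
  uint64_t wa, wb;
  memcpy (&wa, a, sizeof wa);
  memcpy (&wb, b, sizeof wb);
  return wa != wb;   /* unordered, boolean result only */
}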

>> I didn't try to exercise the page-crossing code path, although its
>> byte-wise comparisons would be problematic from a timing oracle
>> point of view.

> I did not optimize that, as it is a cold path and I favoured
> simplicity. It could be done with more complicated code that would be
> harder to review and probably slower due to additional branch
> mispredictions.

I looked at the page-crossing logic and noticed that you trigger it based on the addresses alone. Would it not be worth checking whether only the over-reading part would actually cross the page boundary?
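
As a concrete form of that check (a sketch under my own assumptions: a 16-byte vector load, a 4096-byte page, and a made-up helper name):

#include <stdint.h>

/* Hypothetical helper: a 16-byte load starting at p spills into the
   next 4096-byte page only when p's offset within the page exceeds
   4096 - 16.  */
static inline int
load16_crosses_page (const void *p)
{
  return ((uintptr_t) p & 4095) > 4096 - 16;
}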

> As for SSE you need to emulate variable shifts

Is this the mask you compute in %rcx following the handle_end label?
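
My reading of that mask, sketched in intrinsics (a guess at the idea, not the patch's actual code): rather than shifting bytes inside an SSE register, convert the comparison result to a scalar bit mask with pmovmskb and discard the out-of-range bytes with an ordinary general-purpose-register shift.

#include <emmintrin.h>

/* Sketch: bit i of the result is set iff byte i of a and b differ and
   i < n, for n <= 16.  The variable shift happens in a GPR, not in the
   vector unit.  */
static inline unsigned int
diff_mask_first_n (__m128i a, __m128i b, unsigned int n)
{
  unsigned int eq = (unsigned int) _mm_movemask_epi8 (_mm_cmpeq_epi8 (a, b));
  unsigned int diff = ~eq & 0xffffu;       /* bit i set => byte i differs */
  return diff & ((1u << n) - 1);           /* keep only the first n bytes */
}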

> int
> cmp (uint64_t a, uint64_t b)
> {
>    uint64_t diff = a ^ b;
>    uint64_t bit = a ^ (a - 1);
>    int64_t ret = (a & bit) - (b & bit);
>    return ret >> 32;
> }

This looks broken to me. diff is dead, and after a little-endian load, the bit order in the word is mixed, so additional fiddling is needed to derive an accurate ordering from a bit difference.
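
For reference, one way to derive a memcmp-style ordering from a 64-bit difference after little-endian loads (my own sketch, not something from the patch) is to locate the lowest differing byte with a trailing-zero count and compare only that byte:

#include <stdint.h>

/* Sketch: a and b hold 8 bytes each, loaded little-endian, so the
   first differing byte in memory order is the lowest differing byte
   in the word.  Isolate it via the trailing zero count of the XOR.  */
static int
cmp_words (uint64_t a, uint64_t b)
{
  uint64_t diff = a ^ b;
  if (diff == 0)
    return 0;
  unsigned int shift = __builtin_ctzll (diff) & ~7u;
  unsigned int ba = (a >> shift) & 0xff;
  unsigned int bb = (b >> shift) & 0xff;
  return ba < bb ? -1 : 1;
}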

--
Florian Weimer / Red Hat Product Security Team

