This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 1/2] Unaligned strcmp.


On 08/31/2013 11:09 PM, Ondřej Bílka wrote:
> Hi,
> 
> This patch improves strcmp performance by around 20% for i7 processors
> on gcc workloads, and 10% for bulldozer. For big sizes it is around 50%
> faster than the sse4.2 variant.
> 
> Benchmarks are here:
> 
> http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile.html
> http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile310813.tar.bz2
> 
> Also, on old Athlons the unaligned implementation is faster than the
> current sse2 one, so we should switch to it.
> 
> For machines with ssse3, the implementation also needs to be optimized. I
> looked at the code size: we have 16 branches, each 288 bytes large, which
> contain only 16-byte loops.
> 
> As my 64-byte loop fits into 115 bytes there is plenty of room for
> improvement.
> 
> Also, we can reduce the number of branches from 16 to 9 by switching
> arguments and multiplying the result by 1/-1 at the end.
> 
> 
> This is the first part of the patch; the second part would add strncmp
> macros. The strcmp.S ifunc selection would then need rework, as I do not
> have an effective strcasecmp yet. For now I add a macro to avoid sse4_2
> for strcmp.
> 
> There is a possible improvement in the cross-page case, where I do a
> byte-by-byte loop.
> 
> Passes tests, OK to commit?

Yes, thanks,

Andreas
-- 
 Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
  SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
   GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
    GPG fingerprint = 93A3 365E CE47 B889 DF7F  FED1 389A 563C C272 A126
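
Two ideas from the quoted patch description can be sketched in portable C. This is not the patch itself, which is x86-64 assembly. It is a minimal illustration under stated assumptions: 4 KiB pages, a 64-byte unaligned read per iteration, and one possible reading of the "16 to 9 branches" remark, namely that swapping the arguments folds the relative alignment of the two pointers (mod 16) into the range 0..8. The names crosses_page, fold_alignment, compare_core and sketch_strcmp are hypothetical and do not appear in glibc.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define SKETCH_CHUNK 64u       /* assumption: bytes read per unaligned iteration */

/* Nonzero if reading SKETCH_CHUNK bytes starting at p would run past the end
   of p's page. In that case an unaligned wide read could fault, so a
   byte-by-byte fallback is the safe choice. */
int crosses_page(const void *p)
{
    uintptr_t offset = (uintptr_t)p & (SKETCH_PAGE_SIZE - 1);
    return offset > SKETCH_PAGE_SIZE - SKETCH_CHUNK;
}

/* Byte-wise stand-in for the real comparison core. */
int compare_core(const unsigned char *a, const unsigned char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (int)*a - (int)*b;
}

/* Fold the relative alignment of *a and *b (mod 16) into 0..8 by swapping
   the pointers when it exceeds 8. *sign is set to -1 if a swap happened,
   so the caller can restore the original ordering of the result. */
unsigned fold_alignment(const unsigned char **a, const unsigned char **b,
                        int *sign)
{
    unsigned rel = (unsigned)(((uintptr_t)*a - (uintptr_t)*b) & 15u);
    *sign = 1;
    if (rel > 8) {
        const unsigned char *tmp = *a;
        *a = *b;
        *b = tmp;
        rel = 16 - rel; /* relative alignment of the swapped pair */
        *sign = -1;
    }
    return rel;
}

int sketch_strcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    int sign;
    unsigned rel = fold_alignment(&a, &b, &sign);
    (void)rel; /* a real implementation would dispatch on rel (9 cases) */
    return sign * compare_core(a, b);
}
```

Because the comparison is antisymmetric, comparing the swapped pair and negating the result gives the same answer as comparing the original pair, which is why only 9 alignment cases would need dedicated code.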

