This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 1/2] Unaligned strcmp.
- From: Andreas Jaeger <aj at suse dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 03 Sep 2013 15:31:17 +0200
- Subject: Re: [PATCH 1/2] Unaligned strcmp.
- Authentication-results: sourceware.org; auth=none
- References: <20130831210914 dot GA5424 at domone dot kolej dot mff dot cuni dot cz>
On 08/31/2013 11:09 PM, Ondřej Bílka wrote:
> Hi,
>
> This patch improves strcmp performance by around 20% for i7 processors
> on gcc workloads, and 10% for bulldozer. For big sizes it is around 50%
> faster than the sse4.2 variant.
>
> Benchmarks are here:
>
> http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile.html
> http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile310813.tar.bz2
>
> Also, on old Athlons the unaligned implementation is faster than the
> current sse2 one, so we should switch to it.
>
> For machines with ssse3, the implementation also needs to be optimized. I
> looked at the code size: we have 16 branches, each 288 bytes large, which
> contain only 16-byte loops.
>
> As my 64-byte loop fits into 115 bytes, there is plenty of room for
> improvement.
>
> Also, we can reduce the number of branches from 16 to 9 by switching
> arguments and multiplying the result by 1/-1 at the end.
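The branch reduction described above can be illustrated with a small C sketch. It assumes that the 16 branches correspond to the relative 16-byte misalignment of the two pointers, which the message does not state explicitly. The names rel_shift, cmp_with_shift and strcmp_swapped are hypothetical, and the byte loop only stands in for the shift-specialised vector loops of a real implementation.

```c
#include <stdint.h>

/* Hypothetical helper: misalignment of s1 relative to s2, in 0..15. */
unsigned rel_shift(const char *s1, const char *s2)
{
    return (unsigned)(((uintptr_t)s1 - (uintptr_t)s2) & 15u);
}

/* Stand-in for one shift-specialised loop. A real implementation would
   have one vector loop per shift value; this sketch ignores the shift
   and compares byte by byte. The result is always in -255..255. */
int cmp_with_shift(const char *s1, const char *s2, unsigned shift)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    (void)shift;
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (int)*a - (int)*b;
}

/* Shifts 9..15 are handled by swapping the arguments, which turns them
   into shifts 7..1. Only shifts 0..8 (9 cases) remain, and the sign of
   the result is flipped back at the end. */
int strcmp_swapped(const char *s1, const char *s2)
{
    int sign = 1;
    unsigned shift = rel_shift(s1, s2);

    if (shift > 8) {
        const char *tmp = s1;
        s1 = s2;
        s2 = tmp;
        shift = 16 - shift;
        sign = -1;
    }
    return sign * cmp_with_shift(s1, s2, shift);
}
```

Because the stand-in returns a value in -255..255, multiplying by -1 cannot overflow.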
>
>
> This is the first part of the patch; the second part would add strncmp
> macros. The strcmp.S ifunc selection would then need rework, as I do not
> have an effective strcasecmp yet. For now I add a macro to avoid sse4_2
> for strcmp.
>
> There is a possible improvement in the cross-page case, where I do a
> byte-by-byte loop.
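The cross-page case mentioned above is commonly guarded by a check like the following. This is a minimal sketch and not the code from the patch: the 4 KiB page size, the 64-byte block width, and the names block_crosses_page, bytewise_strcmp and strcmp_sketch are all assumptions made for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */
#define SKETCH_BLOCK     64u   /* assumption: one 64-byte unaligned read */

/* Nonzero if reading SKETCH_BLOCK bytes starting at p would touch the
   next page, which might be unmapped. */
int block_crosses_page(const void *p)
{
    uintptr_t offset = (uintptr_t)p & (SKETCH_PAGE_SIZE - 1u);
    return offset > SKETCH_PAGE_SIZE - SKETCH_BLOCK;
}

/* Byte-by-byte comparison: never reads past either terminator. */
int bytewise_strcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (int)*a - (int)*b;
}

int strcmp_sketch(const char *s1, const char *s2)
{
    if (block_crosses_page(s1) || block_crosses_page(s2)) {
        /* Slow path: a wide read could fault, so go byte by byte. */
        return bytewise_strcmp(s1, s2);
    }
    /* Fast-path placeholder: an assembly implementation would issue
       unaligned 64-byte vector loads here. Portable C cannot express
       a read past the terminator, so the sketch reuses the byte loop. */
    return bytewise_strcmp(s1, s2);
}
```

An offset of exactly 4096 - 64 = 4032 is still safe, because the read then ends on the last byte of the page.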
>
> Passes tests, OK to commit?
Yes, thanks,
Andreas
--
Andreas Jaeger aj@{suse.com,opensuse.org} Twitter/Identica: jaegerandi
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn,Jennifer Guild,Felix Imendörffer,HRB16746 (AG Nürnberg)
GPG fingerprint = 93A3 365E CE47 B889 DF7F FED1 389A 563C C272 A126