This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



[PATCH v4] faster strlen on x64


Hello,

In the previous version I wrote that an unaligned read of the first 16 bytes
is a bad tradeoff. While writing a faster strcpy header I realized that this
was because I was doing a separate check for whether the read crosses a page.

When I check only that the next 64 bytes do not cross a page and then do
an unaligned 16-byte load first, it causes only a small overhead for larger
strings. This makes my implementation faster for a wider family of
workloads. It speeds up the gcc benchmark and most other programs.
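
The idea above can be sketched in C with SSE2 intrinsics. This is only an
illustration and not the code from the attached patch: the name
sketch_strlen, the 4096-byte page size and the simple 16-byte main loop are
assumptions made for the example, and __builtin_ctz is a GCC/Clang builtin.
Like any vectorized strlen it reads a few bytes past the terminator, which is
safe only because those reads never leave the current page.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Bit i of the result is set if byte i of the 16-byte block is zero. */
static unsigned zero_mask16(__m128i v)
{
    return (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()));
}

/* Hypothetical name; a simplified sketch, not the glibc implementation. */
size_t sketch_strlen(const char *s)
{
    uintptr_t addr = (uintptr_t)s;
    const char *p;
    unsigned m;

    if (addr % SKETCH_PAGE_SIZE <= SKETCH_PAGE_SIZE - 64) {
        /* The next 64 bytes stay inside this page, so an unaligned
           16-byte load from s cannot touch the following page. */
        m = zero_mask16(_mm_loadu_si128((const __m128i *)s));
        if (m != 0)
            return (size_t)__builtin_ctz(m);
        /* Continue at the first 16-byte boundary after s. */
        p = (const char *)((addr + 16) & ~(uintptr_t)15);
    } else {
        /* Close to the page end: load the aligned block containing s
           and discard the mask bits of the bytes before s. */
        p = (const char *)(addr & ~(uintptr_t)15);
        m = zero_mask16(_mm_load_si128((const __m128i *)p));
        m >>= (addr & 15);
        if (m != 0)
            return (size_t)__builtin_ctz(m);
        p += 16;
    }

    /* Aligned 16-byte loads never cross a page boundary. */
    for (;;) {
        m = zero_mask16(_mm_load_si128((const __m128i *)p));
        if (m != 0)
            return (size_t)(p - s) + (size_t)__builtin_ctz(m);
        p += 16;
    }
}
```

In this sketch the single page check guards only the first unaligned load;
checking 64 bytes rather than 16 leaves room for an implementation that
examines the following 48 bytes before entering its main loop.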

On unit tests the revised version is somewhat slower than the previous version.
This is because the first-16-bytes path is chosen only rarely there, which
causes branch mispredictions.

I made two additional small improvements. The first is squashing in the
padding patch. The second is that the page-cross test can be done as
x%4096 < 4096-48 instead of x%4096 <= 4096-64, because I align x to 16 bytes.
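
One way to read the second point: when x is a multiple of 16, x%4096 is also
a multiple of 16, so "<= 4096-64" and "< 4096-48" accept exactly the same
offsets. The small C check below illustrates this under that assumption; the
function names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two forms of the page-cross test mentioned above. */
static bool fits_le_64(uintptr_t x) { return x % 4096 <= 4096 - 64; }
static bool fits_lt_48(uintptr_t x) { return x % 4096 <  4096 - 48; }

/* Returns true if both tests give the same answer for every
   16-byte-aligned offset within a 4096-byte page. */
bool tests_agree_when_aligned(void)
{
    for (uintptr_t x = 0; x < 4096; x += 16)
        if (fits_le_64(x) != fits_lt_48(x))
            return false;
    return true;
}
```

For an unaligned x the two tests can differ: at offset 4040 the first test
fails while the second passes, so the looser form is valid only after
alignment.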

I updated the benchmarks; the difference between the new and revised versions is at
http://kam.mff.cuni.cz/~ondra/benchmark_string/strlen_profile.html
http://kam.mff.cuni.cz/~ondra/strlen_profile.tar.bz2
 

Ondra

2013-01-31  Ondrej Bilka  <neleai@seznam.cz>

  * sysdeps/x86_64/strlen.S: Replace with new SSE2 based
  implementation which is faster on all x86_64 architectures.
  Tested on AMD, Intel Nehalem, SNB, IVB.
  * sysdeps/x86_64/strnlen.S: Likewise.

  * sysdeps/x86_64/multiarch/Makefile (sysdep_routines):
  Remove all multiarch strlen and strnlen versions.
  * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Update.
  Remove strlen and strnlen related parts.

  * sysdeps/x86_64/multiarch/strcat-sse2-unaligned.S: Update.
  Inline strlen part.
  * sysdeps/x86_64/multiarch/strcat-ssse3.S: Likewise.

  * sysdeps/x86_64/multiarch/strlen.S: Remove.
  * sysdeps/x86_64/multiarch/strlen-sse2-no-bsf.S: Remove.
  * sysdeps/x86_64/multiarch/strlen-sse2-pminub.S: Remove.
  * sysdeps/x86_64/multiarch/rtld-strlen.S: Remove.
  * sysdeps/x86_64/multiarch/strlen-sse4.S: Remove.
  * sysdeps/x86_64/multiarch/strnlen.S: Remove.
  * sysdeps/x86_64/multiarch/strnlen-sse2-no-bsf.S: Remove.

Attachment: 0001-Faster-strlen-on-x86-64.patch
Description: Text document

