This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Ping: [PATCH v4] faster strlen on x64


I uploaded the results to my page.
If nobody objects, I will commit it tomorrow.

On Thu, Feb 28, 2013 at 06:10:57PM +0400, Dmitrieva Liubov wrote:
> Looks like for Haswell the _new version is better than _revised for big
> lengths (according to results_rand/result.html).
> But you can submit either of them, as both are better than the current
> one. We have no objections to the patch on our side.
> 
> --
> Liubov Dmitrieva
> 
> 2013/2/28 Dmitrieva Liubov <liubov.dmitrieva@gmail.com>:
> > Yes. This helps.
> >
> > Keep Haswell results.
> >
> > 2013/2/28 Ondřej Bílka <neleai@seznam.cz>:
> >> On Thu, Feb 28, 2013 at 04:29:38PM +0400, Dmitrieva Liubov wrote:
> >>> I tried to run that test suite on the Haswell machine we have, to
> >>> compare the _revised and _new versions, but got a segmentation fault.
> >>> I downloaded the archive, extracted everything, and ran the "make" and
> >>> "./benchmarks" commands one after the other in the test directory.
> >>> When the ./benchmarks script called the ./report binary, the program crashed.
> >>>
> >>> The stack is:
> >>> Program received signal SIGSEGV, Segmentation fault.
> >>> 0x000000301524d4d8 in __printf_fp () from /lib64/libc.so.6
> >>> Missing separate debuginfos, use: debuginfo-install glibc-2.15-57.fc17.x86_64
> >>> (gdb) bt
> >>> #0  0x000000301524d4d8 in __printf_fp () from /lib64/libc.so.6
> >>> #1  0x000000301524a748 in vfprintf () from /lib64/libc.so.6
> >>> #2  0x000000301526e124 in vsprintf () from /lib64/libc.so.6
> >>> #3  0x0000003015250987 in sprintf () from /lib64/libc.so.6
> >>> #4  0x0000000000402984 in report_fn (smp=0x7ffff7fee000,
> >>> fname=0x403d47 "function", flags=0, binaries=0x7ffff7ffbd20) at
> >>> report.c:91
> >>> #5  0x0000000000403603 in main () at functions.h:1
> >>>
> >> It is weird that it segfaults there. The only problem I see is that I
> >> disabled avx2 compilation because it did not work with older binutils.
> >>
> >> I did disable it completely. Detection in the file test_sse needs to be modified as follows.
> >>
> >> if [ -z "$AVX2" ]
> >> then
> >> echo 3
> >> else
> >> echo 4 # replace with 3
> >> fi
> >>
> >> Tell me if this helps.
> >>>
> >>> --
> >>> Liubov Dmitrieva
> >>>
> >>> 2013/2/25 Ondřej Bílka <neleai@seznam.cz>:
> >>> > Ping,
> >>> >
> >>> >
> >>> > On Wed, Feb 13, 2013 at 12:38:40PM +0100, Ondřej Bílka wrote:
> >>> >> Hello,
> >>> >>
> >>> >> With the previous version I wrote that an unaligned read of the first
> >>> >> 16 bytes is a bad tradeoff. When I wrote a faster strcpy header I realized
> >>> >> that this was because I was doing a separate check for crossing a page.
> >>> >>
> >>> >> When I only check that the next 64 bytes do not cross a page and first
> >>> >> do an unaligned 16-byte load, it causes only a small overhead for larger
> >>> >> strings. This makes my implementation faster for a wider family of
> >>> >> workloads. It speeds up the gcc benchmark and most other programs.
> >>> >>
> >>> >> On unit tests the revised version is somewhat slower than the previous
> >>> >> version. This is caused by the first 16 bytes being chosen only rarely,
> >>> >> which causes branch mispredictions.
> >>> >>
> >>> >> I made two additional small improvements. The first is squashing in the
> >>> >> padding patch. The second is that the page-cross test can be done as
> >>> >> x%4096 < 4096-48 instead of x%4096 <= 4096-64 because I align x to 16 bytes.
> >>> >>
> >>> >> I updated the benchmarks; the difference between the new and revised versions is at
> >>> >> http://kam.mff.cuni.cz/~ondra/benchmark_string/strlen_profile.html
> >>> >> http://kam.mff.cuni.cz/~ondra/strlen_profile.tar.bz2
> >>> >>
> >>> >>
> >>> >> Ondra
> >>> >
> >>
> >> --
> >>
> >> Your mail is being routed through Germany ... and they're censoring us.

