This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] Faster strchr implementation.


I see that the drawdowns now show up only on Haswell and, less pronounced
but still present, on Silvermont.
http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile.html

I think we should figure out the issue to get normal plots.

By the way, your links for Nehalem and Athlon are unavailable (line
#1 and line #9).

http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile.html

--
Liubov Dmitrieva
Intel Corporation

On Mon, Aug 12, 2013 at 2:48 PM, Liubov Dmitrieva
<liubov.dmitrieva@gmail.com> wrote:
> I tried the updated version on Haswell. The strchr profiler did not
> complete the tests because of a failure in the Makefile.
> See the log and the results.
>
>
> Yes, I saw the graphs for strchr, strrchr and strcmp where it passed
> correctly, and I agree with your summary, though I see odd drawdowns
> in some strcmp plots (such as strcmp_hsw/results_rand_L3/result.html,
> plot #2, and many like it). How should we interpret the drawdowns
> there?
>
> --
> Liubov Dmitrieva
> Intel Corporation
>
> On Fri, Aug 9, 2013 at 8:44 PM, Ondřej Bílka <neleai@seznam.cz> wrote:
>> On Fri, Aug 09, 2013 at 06:05:43PM +0400, Liubov Dmitrieva wrote:
>>>
>> Yes, thanks. I updated the pages with them:
>>
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile.html
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/strchr_profile.html
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/strrchr_profile.html
>>
>> As you can see in the graphs, strcmp is 10% faster on Haswell, Atom and
>> Silvermont.
>>
>> As for the avx2 implementation on Haswell, it is around 20% faster on large
>> inputs, but the problem is that most of the time inputs are small. Using
>> avx is around 3% slower than strcmp_new on the gcc test. One possible cause is
>> the higher latency of avx2; it may be worth adding an avx2 ifunc.
>>
>> As far as strchr is concerned, the no_bsf version is still faster on Atom and
>> Silvermont. My implementation wins for larger sizes, but it pays a heavy penalty
>> for switching to the 64-byte loop, while the strchr_no_bsf loop is cheap.
>>
>> For strrchr, the new implementation is faster than the current ones both
>> practically and asymptotically on Haswell, Atom and Silvermont.
>>
>> As for Haswell, I fixed the strchr and strrchr benchmarks to call the correct
>> avx2 implementation. The new version is here:
>>
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/strchr_profile090813.tar.bz2
>> http://kam.mff.cuni.cz/~ondra/benchmark_string/strrchr_profile090813.tar.bz2
>>
>> It was a bug in the implementation, now fixed.
>> They are tested automatically.
>>
>> --
>>
>> endothermal recalibration
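
For readers unfamiliar with the "avx2 ifunc" idea in the quoted message: an ifunc lets the dynamic linker pick one implementation of a function per CPU at symbol resolution time. The sketch below is only a portable stand-in for that mechanism, using a function pointer chosen on first call instead of a real ifunc. All names in it (strcmp_generic, strcmp_avx2_stub, cpu_has_avx2, pick_strcmp, my_strcmp) are hypothetical and are not taken from glibc or from the benchmark tarballs, and the "avx2" variant is a placeholder that just forwards to the C library's strcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*strcmp_fn)(const char *, const char *);

/* Portable byte-by-byte baseline (hypothetical name). */
static int strcmp_generic(const char *a, const char *b)
{
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Placeholder for a vectorized variant. A real one would use AVX2
   loads; this stub forwards to the C library so the sketch stays
   runnable on any machine. */
static int strcmp_avx2_stub(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Returns nonzero when the CPU reports AVX2. Uses GCC/Clang x86
   builtins; on other compilers or targets it reports no AVX2. */
static int cpu_has_avx2(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Plays the role an ifunc resolver would play: choose one variant. */
static strcmp_fn pick_strcmp(void)
{
    return cpu_has_avx2() ? strcmp_avx2_stub : strcmp_generic;
}

/* Public entry point: selects the variant once, then reuses it.
   A real ifunc avoids even this per-call NULL check. */
int my_strcmp(const char *a, const char *b)
{
    static strcmp_fn impl; /* NULL until the first call */

    assert(a != NULL && b != NULL);
    if (impl == NULL)
        impl = pick_strcmp();
    return impl(a, b);
}
```

Only the sign of the result is meaningful, as with strcmp itself. Whether such a dispatch pays off depends on the input size distribution discussed in the quoted message, since the selection adds nothing to small-input speed.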

