This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RE: PATCH: Optimized memset for x86-64


Hi,
>>
>> I have tested the memset posted by H.J on AMD's K8 processor. The graph
>> is attached. The baseline is the original routine in glibc. The
>> performance of H.J's memset is plotted as a percentage of the
>> performance of the original memset.
>>
>> Here are some of the key observations:
>>
>> - For small blocks (up to 115 bytes), H.J's memset is at par with the
>> original memset.
>>
>
>Can you clarify what you meant by "at par"? Up to 100 bytes, the new one
>is much faster, by up to 50%.
>

Sorry for not replying earlier. I meant that it is at par or faster.
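
To make the "percentage of the original memset" measure concrete, here is a minimal sketch of how such a comparison could be computed. It is not the harness used for the attached graph. The names `time_memset` and `relative_perf_percent` are hypothetical, and the convention that 100 means "at par" and anything above 100 means "faster" is an assumption. The compiler barrier uses GCC-style inline assembly.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the routines under test (same as memset). */
typedef void *(*memset_fn)(void *, int, size_t);

/* Time `iters` calls of `fn` on a block of `size` bytes that starts
   `misalign` bytes into `buf`. Returns elapsed seconds.
   Hypothetical helper, not part of glibc. */
static double time_memset(memset_fn fn, unsigned char *buf, size_t misalign,
                          size_t size, long iters)
{
    struct timespec t0, t1;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        fn(buf + misalign, (int)(i & 0xff), size);
        /* Compiler barrier (GCC/Clang syntax) so the stores are kept. */
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

/* Performance of the candidate as a percentage of the baseline.
   Assumed convention: 100 = at par, above 100 = candidate is faster. */
static double relative_perf_percent(double baseline_seconds,
                                    double candidate_seconds)
{
    assert(baseline_seconds > 0.0 && candidate_seconds > 0.0);
    return 100.0 * baseline_seconds / candidate_seconds;
}
```

A caller would time both routines with the same size, misalignment, and iteration count, then pass the two timings to `relative_perf_percent`. Sweeping the size and the misalignment gives one point per combination, which is how per-size dips for medium blocks would show up.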

>> - For medium block sizes (between 116 bytes and the largest cache size), there
>> are several misaligned and aligned blocks that underperform the
>> original memset by 10% to 20%.
>>
>> I plan to investigate why medium blocks perform poorly and will report
>> on this soon.
>>
>> - For very large block sizes (larger than the largest cache size), the
>> performance is at par. The relative improvement seen between 128KB and
>> 512KB is because the original memset is underutilizing the cache by
>> doing streaming stores too early.
>>
>> As is, H.J's routine hurts performance significantly on K8 for medium
>> blocks. I also plan to post results on the AMD Barcelona processor soon.
>> I plan to fix the issues pointed out by Ulrich in AMD's previous
>> submission and add an AMD path that addresses the performance issues
>> noted above.
>
>We are investigating misaligned and medium blocks. We can compare
>performance later.

I also posted some data on Barcelona in a follow-up thread.
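
For readers following the streaming-store point quoted above, the effect of switching "too early" can be sketched as a simple size-threshold policy. This is only an illustration, not the code of either memset: `choose_store_kind` and the enum are hypothetical names, and the 128KB and 512KB values are borrowed from the range mentioned in the message, not confirmed thresholds of any routine.

```c
#include <stddef.h>

/* Hypothetical labels for the two store strategies discussed. */
enum store_kind {
    STORE_CACHED,    /* ordinary stores that fill the cache */
    STORE_STREAMING  /* non-temporal stores that bypass the cache */
};

/* Pick a strategy from the block size and a switch-over threshold.
   Blocks larger than the threshold use streaming stores.
   Illustrative policy only; real routines may use other criteria. */
static enum store_kind choose_store_kind(size_t block_size,
                                         size_t streaming_threshold)
{
    return block_size > streaming_threshold ? STORE_STREAMING
                                            : STORE_CACHED;
}
```

With an assumed threshold of 128KB, a 256KB block is already streamed even though it might still fit in a 512KB cache. Raising the assumed threshold to 512KB keeps that block on cached stores, which is one way to read the improvement seen between 128KB and 512KB.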

>
>
>H.J.
>



