This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: PATCH: Optimized memset for x86-64


On Mon, Nov 26, 2007 at 03:15:54PM -0600, Jagasia, Harsha wrote:
> Hi,
> 
> I have tested the memset posted by H.J on AMD's K8 processor. The graph
> is attached. The baseline is the original routine in glibc. The
> performance of H.J's memset is plotted as a percentage of the
> performance of the original memset. 
> 
> Here are some of the key observations:
> 
> - For small blocks (up to 115 bytes), H.J's memset is at par with the
> original memset.
>  

Can you clarify what you meant by "at par"? Up to 100 bytes, the new one
is much faster, by up to 50%.

> - For medium block sizes (between 116 and the largest cache size), there
> are several misaligned and aligned blocks that underperform the
> original memset by 10% to 20%. 
> 
> I plan to investigate why medium blocks perform poorly and will report
> on this soon. 
> 
> - For very large block sizes (larger than largest cache size), the
> performance is at par. The relative improvement seen between 128KB and
> 512KB is because the original memset is underutilizing the cache by
> doing streaming stores too early.
> 
> As is, H.J's routine hurts performance significantly on K8 for medium
> blocks. I also plan to post results on the AMD Barcelona processor soon.
> I plan to fix the issues pointed out by Ulrich in AMD's previous
> submission and add an AMD path that addresses the performance issues
> noted above. 

We are investigating misaligned and medium blocks. We can compare
performance later.


H.J.
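
For readers who want to reproduce this kind of comparison, here is a
minimal sketch of how a candidate memset could be timed against a
baseline and reported as a percentage, using the size classes quoted
above (small: up to 115 bytes, medium: 116 bytes up to the largest cache
size, large: beyond that). It is not the harness used for the attached
graph. The names classify_block, time_memset, relative_percent and
compare_memset are hypothetical, the largest cache size is a value the
caller must supply, and clock() is only a coarse timer, so the
iteration count has to be large for small blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Size classes as described in the quoted observations. */
enum block_class { BLOCK_SMALL, BLOCK_MEDIUM, BLOCK_LARGE };

/* largest_cache (in bytes) is an assumption supplied by the caller. */
enum block_class classify_block(size_t n, size_t largest_cache)
{
    if (n <= 115)
        return BLOCK_SMALL;
    if (n <= largest_cache)
        return BLOCK_MEDIUM;
    return BLOCK_LARGE;
}

typedef void *(*memset_fn)(void *, int, size_t);

/* CPU seconds for `iters` calls of fn on n bytes at buf + misalign. */
double time_memset(memset_fn fn, unsigned char *buf, size_t misalign,
                   size_t n, long iters)
{
    /* Calling through a volatile pointer keeps the compiler from
       inlining or eliding the repeated stores. */
    memset_fn volatile f = fn;
    clock_t t0, t1;
    long i;

    assert(iters > 0);
    t0 = clock();
    for (i = 0; i < iters; i++)
        f(buf + misalign, 0xA5, n);
    t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Candidate performance as a percentage of the baseline:
   100 means par, above 100 means the candidate is faster.
   Returns -1 when the candidate time is too small to measure. */
double relative_percent(double baseline_seconds, double candidate_seconds)
{
    if (candidate_seconds <= 0.0)
        return -1.0;
    return 100.0 * baseline_seconds / candidate_seconds;
}

/* Time both routines on the same buffer, offset and size. */
double compare_memset(memset_fn baseline, memset_fn candidate,
                      unsigned char *buf, size_t misalign,
                      size_t n, long iters)
{
    double tb = time_memset(baseline, buf, misalign, n, iters);
    double tc = time_memset(candidate, buf, misalign, n, iters);
    return relative_percent(tb, tc);
}
```

A caller would allocate a buffer of at least misalign + n bytes, then
call compare_memset(memset, candidate, buf, misalign, n, iters) for each
size and misalignment of interest and group the results by
classify_block. A single run is noisy, so repeating each point several
times and taking the best result is advisable.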

