This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: PATCH: Optimized memset for x86-64

From: "H.J. Lu" <hjl at lucon dot org>
To: "Jagasia, Harsha" <harsha dot jagasia at amd dot com>
Cc: libc-alpha at sourceware dot org, drepper at redhat dot com
Date: Mon, 26 Nov 2007 15:25:33 -0800
Subject: Re: PATCH: Optimized memset for x86-64
References: <D5B24B5251882048AD03DDFA431BB790012F0504@SAUSEXMB3.amd.com>

On Mon, Nov 26, 2007 at 03:15:54PM -0600, Jagasia, Harsha wrote:
> Hi,
> 
> I have tested the memset posted by H.J on AMD's K8 processor. The graph
> is attached. The baseline is the original routine in glibc. The
> performance of H.J's memset is plotted as a percentage of the
> performance of the original memset. 
> 
> Here are some of the key observations:
> 
> - For small blocks (up to 115 bytes), H.J's memset is at par with the
> original memset.
>  

Can you clarify what you meant by "at par"? Up to 100 bytes, the new one
is much faster, by up to 50%.

> - For medium block sizes (between 116 and the largest cache size), there
> are several misaligned and aligned blocks that underperform the
> original memset by 10% to 20%. 
> 
> I plan to investigate why medium blocks perform poorly and will report
> on this soon. 
> 
> - For very large block sizes (larger than largest cache size), the
> performance is at par. The relative improvement seen between 128KB and
> 512KB is because the original memset is underutilizing the cache by
> doing streaming stores too early.
> 
> As is, H.J's routine hurts performance significantly on K8 for medium
> blocks. I also plan to post results on the AMD Barcelona processor soon.
> I plan to fix the issues pointed out by Ulrich in AMD's previous
> submission and add an AMD path that addresses the performance issues
> noted above. 

We are investigating misaligned and medium blocks. We can compare
performance later.
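
For comparing numbers later, here is a minimal sketch of the kind of
harness that could produce the "percentage of the original" figures
discussed above. It is not the harness used for the attached graph.
The names byte_loop_memset, time_fill, relative_percent and
compare_memsets are hypothetical, the block sizes are picked only to
straddle the ranges mentioned in this thread, and the percentage is
assumed to be baseline time divided by candidate time, so that values
above 100 mean the candidate is faster.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*memset_fn)(void *, int, size_t);

/* Hypothetical stand-in for the routine under test; swap in the real
   candidate.  This is a plain byte loop, not either posted patch, and
   an optimizing compiler may lower it to a memset call itself.  */
static void *byte_loop_memset(void *s, int c, size_t n)
{
    unsigned char *p = s;
    while (n--)
        *p++ = (unsigned char) c;
    return s;
}

/* CPU seconds spent on `iters` fills of `len` bytes starting at `buf`.
   The call goes through a volatile pointer so the compiler cannot
   inline or drop the fills.  */
static double time_fill(memset_fn fn, unsigned char *buf, size_t len,
                        long iters)
{
    memset_fn volatile call = fn;
    clock_t start = clock();
    for (long i = 0; i < iters; i++)
        call(buf, (int) (i & 0xff), len);
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

/* Assumed definition: candidate performance as a percentage of the
   baseline, where performance is the inverse of elapsed time.  */
static double relative_percent(double baseline_sec, double candidate_sec)
{
    return 100.0 * baseline_sec / candidate_sec;
}

/* Times the libc memset against the candidate for a few block sizes,
   each at the malloc alignment and one byte past it.  */
void compare_memsets(void)
{
    static const size_t sizes[] = { 64, 116, 4096, 128 * 1024, 1024 * 1024 };
    const size_t max_len = 1024 * 1024;
    unsigned char *buf = malloc(max_len + 64);

    if (buf == NULL)
        return;
    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        for (size_t misalign = 0; misalign <= 1; misalign++) {
            /* Write roughly 256MB per measurement.  */
            long iters = (long) ((size_t) 256 * 1024 * 1024 / sizes[i]);
            double base = time_fill(memset, buf + misalign, sizes[i], iters);
            double cand = time_fill(byte_loop_memset, buf + misalign,
                                    sizes[i], iters);
            if (cand > 0.0)
                printf("%8zu bytes, misalign %zu: %6.1f%% of baseline\n",
                       sizes[i], misalign, relative_percent(base, cand));
        }
    free(buf);
}
```

Calling compare_memsets() from main prints one line per size and
alignment. clock() is coarse, so the iteration count is kept high and
single runs should still be treated as noisy.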


H.J.
