This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] [Patch 1/1] [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S.

From: Will Schmidt <will_schmidt at vnet dot ibm dot com>
To: "Ryan S. Arnold" <ryan dot arnold at gmail dot com>
Cc: libc-alpha at sourceware dot org, willschm at us dot ibm dot com
Date: Fri, 11 May 2012 10:08:19 -0500
Subject: Re: [PATCH] [Patch 1/1] [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S.
References: <20120511135221.7637.33663.stgit@brimstone> <CAAKybw8j8EdsO9qc3u5pii5KH=OTwoN2UqywwMnaJowX1m92yQ@mail.gmail.com>
Reply-to: will_schmidt at vnet dot ibm dot com

On Fri, 2012-05-11 at 09:32 -0500, Ryan S. Arnold wrote:
> On Fri, May 11, 2012 at 8:52 AM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
> > [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S.
> >
> > Assorted tweaking, twisting and tuning to squeeze a few additional cycles
> > out of the memchr code.   Changes include bypassing the shift pairs (sld,srd)
> > when they are not required, and unrolling the small_loop that handles short
> > and trailing strings.
> > Per scrollpipe data measuring aligned strings for 64-bit, these changes save
> > between five and eight cycles (9-13% overall) for short strings (<32).  Longer
> > aligned strings see slight improvement of 1-3% due to bypassing the shifts
> > and the instruction rearranging.  Attempts to rework and partially unroll
> > the main loop did not show any benefits.
> > The Powerpc32 version of the code was changed in a similar fashion to match,
> > and should show similar improvements.
> >
> > Passed make check with no regressions.
> >
> > While I was in the neighborhood, I updated a few of the existing comments so
> > they made a bit more sense to me, and touched up a bit of the whitespace for
> > better consistency throughout.
> >
> > 2012-05-10  Will Schmidt <will_schmidt@vnet.ibm.com>
> >
> >        * sysdeps/powerpc/powerpc64/power7/memchr.S:  Unrolled short loop and
> >         slight instruction rearrangements per scrollpipe analysis.
> >        * sysdeps/powerpc/powerpc32/power7/memchr.S:  Ditto.
> Hi Will,
> 
> I'll apply the patch and check it out. My only question is with some
> formatting but I trust your numbers otherwise.

Ok.   

> 
> Were there any data sets for which there were regressions in
> performance or is this an all-around improvement?

Should be improvement or flat.  

The only potential performance penalty I noticed and was concerned about
is for unaligned strings, due to the added compare/branch that avoids the
sld,srd pairs used for padding.  Fortunately(?), in that same section of
code we are also loading and comparing the word, so the added
compare/branch instructions complete during the wait time, and overall
there should be no penalty.
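
As a rough illustration of the bypass described above, here is a minimal C sketch. It is not the patch itself, which is POWER7 assembly. The names match_bytes and first_match_in_block are hypothetical, the byte-lane layout is chosen for readability rather than to mirror the real register layout, and the loop in match_bytes only stands in for a single byte-compare instruction.

```c
#include <stdint.h>

/* Stand-in for a byte-compare step: 0xff in each byte lane of the result
   where the corresponding byte of word equals c.  (Assumption: the real
   code does this in one instruction; the loop is only for illustration.) */
uint64_t match_bytes(uint64_t word, unsigned char c)
{
    uint64_t mask = 0;
    for (int i = 0; i < 8; i++)
        if (((word >> (8 * i)) & 0xff) == c)
            mask |= (uint64_t)0xff << (8 * i);
    return mask;
}

/* Hypothetical helper: index of the first byte equal to c in an 8-byte
   aligned block, ignoring the first pad bytes (0 <= pad <= 7), which lie
   before the start of the string.  Returns -1 if there is no match. */
int first_match_in_block(const unsigned char block[8],
                         unsigned pad, unsigned char c)
{
    uint64_t word = 0;

    /* Assemble the block so that byte i occupies bits 8*i .. 8*i+7. */
    for (int i = 0; i < 8; i++)
        word |= (uint64_t)block[i] << (8 * i);

    uint64_t mask = match_bytes(word, c);

    /* The shift pair clears matches that fall in the padding bytes.
       An aligned start (pad == 0) has nothing to clear, so one
       compare/branch skips both shifts. */
    if (pad != 0) {
        unsigned bits = 8 * pad;
        mask = (mask >> bits) << bits;
    }

    /* Report the lowest byte lane that still holds a match. */
    for (int i = 0; i < 8; i++)
        if ((mask >> (8 * i)) & 0xff)
            return i;
    return -1;
}
```

For the block "abcabcab" and the character 'a', pad 0 takes the bypass and finds the match at index 0, while pad 1 goes through the shift pair and finds the match at index 3.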


Thanks
-Will

> 
> Ryan
>

References:
- [PATCH] [Patch 1/1] [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S.
  - From: Will Schmidt
- Re: [PATCH] [Patch 1/1] [Powerpc] Tune/optimize powerpc{32, 64}/power7/memchr.S.
  - From: Ryan S. Arnold
