This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] [Patch 1/1] [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S.
On Fri, 2012-05-11 at 09:32 -0500, Ryan S. Arnold wrote:
> On Fri, May 11, 2012 at 8:52 AM, Will Schmidt <will_schmidt@vnet.ibm.com> wrote:
> > [Powerpc] Tune/optimize powerpc{32,64}/power7/memchr.S.
> >
> > Assorted tweaking, twisting and tuning to squeeze a few additional cycles
> > out of the memchr code. Changes include bypassing the shift pairs (sld,srd)
> > when they are not required, and unrolling the small_loop that handles short
> > and trailing strings.
> > Per scrollpipe data measuring aligned strings for 64-bit, these changes save
> > between five and eight cycles (9-13% overall) for short strings (<32). Longer
> > aligned strings see a slight improvement of 1-3% due to bypassing the shifts
> > and the instruction rearranging. Attempts to rework and partially unroll
> > the main loop did not show any benefits.
> > The Powerpc32 version of the code was changed in a similar fashion to match,
> > and should show similar improvements.
> >
> > Passed make check with no regressions.
> >
> > While I was in the neighborhood, I updated a few of the existing comments so
> > they made a bit more sense to me, and touched up a bit of the whitespace for
> > better consistency throughout.
> >
> > 2012-05-10 Will Schmidt <will_schmidt@vnet.ibm.com>
> >
> > * sysdeps/powerpc/powerpc64/power7/memchr.S: Unrolled short loop and
> > slight instruction rearrangements per scrollpipe analysis.
> > * sysdeps/powerpc/powerpc32/power7/memchr.S: Ditto.
> Hi Will,
>
> I'll apply the patch and check it out. My only question is with some
> formatting but I trust your numbers otherwise.
Ok.
>
> Were there any data sets for which there were regressions in
> performance or is this an all-around improvement?
Should be improvement or flat.
The only potential spot for a performance penalty that I noticed and was
concerned about was for unaligned strings due to the added
compare/branch to avoid the sld,srd pairs for padding. Fortunately(?)
in that same section of code we are also loading and comparing the word,
so the added compare/branch instructions are completed during the wait
time, and overall there should be no penalty.
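
For illustration only, here is a rough C model of the idea, not the
power7 assembly from the patch. All names (load_be64, cmpb_emul,
first_lane, memchr_sketch) are hypothetical, the cmpb instruction is
emulated in software, and `storage` is assumed to stand in for
doubleword-aligned memory whose length is a multiple of 8. The point it
tries to show is the extra compare/branch: the shift pair that clears
the padding bytes ahead of the start address only runs when the start
is actually unaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NOT_FOUND ((size_t)-1)

/* Load 8 bytes so the lowest address lands in the most significant byte,
   mimicking a big-endian doubleword load. */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t w = 0;
    for (int i = 0; i < 8; i++)
        w = (w << 8) | p[i];
    return w;
}

/* Software stand-in for cmpb: 0xFF in every byte lane where a and b agree. */
static uint64_t cmpb_emul(uint64_t a, uint64_t b)
{
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        uint64_t lane = 0xFFull << (8 * i);
        if (((a ^ b) & lane) == 0)
            m |= lane;
    }
    return m;
}

/* Index (0..7, in memory order) of the first matching lane. */
static size_t first_lane(uint64_t hits)
{
    size_t i = 0;
    assert(hits != 0);
    while ((hits & (0xFFull << 56)) == 0) {
        hits <<= 8;
        i++;
    }
    return i;
}

/* Search storage[start .. start+n) for c, one doubleword at a time.
   Assumes storage covers every doubleword that overlaps the range.
   Returns the offset of the first match, or NOT_FOUND. */
static size_t memchr_sketch(const unsigned char *storage, size_t start,
                            int c, size_t n)
{
    if (n == 0)
        return NOT_FOUND;

    uint64_t pattern = 0x0101010101010101ull * (unsigned char)c;
    size_t end = start + n;
    size_t word = start & ~(size_t)7;              /* round down to 8 */
    unsigned pad_bits = (unsigned)(start & 7) * 8; /* bytes before start */

    uint64_t hits = cmpb_emul(load_be64(storage + word), pattern);

    /* The added compare/branch: only pay for the shift pair (the sld,srd
       analogue) when there are padding bytes to discard. */
    if (pad_bits != 0)
        hits = (hits << pad_bits) >> pad_bits;

    for (;;) {
        if (hits != 0) {
            size_t pos = word + first_lane(hits);
            /* A hit past the end of the range does not count. */
            return pos < end ? pos : NOT_FOUND;
        }
        word += 8;
        if (word >= end)
            return NOT_FOUND;
        hits = cmpb_emul(load_be64(storage + word), pattern);
    }
}
```

In this model an aligned start takes the branch around the two shifts,
while an unaligned start still masks off matches that sit before the
first requested byte.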
Thanks
-Will
>
> Ryan
>