This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Split mantissa calculation loop and add branch prediction to mp multiplication
On Thu, Jan 03, 2013 at 10:18:08AM -0600, Steven Munroe wrote:
> This is very bad for POWER. PowerPC has (multiple) independent fixed
> point and floating point pipelines. This allows super-scalar out-of-order
> execution, UNTIL you force a transfer (through memory) between the
> FPRs/GPRs. PowerPC has lots of registers (32+32+32), we expect the
> compiler to keep lots of data in the registers, and so we don't optimize
> the hardware for dependent load after store, we optimize for memory
> bandwidth.
>
> Your proposed code forces an (unnecessary) double->long conversion and
> FPR to GPR transfer into the inner loop, disabling any super-scalar
> parallel execution. It also prevents loop unrolling and does not allow
> GCC to make good use of all those registers we provide in the
> architecture.
>
> So your code is optimized for (register poor, in-order-execution) X86 at
> the expense of PowerPC.
>
I'm confused. Which patch are you talking about: the current loop
split patch, the conversion of the mantissa to int, or some other patch?
I'll summarize the patches that are currently under review:
1) Conversion of mantissa of mp_no to int. This provides scope to
   convert all mp operations to scalar. There are no conversions from
   double to long or backwards except when constructing an mp_no or
   deconstructing it to double. This patch is now stale and I need to
   work on a new revision, especially in the light of the custom
   powerpc code.

2) Fix build failure on power4 or later. This is just consolidation
   of the declaration of globals and constant values. This should
   have no impact on pipelining performance.

3) Splitting the multiplication loop (the current patch which you've
   commented on). It does not affect powerpc code at all since
   powerpc has a custom implementation of this loop.
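
To make item 3 concrete, here is a minimal sketch of what splitting a
column-wise multiplication loop can look like. It is not the glibc code:
the function names mul_unsplit and mul_split, the RADIX and MAX_P values,
and the use of integer digits (in the spirit of item 1) are all
assumptions made for illustration. Digits are stored 1-indexed, most
significant first, and the product of two p-digit inputs fills z[1..2p].

```c
#include <assert.h>
#include <stdint.h>

#define RADIX 16777216u   /* 2^24, illustrative digit base (assumption) */
#define MAX_P 32          /* illustrative precision limit (assumption) */

/* Unsplit form: one loop over all result columns k, with a branch on
   every iteration that selects the range of valid digit pairs.  */
void
mul_unsplit (const uint32_t *x, const uint32_t *y, uint32_t *z, int p)
{
  assert (p >= 1 && p <= MAX_P);
  uint64_t carry = 0;
  for (int k = 2 * p; k > 1; k--)
    {
      int lo, hi;
      if (k > p)
        { lo = k - p; hi = p; }
      else
        { lo = 1; hi = k - 1; }
      uint64_t acc = carry;
      for (int i = lo; i <= hi; i++)
        acc += (uint64_t) x[i] * y[k - i];
      z[k] = (uint32_t) (acc % RADIX);
      carry = acc / RADIX;
    }
  z[1] = (uint32_t) carry;
}

/* Split form: the same computation, but the k > p and k <= p cases
   each get their own loop, so neither loop body contains the branch.  */
void
mul_split (const uint32_t *x, const uint32_t *y, uint32_t *z, int p)
{
  assert (p >= 1 && p <= MAX_P);
  uint64_t carry = 0;
  int k = 2 * p;

  /* Low-order columns: pairs run from i = k - p up to i = p.  */
  for (; k > p; k--)
    {
      uint64_t acc = carry;
      for (int i = k - p; i <= p; i++)
        acc += (uint64_t) x[i] * y[k - i];
      z[k] = (uint32_t) (acc % RADIX);
      carry = acc / RADIX;
    }

  /* High-order columns: pairs run from i = 1 up to i = k - 1.  */
  for (; k > 1; k--)
    {
      uint64_t acc = carry;
      for (int i = 1; i < k; i++)
        acc += (uint64_t) x[i] * y[k - i];
      z[k] = (uint32_t) (acc % RADIX);
      carry = acc / RADIX;
    }
  z[1] = (uint32_t) carry;
}
```

Both functions should produce identical digits for the same inputs; the
split only removes the per-iteration range selection from the loop body.
Whether that helps on a given target is a separate question that needs
measurement.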
Siddhesh