This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Performance improvement for log()


On Thu, 2008-04-10 at 10:27 -0700, Ulrich Drepper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Steven Munroe wrote:
> > This patch redefines ABS(x) as __builtin_fabs(x).
> 
> And to do this you copy the entire file?  This is insane.  Do what is
> done elsewhere.  Write a wrapper which defines macros and then uses the
> orginal code.  I know this means changes to some headers.  But code
> duplication isn't acceptable.
> 
Would it be better to fix the ABS(x) macro in ./ieee754/dbl-64/mpa.h?
This a more general solution but not sure if all platforms handle
__builtin_fabs(x) these days.

The other problem I am trying to address is the that powerpc64 is
actually faster using 64-bit long vs 32-bit int. So also changed the
local int vars to long and added a longnum union.

Also for bit mashing like the inline nan/inf checks using a union
powerpc32 does better with a single long long then with two ints. I
don't think this is common to other platforms and there is no 32/64
split in the generic ./ieee754 source structure (unless you are
comfortable changing all the local ints to longs in ./ieee754/dbl-64?)

> 
> > Also the call out to __isnan() in the posix wrapper code (w_log()) was
> > showing up hot in the profiles. So I added --with-cpu=[power4|power6]
> > implementations that effectively in-line the isnan() test within
> > __log().
> 
> That seems to be a bad idea as well.  It's no ppc-specific problem.
> Again, use macros to abstract out the actual code use so that other
> archs can parametrize this code as well.

Yes and no. The EXTRACT_WORDS, GET_HIGH_WORD, GET_LOW_WORD, macros are
just terrible because they don't allow scheduling of the following load
away from the store into the union. The result is the load-hit-store
hazard which is primary thing I am trying to avoid.

NET transfer FPRs to/from GPRs is powerpc specific issue.

So while I could macroize the isnan() as a wrapper around EXTRACT_WORDS,
does not solve the scheduling problem. I needed to mark the union
volatile:

	double z;
	volatile union double_long tmp;
	long long x_ll;
	
	/* Copy the double value x to a long to test for NaN without the
	   overhead of calling isnan or triggering VXSNAN.
	   Mark the union volatile and store x before calling  __ieee754_log()
	   so we can access the value as a 64-bit long long after
	   __ieee754_log().  This avoids the load-hit-store hazard when the
	   store queue is deep.  */
	tmp.d = x;

	z = __ieee754_log(x);

	x_ll = tmp.ll;

This is what it takes to keep GCC from moving the load up next to the
store (which creates the load-hit-store hazard I was trying to avoid.)
Is this a problem for other platforms?

Once the the data is transferred then a isnan_bits() macro that uses the
resulting long long value would clean up the source things up. 

And w_log.c is the simple case, w_pow.c is the nasty one (with multiple
isnan(), finite(), and signbit() calls). Here transferring the double to
to int/long exactly once pays big.

So does this justify powerpc specific code?

I could merge the power6 and power4 into to a single version if that
would help. Otherwise this would require major restructuring of the
math_private.h macros and the common ./ieee754/dbl-64/ implementations.

Perhaps that should be done, but not just before a release due to the
risk. 



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]