This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Fix powerpc fmax, fmin sNaN handling (bug 20947) [committed]
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: libc-alpha at sourceware dot org, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
- Date: Fri, 16 Dec 2016 11:19:49 -0600
- Subject: Re: Fix powerpc fmax, fmin sNaN handling (bug 20947) [committed]
- Authentication-results: sourceware.org; auth=none
- References: <alpine.DEB.2.20.1612150043400.25901@digraph.polyomino.org.uk>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Thu, 2016-12-15 at 00:44 +0000, Joseph Myers wrote:
> Various fmax and fmin function implementations mishandle sNaN
> arguments:
>
> (a) When both arguments are NaNs, the return value should be a qNaN,
> but sometimes it is an sNaN if at least one argument is an sNaN.
>
> (b) Under TS 18661-1 semantics, if either argument is an sNaN then the
> result should be a qNaN (whereas if one argument is a qNaN and the
> other is not a NaN, the result should be the non-NaN argument).
> Various implementations treat sNaNs like qNaNs here.
>
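The TS 18661-1 semantics described above can be sketched in C; sketch_fmax and is_snan are hypothetical names for illustration, not the glibc implementation (glibc's <math.h> also provides an issignaling macro for this classification):

```c
/* A C sketch of the TS 18661-1 fmax semantics described above.
   sketch_fmax and is_snan are hypothetical names, not the glibc code.  */
#include <math.h>
#include <stdint.h>
#include <string.h>

/* A signaling NaN is a NaN whose quiet bit (bit 51) is clear.  */
static int is_snan (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return isnan (d) && (bits & 0x0008000000000000ULL) == 0;
}

static double sketch_fmax (double x, double y)
{
  if (is_snan (x) || is_snan (y))
    return x + y;                      /* raise "invalid", yield a qNaN */
  if (isnan (x))
    return isnan (y) ? x + y : y;      /* a lone qNaN loses to a number */
  if (isnan (y))
    return x;
  return x >= y ? x : y;               /* ordinary ordered case */
}
```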
> This patch fixes the powerpc versions of these functions (shared by
> float and double, 32-bit and 64-bit). The structure of those versions
> is that all ordered cases are already handled before anything dealing
> with the case where the arguments are unordered; thus, this patch
> causes no change to the code executed in the common case (neither
> argument a NaN).
>
This patch, as is, will cause a severe performance regression for
POWER6/7/8 and should be fixed or reverted before 2.25 freezes.
> Tested for powerpc (32-bit and 64-bit), together with tests to be
> added along with the x86_64 / x86 fixes. Committed.
>
> 2016-12-15 Joseph Myers <joseph@codesourcery.com>
>
> [BZ #20947]
> * sysdeps/powerpc/fpu/s_fmax.S (__fmax): Add the arguments when
> either is a signaling NaN.
> * sysdeps/powerpc/fpu/s_fmin.S (__fmin): Likewise.
>
> diff --git a/sysdeps/powerpc/fpu/s_fmax.S b/sysdeps/powerpc/fpu/s_fmax.S
> index 75ee74c..e6405c0 100644
> --- a/sysdeps/powerpc/fpu/s_fmax.S
> +++ b/sysdeps/powerpc/fpu/s_fmax.S
> @@ -25,7 +25,42 @@ ENTRY(__fmax)
> bnulr+ cr0
> /* x and y are unordered, so one of x or y must be a NaN... */
> fcmpu cr1,fp2,fp2
> - bunlr cr1
> + bun cr1,1f
> +/* x is a NaN; y is not. Test if x is signaling. */
> +#ifdef __powerpc64__
> + stfd fp1,-8(r1)
> + lwz r3,-8+HIWORD(r1)
Back-to-back stfd/lwz will cause a load-hit-store flush hazard on any
POWER processor since POWER4.
The access-size mismatch (doubleword store, word load) also defeats
store forwarding on POWER8 and forces an instruction flush to enforce
store-to-load ordering.
For POWER8 the correct coding would replace the store/load pair with a
direct move instruction (mfvsrd, or mfvsrwz for 32-bit) as appropriate.
For POWER6 / POWER7 the code should be scheduled so that the load is
in a different dispatch group from the store; this allows the hardware
to use a less expensive instruction reject to ensure that the store
completes before the dependent load.
At a minimum, a group-ending nop should be inserted between the store
and the load, and the access sizes should match (word or doubleword on
both sides).
The group-ending nop is "ori 1,1,0" for POWER6 and "ori 2,2,0" for
POWER7 and later.
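For reference, what the stfd/lwz + andis. sequence computes can be
sketched in C: round-trip the double through memory into an integer
register and test the quiet bit. Bit 51 of the double is the
0x00080000 bit of the high word, i.e. the "8" in "andis. r3,r3,8"
(quiet_bit_set is a hypothetical helper name):

```c
/* C sketch of the sNaN test the patch performs in assembly.
   quiet_bit_set is a hypothetical name for illustration.  */
#include <stdint.h>
#include <string.h>

static int quiet_bit_set (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);        /* the store/load round trip */
  uint32_t hi = (uint32_t) (bits >> 32);  /* HIWORD of the double */
  return (hi & 0x00080000u) != 0;         /* andis. r3,r3,8 */
}
```

A NaN with this bit clear is signaling; the patch branches on that to
decide whether the qNaN-vs-number rule or the sNaN rule applies.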
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp1,8(r1)
> + lwz r3,8+HIWORD(r1)
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bne cr0,0f
> + b 2f
> +1: /* y is a NaN; x may or may not be. */
> + fcmpu cr1,fp1,fp1
> + bun cr1,2f
> +/* y is a NaN; x is not. Test if y is signaling. */
> +#ifdef __powerpc64__
> + stfd fp2,-8(r1)
> + lwz r3,-8+HIWORD(r1)
>
ditto
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp2,8(r1)
> + lwz r3,8+HIWORD(r1)
>
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bnelr cr0
> +2: /* x and y are NaNs, or one is a signaling NaN. */
> + fadd fp1,fp1,fp2
> + blr
> 0: fmr fp1,fp2
> blr
> END(__fmax)
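For reference on why the unordered paths above fall through to
"fadd fp1,fp1,fp2": an arithmetic operation on an sNaN raises the
invalid-operation exception and delivers a qNaN, which is exactly the
result the patch needs. A host-side check of the quieting
(snan_add_quiets is a hypothetical name; __builtin_nans is a GCC
builtin):

```c
/* Check that adding a signaling NaN yields a quiet NaN, as the
   fadd in the patch relies on.  snan_add_quiets is a hypothetical
   helper name.  */
#include <math.h>
#include <stdint.h>
#include <string.h>

static int snan_add_quiets (void)
{
  volatile double s = __builtin_nans ("");  /* sNaN; volatile blocks folding */
  double r = s + 1.0;                       /* the fadd in the patch */
  uint64_t bits;
  memcpy (&bits, &r, sizeof bits);
  /* Result must be a NaN with the quiet bit (bit 51) now set.  */
  return isnan (r) && (bits & 0x0008000000000000ULL) != 0;
}
```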
> diff --git a/sysdeps/powerpc/fpu/s_fmin.S b/sysdeps/powerpc/fpu/s_fmin.S
> index 4d7c3b4..9ae77fe 100644
> --- a/sysdeps/powerpc/fpu/s_fmin.S
> +++ b/sysdeps/powerpc/fpu/s_fmin.S
> @@ -25,7 +25,42 @@ ENTRY(__fmin)
> bnulr+ cr0
> /* x and y are unordered, so one of x or y must be a NaN... */
> fcmpu cr1,fp2,fp2
> - bunlr cr1
> + bun cr1,1f
> +/* x is a NaN; y is not. Test if x is signaling. */
> +#ifdef __powerpc64__
> + stfd fp1,-8(r1)
> + lwz r3,-8+HIWORD(r1)
>
ditto
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp1,8(r1)
> + lwz r3,8+HIWORD(r1)
>
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bne cr0,0f
> + b 2f
> +1: /* y is a NaN; x may or may not be. */
> + fcmpu cr1,fp1,fp1
> + bun cr1,2f
> +/* y is a NaN; x is not. Test if y is signaling. */
> +#ifdef __powerpc64__
> + stfd fp2,-8(r1)
> + lwz r3,-8+HIWORD(r1)
>
ditto
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp2,8(r1)
> + lwz r3,8+HIWORD(r1)
>
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bnelr cr0
> +2: /* x and y are NaNs, or one is a signaling NaN. */
> + fadd fp1,fp1,fp2
> + blr
> 0: fmr fp1,fp2
> blr
> END(__fmin)
>