This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Fix powerpc fmax, fmin sNaN handling (bug 20947) [committed]
- From: Steven Munroe <munroesj at linux dot vnet dot ibm dot com>
- To: Joseph Myers <joseph at codesourcery dot com>
- Cc: libc-alpha at sourceware dot org, Tulio Magno Quites Machado Filho <tuliom at linux dot vnet dot ibm dot com>
- Date: Fri, 16 Dec 2016 11:19:49 -0600
- Subject: Re: Fix powerpc fmax, fmin sNaN handling (bug 20947) [committed]
- Authentication-results: sourceware.org; auth=none
- References: <alpine.DEB.2.20.1612150043400.25901@digraph.polyomino.org.uk>
- Reply-to: munroesj at linux dot vnet dot ibm dot com
On Thu, 2016-12-15 at 00:44 +0000, Joseph Myers wrote:
> Various fmax and fmin function implementations mishandle sNaN
> arguments:
>
> (a) When both arguments are NaNs, the return value should be a qNaN,
> but sometimes it is an sNaN if at least one argument is an sNaN.
>
> (b) Under TS 18661-1 semantics, if either argument is an sNaN then the
> result should be a qNaN (whereas if one argument is a qNaN and the
> other is not a NaN, the result should be the non-NaN argument).
> Various implementations treat sNaNs like qNaNs here.
>
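The TS 18661-1 semantics described above can be sketched in C; sketch_fmax and is_snan are hypothetical names for illustration, not the glibc implementation (glibc's <math.h> also provides an issignaling macro for this classification):

```c
/* A C sketch of the TS 18661-1 fmax semantics described above.
   sketch_fmax and is_snan are hypothetical names, not the glibc code.  */
#include <math.h>
#include <stdint.h>
#include <string.h>

/* A signaling NaN is a NaN whose quiet bit (bit 51) is clear.  */
static int is_snan (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return isnan (d) && (bits & 0x0008000000000000ULL) == 0;
}

static double sketch_fmax (double x, double y)
{
  if (is_snan (x) || is_snan (y))
    return x + y;                      /* raise "invalid", yield a qNaN */
  if (isnan (x))
    return isnan (y) ? x + y : y;      /* a lone qNaN loses to a number */
  if (isnan (y))
    return x;
  return x >= y ? x : y;               /* ordinary ordered case */
}
```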
> This patch fixes the powerpc versions of these functions (shared by
> float and double, 32-bit and 64-bit). The structure of those versions
> is that all ordered cases are already handled before anything dealing
> with the case where the arguments are unordered; thus, this patch
> causes no change to the code executed in the common case (neither
> argument a NaN).
>
This patch, as is, will cause a severe performance regression for
POWER6/7/8 and should be fixed or reverted before 2.25 freezes.
> Tested for powerpc (32-bit and 64-bit), together with tests to be
> added along with the x86_64 / x86 fixes. Committed.
>
> 2016-12-15 Joseph Myers <joseph@codesourcery.com>
>
> [BZ #20947]
> * sysdeps/powerpc/fpu/s_fmax.S (__fmax): Add the arguments when
> either is a signaling NaN.
> * sysdeps/powerpc/fpu/s_fmin.S (__fmin): Likewise.
>
> diff --git a/sysdeps/powerpc/fpu/s_fmax.S b/sysdeps/powerpc/fpu/s_fmax.S
> index 75ee74c..e6405c0 100644
> --- a/sysdeps/powerpc/fpu/s_fmax.S
> +++ b/sysdeps/powerpc/fpu/s_fmax.S
> @@ -25,7 +25,42 @@ ENTRY(__fmax)
> bnulr+ cr0
> /* x and y are unordered, so one of x or y must be a NaN... */
> fcmpu cr1,fp2,fp2
> - bunlr cr1
> + bun cr1,1f
> +/* x is a NaN; y is not. Test if x is signaling. */
> +#ifdef __powerpc64__
> + stfd fp1,-8(r1)
> + lwz r3,-8+HIWORD(r1)
Back-to-back stfd/lwz will cause a load-hit-store flush hazard on any
POWER processor since POWER4.
The access-size mismatch (doubleword store, word load) also defeats
store forwarding on POWER8 and forces an instruction flush to enforce
store-to-load ordering.
For POWER8 the correct coding would replace the store/load pair with a
direct move instruction (mfvsrd, or mfvsrwz for 32-bit) as appropriate.
For POWER6 / POWER7 the code should be scheduled so that the load is
in a different dispatch group from the store; this allows the hardware
to use a less expensive instruction reject to ensure that the store
completes before the dependent load.
At a minimum, a group-ending nop should be inserted between the store
and the load, and the access sizes should match (word or doubleword on
both sides).
The group-ending nop is "ori 1,1,0" for POWER6 and "ori 2,2,0" for
POWER7 and later.
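For reference, what the stfd/lwz + andis. sequence computes can be
sketched in C: round-trip the double through memory into an integer
register and test the quiet bit. Bit 51 of the double is the
0x00080000 bit of the high word, i.e. the "8" in "andis. r3,r3,8"
(quiet_bit_set is a hypothetical helper name):

```c
/* C sketch of the sNaN test the patch performs in assembly.
   quiet_bit_set is a hypothetical name for illustration.  */
#include <stdint.h>
#include <string.h>

static int quiet_bit_set (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);        /* the store/load round trip */
  uint32_t hi = (uint32_t) (bits >> 32);  /* HIWORD of the double */
  return (hi & 0x00080000u) != 0;         /* andis. r3,r3,8 */
}
```

A NaN with this bit clear is signaling; the patch branches on that to
decide whether the qNaN-vs-number rule or the sNaN rule applies.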
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp1,8(r1)
> + lwz r3,8+HIWORD(r1)
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bne cr0,0f
> + b 2f
> +1: /* y is a NaN; x may or may not be. */
> + fcmpu cr1,fp1,fp1
> + bun cr1,2f
> +/* y is a NaN; x is not. Test if y is signaling. */
> +#ifdef __powerpc64__
> + stfd fp2,-8(r1)
> + lwz r3,-8+HIWORD(r1)
>
ditto
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp2,8(r1)
> + lwz r3,8+HIWORD(r1)
>
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bnelr cr0
> +2: /* x and y are NaNs, or one is a signaling NaN. */
> + fadd fp1,fp1,fp2
> + blr
> 0: fmr fp1,fp2
> blr
> END(__fmax)
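For reference on why the unordered paths above fall through to
"fadd fp1,fp1,fp2": an arithmetic operation on an sNaN raises the
invalid-operation exception and delivers a qNaN, which is exactly the
result the patch needs. A host-side check of the quieting
(snan_add_quiets is a hypothetical name; __builtin_nans is a GCC
builtin):

```c
/* Check that adding a signaling NaN yields a quiet NaN, as the
   fadd in the patch relies on.  snan_add_quiets is a hypothetical
   helper name.  */
#include <math.h>
#include <stdint.h>
#include <string.h>

static int snan_add_quiets (void)
{
  volatile double s = __builtin_nans ("");  /* sNaN; volatile blocks folding */
  double r = s + 1.0;                       /* the fadd in the patch */
  uint64_t bits;
  memcpy (&bits, &r, sizeof bits);
  /* Result must be a NaN with the quiet bit (bit 51) now set.  */
  return isnan (r) && (bits & 0x0008000000000000ULL) != 0;
}
```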
> diff --git a/sysdeps/powerpc/fpu/s_fmin.S b/sysdeps/powerpc/fpu/s_fmin.S
> index 4d7c3b4..9ae77fe 100644
> --- a/sysdeps/powerpc/fpu/s_fmin.S
> +++ b/sysdeps/powerpc/fpu/s_fmin.S
> @@ -25,7 +25,42 @@ ENTRY(__fmin)
> bnulr+ cr0
> /* x and y are unordered, so one of x or y must be a NaN... */
> fcmpu cr1,fp2,fp2
> - bunlr cr1
> + bun cr1,1f
> +/* x is a NaN; y is not. Test if x is signaling. */
> +#ifdef __powerpc64__
> + stfd fp1,-8(r1)
> + lwz r3,-8+HIWORD(r1)
>
ditto
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp1,8(r1)
> + lwz r3,8+HIWORD(r1)
>
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bne cr0,0f
> + b 2f
> +1: /* y is a NaN; x may or may not be. */
> + fcmpu cr1,fp1,fp1
> + bun cr1,2f
> +/* y is a NaN; x is not. Test if y is signaling. */
> +#ifdef __powerpc64__
> + stfd fp2,-8(r1)
> + lwz r3,-8+HIWORD(r1)
>
ditto
> +#else
> + stwu r1,-16(r1)
> + cfi_adjust_cfa_offset (16)
> + stfd fp2,8(r1)
> + lwz r3,8+HIWORD(r1)
>
ditto
> + addi r1,r1,16
> + cfi_adjust_cfa_offset (-16)
> +#endif
> + andis. r3,r3,8
> + bnelr cr0
> +2: /* x and y are NaNs, or one is a signaling NaN. */
> + fadd fp1,fp1,fp2
> + blr
> 0: fmr fp1,fp2
> blr
> END(__fmin)
>