This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
[PATCH] Fma fixes

From: Jakub Jelinek <jakub at redhat dot com>
To: libc-alpha at sources dot redhat dot com
Cc: Richard Henderson <rth at twiddle dot net>, Martin Schwidefsky <schwidefsky at de dot ibm dot com>, Luis Machado <luisgpm at br dot ibm dot com>
Date: Wed, 13 Oct 2010 14:23:12 +0200
Subject: [PATCH] Fma fixes
References: <20101012191605.GQ2652@sunsite.ms.mff.cuni.cz> <4CB4DAF7.6080301@twiddle.net>
Reply-to: Jakub Jelinek <jakub at redhat dot com>
On Tue, Oct 12, 2010 at 03:02:31PM -0700, Richard Henderson wrote:
> On 10/12/2010 12:16 PM, Jakub Jelinek wrote:
> > Tested on x86_64-linux, it is very well possible it won't work correctly
> > on i686 due to excess precision.
> 
> Given that you're already manipulating the status word, you
> could go ahead and force 53-bit rounding at the same time
> with minimal extra overhead.  Of course, feupdateenv does
> not reset the rounding size to match.  You'd have to either
> open-code that function in order avoid doing extra work,
> or add new helper functions.

I believe as implemented here (incremental patch on top of the one from
yesterday) it should be fine for i?86 and actually avoids the need for
overflow checks.  Given that a2 + m2 is the error term and the result should
be already roughly computed in a1 (which is x * y + z) and thus it should be
much smaller than a1, I believe it is ok to add the 1 to mantissa if inexact
(which could be set either on the first or second addition) just once and just
do the final rounding during cast from long double to double.
I've been looking at http://www.lri.fr/~melquion/doc/05-imacs17_1-expose.pdf
summation here.

The patch also fixes a bug I found in the dbl-64/s_fma.c version using
the added testcases, and (untested) s390{,x} and ppc{,64} assembly
implementation.

As for the helpers, for better performance I believe it would be nice
to have some glibc internal inlines, because the
  feholdexcept (&env);
  fesetround (FE_TOWARDZERO);
and
  x = fetestexcept (FE_INEXACT);
  feupdateenv (&env);
operations are unnecessarily costly, at least on x86_64 and i?86 they
keep loading and storing fp control/status words too many times.
If we had internal APIs which would do something similar to
fegetenv and fesetenv in inlines in <fenv_libc.h>, perhaps
using the fegetenv_register and fesetenv_register API PowerPC uses
so that it doesn't force it to memory unnecessarily, and a bunch
of inlines/macros that would just operate on the saved fenv_t instead of
actually modifying the hw control/status registers, then we could avoid half
of the loads/stores.  The above would be:
  fenv_t env = fegetenv_register ();
  fenv_t newenv = feholdexcept_register (env); /* This would just clear all exceptions and make exceptions non-signalling.  */
  newenv = fesetround_register (newenv, FE_TOWARDZERO); /* And this would set rounding mode in it.  */
  fesetenv_register (newenv);
and
  fenv_t curenv = fegetenv_register ();
  x = fetestexcept_register (curenv, FE_INEXACT);
  feupdateenv_register (env, curenv);
feupdateenv_register could do roughly what feupdateenv does now, but
wouldn't need to save the current state (which is already passed into it).

For both feupdateenv and feupdateenv_register I wonder if the we couldn't
just or together the currently pending exceptions instead of calling
feraiseexcept if none of the exceptions in question raises a signal
in the new mode.

2010-10-13  Jakub Jelinek  <jakub@redhat.com>

	[BZ #3268]
	* math/libm-test.inc (fma_test): Some more fmaf and fma tests.
	* sysdeps/i386/i686/multiarch/s_fma.c: Include ldbl-96 version
	instead of dbl-64.
	* sysdeps/i386/fpu/bits/mathinline.h (fma, fmaf, fmal): Remove
	inlines.
	* sysdeps/ieee754/ldbl-96/s_fma.c: New file.
	* sysdeps/ieee754/dbl-64/s_fma.c (__fma): Fix exponent adjustment
	if one of x and y is very large and the other is subnormal.
	* sysdeps/s390/fpu/s_fmaf.c: New file.
	* sysdeps/s390/fpu/s_fma.c: New file.
	* sysdeps/powerpc/fpu/s_fmaf.S: New file.
	* sysdeps/powerpc/fpu/s_fma.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_fma.S: New file.
	* sysdeps/powerpc/powerpc64/fpu/s_fma.S: New file.
	* sysdeps/unix/sysv/linux/s390/fpu/s_fma.c: New file.

--- libc/math/libm-test.inc.jj	2010-10-12 20:36:13.000000000 +0200
+++ libc/math/libm-test.inc	2010-10-13 12:32:11.000000000 +0200
@@ -2792,6 +2792,8 @@ fma_test (void)
 #if defined (TEST_FLOAT) && FLT_MANT_DIG == 24
   TEST_fff_f (fma, 0x1.7ff8p+13, 0x1.000002p+0, 0x1.ffffp-24, 0x1.7ff802p+13);
   TEST_fff_f (fma, 0x1.fffp+0, 0x1.00001p+0, -0x1.fffp+0, 0x1.fffp-20);
+  TEST_fff_f (fma, 0x1.9abcdep+127, 0x0.9abcdep-126, -0x1.f08948p+0, 0x1.bb421p-25);
+  TEST_fff_f (fma, 0x1.9abcdep+100, 0x0.9abcdep-126, -0x1.f08948p-27, 0x1.bb421p-52);
   TEST_fff_f (fma, 0x1.fffffep+127, 0x1.001p+0, -0x1.fffffep+127, 0x1.fffffep+115);
   TEST_fff_f (fma, -0x1.fffffep+127, 0x1.fffffep+0, 0x1.fffffep+127, -0x1.fffffap+127);
   TEST_fff_f (fma, 0x1.fffffep+127, 2.0, -0x1.fffffep+127, 0x1.fffffep+127);
@@ -2799,6 +2801,10 @@ fma_test (void)
 #if defined (TEST_DOUBLE) && DBL_MANT_DIG == 53
   TEST_fff_f (fma, 0x1.7fp+13, 0x1.0000000000001p+0, 0x1.ffep-48, 0x1.7f00000000001p+13);
   TEST_fff_f (fma, 0x1.fffp+0, 0x1.0000000000001p+0, -0x1.fffp+0, 0x1.fffp-52);
+  TEST_fff_f (fma, 0x1.0000002p+0, 0x1.ffffffcp-1, 0x1p-300, 1.0);
+  TEST_fff_f (fma, 0x1.0000002p+0, 0x1.ffffffcp-1, -0x1p-300, 0x1.fffffffffffffp-1);
+  TEST_fff_f (fma, 0x1.deadbeef2feedp+1023, 0x0.deadbeef2feedp-1022, -0x1.a05f8c01a4bfbp+1, 0x1.0989687bc9da4p-53);
+  TEST_fff_f (fma, 0x1.deadbeef2feedp+900, 0x0.deadbeef2feedp-1022, -0x1.a05f8c01a4bfbp-122, 0x1.0989687bc9da4p-176);
   TEST_fff_f (fma, 0x1.fffffffffffffp+1023, 0x1.001p+0, -0x1.fffffffffffffp+1023, 0x1.fffffffffffffp+1011);
   TEST_fff_f (fma, -0x1.fffffffffffffp+1023, 0x1.fffffffffffffp+0, 0x1.fffffffffffffp+1023, -0x1.ffffffffffffdp+1023);
   TEST_fff_f (fma, 0x1.fffffffffffffp+1023, 2.0, -0x1.fffffffffffffp+1023, 0x1.fffffffffffffp+1023);
--- libc/sysdeps/i386/i686/multiarch/s_fma.c.jj	2010-10-12 20:38:24.000000000 +0200
+++ libc/sysdeps/i386/i686/multiarch/s_fma.c	2010-10-13 11:04:44.000000000 +0200
@@ -33,4 +33,4 @@ weak_alias (__fma, fma)
 # define __fma __fma_ia32
 #endif
 
-#include <sysdeps/ieee754/dbl-64/s_fma.c>
+#include <sysdeps/ieee754/ldbl-96/s_fma.c>
--- libc/sysdeps/i386/fpu/bits/mathinline.h.jj	2009-08-29 03:34:51.000000000 +0200
+++ libc/sysdeps/i386/fpu/bits/mathinline.h	2010-10-13 11:07:45.000000000 +0200
@@ -1,6 +1,6 @@
 /* Inline math functions for i387.
-   Copyright (C) 1995,1996,1997,1998,1999,2000,2001,2003,2004,2006,2007,2009
-   Free Software Foundation, Inc.
+   Copyright (C) 1995,1996,1997,1998,1999,2000,2001,2003,2004,2006,2007,2009,
+   2010 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by John C. Bowman <bowman@math.ualberta.ca>, 1995.
 
@@ -657,8 +657,6 @@ __NTH (ldexpl (long double __x, int __y)
   __ldexp_code;
 }
 
-__inline_mathcodeNP3 (fma, __x, __y, __z, return (__x * __y) + __z)
-
 __inline_mathopNP (rint, "frndint")
 # endif /* __FAST_MATH__ */
 
--- libc/sysdeps/ieee754/ldbl-96/s_fma.c.jj	2010-10-12 15:44:44.000000000 +0200
+++ libc/sysdeps/ieee754/ldbl-96/s_fma.c	2010-10-13 11:05:40.000000000 +0200
@@ -0,0 +1,70 @@
+/* Compute x * y + z as ternary operation.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Jakub Jelinek <jakub@redhat.com>, 2010.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <float.h>
+#include <math.h>
+#include <fenv.h>
+#include <ieee754.h>
+
+/* This implementation uses rounding to odd to avoid problems with
+   double rounding.  See a paper by Boldo and Melquiond:
+   http://www.lri.fr/~melquion/doc/08-tc.pdf  */
+
+double
+__fma (double x, double y, double z)
+{
+  /* Multiplication m1 + m2 = x * y using Dekker's algorithm.  */
+#define C ((1ULL << (LDBL_MANT_DIG + 1) / 2) + 1)
+  long double x1 = x * C;
+  long double y1 = y * C;
+  long double m1 = x * y;
+  x1 = (x - x1) + x1;
+  y1 = (y - y1) + y1;
+  long double x2 = x - x1;
+  long double y2 = y - y1;
+  long double m2 = (((x1 * y1 - m1) + x1 * y2) + x2 * y1) + x2 * y2;
+
+  /* Addition a1 + a2 = z + m1 using Knuth's algorithm.  */
+  long double a1 = z + m1;
+  long double t1 = a1 - z;
+  long double t2 = a1 - t1;
+  t1 = m1 - t1;
+  t2 = z - t2;
+  long double a2 = t1 + t2;
+
+  fenv_t env;
+  feholdexcept (&env);
+  fesetround (FE_TOWARDZERO);
+  /* Perform m2 + a2 addition with round to odd.  */
+  a2 = a2 + m2;
+
+  /* Add that to a1 again using rounding to odd.  */
+  union ieee854_long_double u;
+  u.d = a1 + a2;
+  if ((u.ieee.mantissa1 & 1) == 0 && u.ieee.exponent != 0x7fff)
+    u.ieee.mantissa1 |= fetestexcept (FE_INEXACT) != 0;
+  feupdateenv (&env);
+
+  /* Add finally round to double precision.  */
+  return u.d;
+}
+#ifndef __fma
+weak_alias (__fma, fma)
+#endif
--- libc/sysdeps/ieee754/dbl-64/s_fma.c.jj	2010-10-12 20:52:40.000000000 +0200
+++ libc/sysdeps/ieee754/dbl-64/s_fma.c	2010-10-13 11:58:24.000000000 +0200
@@ -82,12 +82,18 @@ __fma (double x, double y, double z)
       else if (u.ieee.exponent >= 0x7ff - DBL_MANT_DIG)
 	{
 	  u.ieee.exponent -= DBL_MANT_DIG;
-	  v.ieee.exponent += DBL_MANT_DIG;
+	  if (v.ieee.exponent)
+	    v.ieee.exponent += DBL_MANT_DIG;
+	  else
+	    v.d *= 0x1p53;
 	}
       else
 	{
 	  v.ieee.exponent -= DBL_MANT_DIG;
-	  u.ieee.exponent += DBL_MANT_DIG;
+	  if (u.ieee.exponent)
+	    u.ieee.exponent += DBL_MANT_DIG;
+	  else
+	    u.d *= 0x1p53;
 	}
       x = u.d;
       y = v.d;
--- libc/sysdeps/s390/fpu/s_fmaf.c.jj	2010-10-13 13:28:08.000000000 +0200
+++ libc/sysdeps/s390/fpu/s_fmaf.c	2010-10-13 13:29:28.000000000 +0200
@@ -0,0 +1,32 @@
+/* Compute x * y + z as ternary operation.  S/390 version.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Jakub Jelinek <jakub@redhat.com>, 2010.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <math.h>
+
+float
+__fmaf (float x, float y, float z)
+{
+  float r;
+  asm ("maebr %0,%1,%2" : "=f" (r) : "%f" (x), "fR" (y), "0" (z));
+  return r;
+}
+#ifndef __fmaf
+weak_alias (__fmaf, fmaf)
+#endif
--- libc/sysdeps/s390/fpu/s_fma.c.jj	2010-10-13 13:28:05.000000000 +0200
+++ libc/sysdeps/s390/fpu/s_fma.c	2010-10-13 13:30:37.000000000 +0200
@@ -0,0 +1,37 @@
+/* Compute x * y + z as ternary operation.  S/390 version.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+   Contributed by Jakub Jelinek <jakub@redhat.com>, 2010.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <math.h>
+
+double
+__fma (double x, double y, double z)
+{
+  double r;
+  asm ("madbr %0,%1,%2" : "=f" (r) : "%f" (x), "fR" (y), "0" (z));
+  return r;
+}
+#ifndef __fma
+weak_alias (__fma, fma)
+#endif
+
+#ifdef NO_LONG_DOUBLE
+strong_alias (__fma, __fmal)
+weak_alias (__fmal, fmal)
+#endif
--- libc/sysdeps/powerpc/fpu/s_fmaf.S.jj	2010-10-13 12:55:34.000000000 +0200
+++ libc/sysdeps/powerpc/fpu/s_fmaf.S	2010-10-13 13:00:02.000000000 +0200
@@ -0,0 +1,28 @@
+/* Compute x * y + z as ternary operation.  PowerPC version.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <sysdep.h>
+
+ENTRY(__fmaf)
+/* float [f1] fmaf (float [f1] x, float [f2] y, float [f3] z); */
+	fmadds	fp1,fp1,fp2,fp3
+	blr
+END(__fmaf)
+
+weak_alias (__fmaf,fmaf)
--- libc/sysdeps/powerpc/fpu/s_fma.S.jj	2010-10-13 12:55:34.000000000 +0200
+++ libc/sysdeps/powerpc/fpu/s_fma.S	2010-10-13 12:58:19.000000000 +0200
@@ -0,0 +1,33 @@
+/* Compute x * y + z as ternary operation.  PowerPC version.
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, write to the Free
+   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
+   02111-1307 USA.  */
+
+#include <sysdep.h>
+
+ENTRY(__fma)
+/* double [f1] fma (double [f1] x, double [f2] y, double [f3] z); */
+	fmadd	fp1,fp1,fp2,fp3
+	blr
+END(__fma)
+
+weak_alias (__fma,fma)
+
+#ifdef NO_LONG_DOUBLE
+weak_alias (__fma,__fmal)
+weak_alias (__fma,fmal)
+#endif
--- libc/sysdeps/powerpc/powerpc32/fpu/s_fma.S.jj	2010-10-13 13:02:32.000000000 +0200
+++ libc/sysdeps/powerpc/powerpc32/fpu/s_fma.S	2010-10-13 13:02:48.000000000 +0200
@@ -0,0 +1,5 @@
+#include <math_ldbl_opt.h>
+#include <sysdeps/powerpc/fpu/s_fma.S>
+#if LONG_DOUBLE_COMPAT(libm, GLIBC_2_1)
+compat_symbol (libm, __fma, fmal, GLIBC_2_1)
+#endif
--- libc/sysdeps/powerpc/powerpc64/fpu/s_fma.S.jj	2010-10-13 13:02:32.000000000 +0200
+++ libc/sysdeps/powerpc/powerpc64/fpu/s_fma.S	2010-10-13 13:02:48.000000000 +0200
@@ -0,0 +1,5 @@
+#include <math_ldbl_opt.h>
+#include <sysdeps/powerpc/fpu/s_fma.S>
+#if LONG_DOUBLE_COMPAT(libm, GLIBC_2_1)
+compat_symbol (libm, __fma, fmal, GLIBC_2_1)
+#endif
--- libc/sysdeps/unix/sysv/linux/s390/fpu/s_fma.c.jj	2010-10-13 13:42:46.000000000 +0200
+++ libc/sysdeps/unix/sysv/linux/s390/fpu/s_fma.c	2010-10-13 13:35:55.000000000 +0200
@@ -0,0 +1,5 @@
+#include <math_ldbl_opt.h>
+#include <sysdeps/s390/fpu/s_fma.c>
+#if LONG_DOUBLE_COMPAT(libm, GLIBC_2_1)
+compat_symbol (libm, __fma, fmal, GLIBC_2_1);
+#endif


	Jakub
References:
- [PATCH] Implement fma correctly
  - From: Jakub Jelinek
- Re: [PATCH] Implement fma correctly
  - From: Richard Henderson
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]