This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]
[PATCH] v12 Improves __ieee754_exp() performance by 6-11% on aarch64/sparc/x86.

From: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
To: libc-alpha at sourceware dot org
Date: Thu, 15 Mar 2018 00:22:00 -0400
Subject: [PATCH] v12 Improves __ieee754_exp() performance by 6-11% on aarch64/sparc/x86.
New with this version:
Only updates to e_exp.c and eexp.tbl plus revised
libm-test-ulps for aarch64/sparc/x86_64 as removal of slowexp()
was accomplished by prior patch.

Summary of patch rationale

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Typical performance gains are 6% on aarch64, 28% on Sparc s7 and 11%
on x86_64 based on the glibc_perf tests.

Glibc correctness tests for exp() and expf() were run. Within the test
suite 1 input value was found to cause a 1 ulp difference when
"FE_TONEAREST" rounding mode is set. No differences in exp()
were seen for the tested values for the other rounding modes.

When tested over a range of 10 million input values, the new code
gets a 1 ulp error approximately 1.6 times per 1000 values.
That rate was similar for all four rounding modes.
The patch uses a 64 entry scaling table. The existing
code uses a 512 entry table.

Further optimization is possible in the handling of rounding
modes. Using get_rounding_mode and libc_fesetround() instead of
SET_RESTORE_ROUND provides a measurable gain for Sparc.
Unfortunately, on x86, one works with sse fp unit rounding mode while
the other works on x87 fp unit rounding mode.  Adding libc_fegetround,
libc_fegetroundf and libc_fegetroundl to to match libc_fesetround()
should not be too large a task but outside the scope of this patch.
---
 sysdeps/aarch64/libm-test-ulps    |    2 +
 sysdeps/ieee754/dbl-64/e_exp.c    |  307 ++++++++++++++++++++-----------------
 sysdeps/ieee754/dbl-64/eexp.tbl   |  255 ++++++++++++++++++++++++++++++
 sysdeps/sparc/fpu/libm-test-ulps  |    2 +
 sysdeps/x86_64/fpu/libm-test-ulps |    2 +
 5 files changed, 424 insertions(+), 144 deletions(-)
 create mode 100644 sysdeps/ieee754/dbl-64/eexp.tbl

diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 1f46980..58508b7 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1514,7 +1514,9 @@ ildouble: 5
 ldouble: 5
 
 Function: "exp":
+double: 1
 float: 1
+idouble: 1
 ifloat: 1
 ildouble: 1
 ldouble: 1
diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
index 62035a8..d6342e1 100644
--- a/sysdeps/ieee754/dbl-64/e_exp.c
+++ b/sysdeps/ieee754/dbl-64/e_exp.c
@@ -1,3 +1,4 @@
+/* EXP function - Compute double precision exponential */
 /*
  * IBM Accurate Mathematical Library
  * written by International Business Machines Corp.
@@ -31,174 +32,192 @@
 /*                                                                         */
 /***************************************************************************/
 
+/*  IBM exp(x) replaced by following exp(x) in 2018. IBM exp1(x,xx) remains.  */
+/* exp(x)
+   Hybrid algorithm of Peter Tang's Table driven method (for large
+   arguments) and an accurate table (for small arguments).
+   Written by K.C. Ng, November 1988.
+   Revised by Patrick McGehearty, Jan 2018 to use j/64 instead of j/32
+   and to round with FE_TONEAREST if another rounding mode is set on entry.
+   Method (large arguments):
+	1. Argument Reduction: given the input x, find r and integer k
+	   and j such that
+	             x = (k+j/64)*(ln2) + r,  |r| <= (1/128)*ln2
+
+	2. exp(x) = 2^k * (2^(j/64) + 2^(j/64)*expm1(r))
+	   a. expm1(r) is approximated by a polynomial:
+	      expm1(r) ~ r + t1*r^2 + t2*r^3 + ... + t5*r^6
+	      Here t1 = 1/2 exactly.
+	   b. 2^(j/64) is represented to twice double precision
+	      as TBL[2j]+TBL[2j+1].
+
+   Note: If divide were fast enough, we could use another approximation
+	 in 2.a:
+	      expm1(r) ~ (2r)/(2-R), R = r - r^2*(t1 + t2*r^2)
+	      (for the same t1 and t2 as above)
+
+   Special cases:
+	exp(INF) is INF, exp(NaN) is NaN;
+	exp(-INF)=  0;
+	for finite argument, only exp(0)=1 is exact.
+
+   Accuracy:
+	According to an error analysis, the error is always less than
+	an ulp (unit in the last place).  The largest errors observed
+	are less than 0.55 ulp for normal results and less than 0.75 ulp
+	for subnormal results.
+
+   Misc. info.
+	For IEEE double
+		if x >  7.09782712893383973096e+02 then exp(x) overflow
+		if x < -7.45133219101941108420e+02 then exp(x) underflow.  */
+
 #include <math.h>
+#include <math-svid-compat.h>
+#include <math_private.h>
+#include <errno.h>
 #include "endian.h"
 #include "uexp.h"
+#include "uexp.tbl"
 #include "mydefs.h"
 #include "MathLib.h"
-#include "uexp.tbl"
-#include <math_private.h>
 #include <fenv.h>
 #include <float.h>
 
+extern double __ieee754_exp (double);
+
+#include "eexp.tbl"
+
+static const double
+  half = 0.5,
+  one = 1.0;
+
 #ifndef SECTION
 # define SECTION
 #endif
 
 double
 SECTION
-__ieee754_exp (double x)
+__ieee754_exp (double x_arg)
 {
-  double bexp, t, eps, del, base, y, al, bet, res, rem, cor;
-  mynumber junk1, junk2, binexp = {{0, 0}};
-  int4 i, j, m, n, ex;
+  double z, t;
   double retval;
-
+  int hx, ix, k, j, m;
+  union
   {
-    SET_RESTORE_ROUND (FE_TONEAREST);
-
-    junk1.x = x;
-    m = junk1.i[HIGH_HALF];
-    n = m & hugeint;
-
-    if (n > smallint && n < bigint)
-      {
-	y = x * log2e.x + three51.x;
-	bexp = y - three51.x;	/*  multiply the result by 2**bexp        */
-
-	junk1.x = y;
-
-	eps = bexp * ln_two2.x;	/* x = bexp*ln(2) + t - eps               */
-	t = x - bexp * ln_two1.x;
-
-	y = t + three33.x;
-	base = y - three33.x;	/* t rounded to a multiple of 2**-18      */
-	junk2.x = y;
-	del = (t - base) - eps;	/*  x = bexp*ln(2) + base + del           */
-	eps = del + del * del * (p3.x * del + p2.x);
-
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;
-
-	i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-	j = (junk2.i[LOW_HALF] & 511) << 1;
+    int i_part[2];
+    double x;
+  } xx;
+  union
+  {
+    int y_part[2];
+    double y;
+  } yy;
+  xx.x = x_arg;
 
-	al = coar.x[i] * fine.x[j];
-	bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	       + coar.x[i + 1] * fine.x[j + 1]);
+  ix = xx.i_part[HIGH_HALF];
+  hx = ix & ~0x80000000;
 
-	rem = (bet + bet * eps) + al * eps;
-	res = al + rem;
-	/* Maximum relative error is 7.8e-22 (70.1 bits).
-	   Maximum ULP error is 0.500007.  */
-	retval = res * binexp.x;
-	goto ret;
-      }
-
-    if (n <= smallint)
-      {
-	retval = 1.0;
-	goto ret;
-      }
-
-    if (n >= badint)
-      {
-	if (n > infint)
+  if (hx < 0x3ff0a2b2)
+    {				/* |x| < 3/2 ln 2 */
+      if (hx < 0x3f862e42)
+	{			/* |x| < 1/64 ln 2 */
+	  if (hx < 0x3ed00000)
+	    {			/* |x| < 2^-18 */
+	      if (hx < 0x3e300000)
+		{
+		  {
+		    SET_RESTORE_ROUND (FE_TONEAREST);
+		    retval = one + xx.x;
+		  }
+		  return retval;
+		}
+	      {
+		SET_RESTORE_ROUND (FE_TONEAREST);
+		retval = one + xx.x * (one + half * xx.x);
+	      }
+	      return retval;
+	    }
 	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/* x is NaN */
-	if (n < infint)
-	  {
-	    if (x > 0)
-	      goto ret_huge;
-	    else
-	      goto ret_tiny;
+	    SET_RESTORE_ROUND (FE_TONEAREST);
+	    t = xx.x * xx.x;
+	    yy.y = xx.x + (t * (half + xx.x * t2)
+			   + (t * t) * (t3 + xx.x * t4 + t * t5));
+	    retval = one + yy.y;
 	  }
-	/* x is finite,  cause either overflow or underflow  */
-	if (junk1.i[LOW_HALF] != 0)
-	  {
-	    retval = x + x;
-	    goto ret;
-	  }			/*  x is NaN  */
-	retval = (x > 0) ? inf.x : zero;	/* |x| = inf;  return either inf or 0 */
-	goto ret;
-      }
+	  return retval;
+	}
 
-    y = x * log2e.x + three51.x;
-    bexp = y - three51.x;
-    junk1.x = y;
-    eps = bexp * ln_two2.x;
-    t = x - bexp * ln_two1.x;
-    y = t + three33.x;
-    base = y - three33.x;
-    junk2.x = y;
-    del = (t - base) - eps;
-    eps = del + del * del * (p3.x * del + p2.x);
-    i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
-    j = (junk2.i[LOW_HALF] & 511) << 1;
-    al = coar.x[i] * fine.x[j];
-    bet = ((coar.x[i] * fine.x[j + 1] + coar.x[i + 1] * fine.x[j])
-	   + coar.x[i + 1] * fine.x[j + 1]);
-    rem = (bet + bet * eps) + al * eps;
-    res = al + rem;
-    cor = (al - res) + rem;
-    if (m >> 31)
+      /* Find the multiple of 2^-6 nearest x.  */
+      k = hx >> 20;
+      j = (0x00100000 | (hx & 0x000fffff)) >> (0x40c - k);
+      j = (j - 1) & ~1;
+      if (ix < 0)
+	j += 134;
       {
-	ex = junk1.i[LOW_HALF];
-	if (res < 1.0)
-	  {
-	    res += res;
-	    cor += cor;
-	    ex -= 1;
-	  }
-	if (ex >= -1022)
-	  {
-	    binexp.i[HIGH_HALF] = (1023 + ex) << 20;
-	    /* Does not underflow: res >= 1.0, binexp >= 0x1p-1022
-	       Maximum relative error is 7.8e-22 (70.1 bits).
-	       Maximum ULP error is 0.500007.  */
-	    retval = res * binexp.x;
-	    goto ret;
-	  }
-	ex = -(1022 + ex);
-	binexp.i[HIGH_HALF] = (1023 - ex) << 20;
-	res *= binexp.x;
-	cor *= binexp.x;
-	t = 1.0 + res;
-	y = ((1.0 - t) + res) + cor;
-	res = t + y;
-	/* Maximum ULP error is 0.5000035.  */
-	binexp.i[HIGH_HALF] = 0x00100000;
-	retval = (res - 1.0) * binexp.x;
-	if (retval < DBL_MIN)
-	  {
-	    double force_underflow = tiny * tiny;
-	    math_force_eval (force_underflow);
-	  }
-	if (retval == 0)
-	  goto ret_tiny;
-	goto ret;
+	SET_RESTORE_ROUND (FE_TONEAREST);
+	z = xx.x - TBL2[j];
+	t = z * z;
+	yy.y = z + (t * (half + (z * t2))
+		    + (t * t) * (t3 + z * t4 + t * t5));
+	retval = TBL2[j + 1] + TBL2[j + 1] * yy.y;
       }
+      return retval;
+    }
+
+  if (hx >= 0x40862e42)
+    {				/* x is large, infinite, or nan.  */
+      if (hx >= 0x7ff00000)
+	{
+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return zero;	/* exp(-inf) = 0.  */
+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */
+	}
+      if (xx.x > threshold1)
+	{			/* Set overflow error condition.  */
+	  retval = hhuge * hhuge;
+	  return retval;
+	}
+      if (-xx.x > threshold2)
+	{			/* Set underflow error condition.  */
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	  retval = force_underflow;
+	  return retval;
+	}
+    }
+
+  {
+    SET_RESTORE_ROUND (FE_TONEAREST);
+    t = invln2_64 * xx.x;
+    if (ix < 0)
+      t -= half;
     else
-      {
-	binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 767) << 20;
-	/* Maximum relative error is 7.8e-22 (70.1 bits).
-	   Maximum ULP error is 0.500007.  */
-	retval = res * binexp.x * t256.x;
-	if (isinf (retval))
-	  goto ret_huge;
-	else
-	  goto ret;
-      }
-  }
-ret:
-  return retval;
+      t += half;
+    k = (int) t;
+    j = (k & 0x3f) << 1;
+    m = k >> 6;
+    z = (xx.x - k * ln2_64hi) - k * ln2_64lo;
 
- ret_huge:
-  return hhuge * hhuge;
+    /* z is now in primary range.  */
+    t = z * z;
+    yy.y = z + (t * (half + z * t2) + (t * t) * (t3 + z * t4 + t * t5));
+    yy.y = TBL[j] + (TBL[j + 1] + TBL[j] * yy.y);
+  }
 
- ret_tiny:
-  return tiny * tiny;
+  if (m < -1021)
+    {
+      yy.y_part[HIGH_HALF] += (m + 54) << 20;
+      retval = twom54 * yy.y;
+      if (retval < DBL_MIN)
+	{
+	  double force_underflow = tiny * tiny;
+	  math_force_eval (force_underflow);
+	}
+      return retval;
+    }
+  yy.y_part[HIGH_HALF] += m << 20;
+  return yy.y;
 }
 #ifndef __ieee754_exp
 strong_alias (__ieee754_exp, __exp_finite)
diff --git a/sysdeps/ieee754/dbl-64/eexp.tbl b/sysdeps/ieee754/dbl-64/eexp.tbl
new file mode 100644
index 0000000..a4eef22
--- /dev/null
+++ b/sysdeps/ieee754/dbl-64/eexp.tbl
@@ -0,0 +1,255 @@
+/* EXP function tables - for use in computing double precision exponential
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+
+/*
+   TBL[2*j] is 2**(j/64), rounded to nearest.
+   TBL[2*j+1] is 2**(j/64) - TBL[2*j], rounded to nearest.
+   These values are used to approximate exp(x) using the formula
+   given in the comments for e_exp.c.  */
+
+static const double TBL[128] = {
+    0x1.0000000000000p+0,  0x0.0000000000000p+0,
+    0x1.02c9a3e778061p+0, -0x1.19083535b085dp-56,
+    0x1.059b0d3158574p+0,  0x1.d73e2a475b465p-55,
+    0x1.0874518759bc8p+0,  0x1.186be4bb284ffp-57,
+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610bp-54,
+    0x1.0e3ec32d3d1a2p+0,  0x1.03a1727c57b52p-59,
+    0x1.11301d0125b51p+0, -0x1.6c51039449b3ap-54,
+    0x1.1429aaea92de0p+0, -0x1.32fbf9af1369ep-54,
+    0x1.172b83c7d517bp+0, -0x1.19041b9d78a76p-55,
+    0x1.1a35beb6fcb75p+0,  0x1.e5b4c7b4968e4p-55,
+    0x1.1d4873168b9aap+0,  0x1.e016e00a2643cp-54,
+    0x1.2063b88628cd6p+0,  0x1.dc775814a8495p-55,
+    0x1.2387a6e756238p+0,  0x1.9b07eb6c70573p-54,
+    0x1.26b4565e27cddp+0,  0x1.2bd339940e9d9p-55,
+    0x1.29e9df51fdee1p+0,  0x1.612e8afad1255p-55,
+    0x1.2d285a6e4030bp+0,  0x1.0024754db41d5p-54,
+    0x1.306fe0a31b715p+0,  0x1.6f46ad23182e4p-55,
+    0x1.33c08b26416ffp+0,  0x1.32721843659a6p-54,
+    0x1.371a7373aa9cbp+0, -0x1.63aeabf42eae2p-54,
+    0x1.3a7db34e59ff7p+0, -0x1.5e436d661f5e3p-56,
+    0x1.3dea64c123422p+0,  0x1.ada0911f09ebcp-55,
+    0x1.4160a21f72e2ap+0, -0x1.ef3691c309278p-58,
+    0x1.44e086061892dp+0,  0x1.89b7a04ef80d0p-59,
+    0x1.486a2b5c13cd0p+0,  0x1.3c1a3b69062f0p-56,
+    0x1.4bfdad5362a27p+0,  0x1.d4397afec42e2p-56,
+    0x1.4f9b2769d2ca7p+0, -0x1.4b309d25957e3p-54,
+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cadp-55,
+    0x1.56f4736b527dap+0,  0x1.9bb2c011d93adp-54,
+    0x1.5ab07dd485429p+0,  0x1.6324c054647adp-54,
+    0x1.5e76f15ad2148p+0,  0x1.ba6f93080e65ep-54,
+    0x1.6247eb03a5585p+0, -0x1.383c17e40b497p-54,
+    0x1.6623882552225p+0, -0x1.bb60987591c34p-54,
+    0x1.6a09e667f3bcdp+0, -0x1.bdd3413b26456p-54,
+    0x1.6dfb23c651a2fp+0, -0x1.bbe3a683c88abp-57,
+    0x1.71f75e8ec5f74p+0, -0x1.16e4786887a99p-55,
+    0x1.75feb564267c9p+0, -0x1.0245957316dd3p-54,
+    0x1.7a11473eb0187p+0, -0x1.41577ee04992fp-55,
+    0x1.7e2f336cf4e62p+0,  0x1.05d02ba15797ep-56,
+    0x1.82589994cce13p+0, -0x1.d4c1dd41532d8p-54,
+    0x1.868d99b4492edp+0, -0x1.fc6f89bd4f6bap-54,
+    0x1.8ace5422aa0dbp+0,  0x1.6e9f156864b27p-54,
+    0x1.8f1ae99157736p+0,  0x1.5cc13a2e3976cp-55,
+    0x1.93737b0cdc5e5p+0, -0x1.75fc781b57ebcp-57,
+    0x1.97d829fde4e50p+0, -0x1.d185b7c1b85d1p-54,
+    0x1.9c49182a3f090p+0,  0x1.c7c46b071f2bep-56,
+    0x1.a0c667b5de565p+0, -0x1.359495d1cd533p-54,
+    0x1.a5503b23e255dp+0, -0x1.d2f6edb8d41e1p-54,
+    0x1.a9e6b5579fdbfp+0,  0x1.0fac90ef7fd31p-54,
+    0x1.ae89f995ad3adp+0,  0x1.7a1cd345dcc81p-54,
+    0x1.b33a2b84f15fbp+0, -0x1.2805e3084d708p-57,
+    0x1.b7f76f2fb5e47p+0, -0x1.5584f7e54ac3bp-56,
+    0x1.bcc1e904bc1d2p+0,  0x1.23dd07a2d9e84p-55,
+    0x1.c199bdd85529cp+0,  0x1.11065895048ddp-55,
+    0x1.c67f12e57d14bp+0,  0x1.2884dff483cadp-54,
+    0x1.cb720dcef9069p+0,  0x1.503cbd1e949dbp-56,
+    0x1.d072d4a07897cp+0, -0x1.cbc3743797a9cp-54,
+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3707p-55,
+    0x1.da9e603db3285p+0,  0x1.c2300696db532p-54,
+    0x1.dfc97337b9b5fp+0, -0x1.1a5cd4f184b5cp-54,
+    0x1.e502ee78b3ff6p+0,  0x1.39e8980a9cc8fp-55,
+    0x1.ea4afa2a490dap+0, -0x1.e9c23179c2893p-54,
+    0x1.efa1bee615a27p+0,  0x1.dc7f486a4b6b0p-54,
+    0x1.f50765b6e4540p+0,  0x1.9d3e12dd8a18bp-54,
+    0x1.fa7c1819e90d8p+0,  0x1.74853f3a5931ep-55};
+
+/* For i = 0, ..., 66,
+     TBL2[2*i] is a double precision number near (i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.
+
+   For i = 67, ..., 133,
+     TBL2[2*i] is a double precision number near -(i+1)*2^-6, and
+     TBL2[2*i+1] = exp(TBL2[2*i]) to within a relative error less
+     than 2^-60.  */
+
+static const double TBL2[268] = {
+    0x1.ffffffffffc82p-7,   0x1.04080ab55de32p+0,
+    0x1.fffffffffffdbp-6,   0x1.08205601127ecp+0,
+    0x1.80000000000a0p-5,   0x1.0c49236829e91p+0,
+    0x1.fffffffffff79p-5,   0x1.1082b577d34e9p+0,
+    0x1.3fffffffffffcp-4,   0x1.14cd4fc989cd6p+0,
+    0x1.8000000000060p-4,   0x1.192937074e0d4p+0,
+    0x1.c000000000061p-4,   0x1.1d96b0eff0e80p+0,
+    0x1.fffffffffffd6p-4,   0x1.2216045b6f5cap+0,
+    0x1.1ffffffffff58p-3,   0x1.26a7793f6014cp+0,
+    0x1.3ffffffffff75p-3,   0x1.2b4b58b372c65p+0,
+    0x1.5ffffffffff00p-3,   0x1.3001ecf601ad1p+0,
+    0x1.8000000000020p-3,   0x1.34cb8170b583ap+0,
+    0x1.9ffffffffa629p-3,   0x1.39a862bd3b344p+0,
+    0x1.c00000000000fp-3,   0x1.3e98deaa11dcep+0,
+    0x1.e00000000007fp-3,   0x1.439d443f5f16dp+0,
+    0x1.0000000000072p-2,   0x1.48b5e3c3e81abp+0,
+    0x1.0fffffffffecap-2,   0x1.4de30ec211dfbp+0,
+    0x1.1ffffffffff8fp-2,   0x1.5325180cfacd2p+0,
+    0x1.300000000003bp-2,   0x1.587c53c5a7b04p+0,
+    0x1.4000000000034p-2,   0x1.5de9176046007p+0,
+    0x1.4ffffffffff89p-2,   0x1.636bb9a98322fp+0,
+    0x1.5ffffffffffe7p-2,   0x1.690492cbf942ap+0,
+    0x1.6ffffffffff78p-2,   0x1.6eb3fc55b1e45p+0,
+    0x1.7ffffffffff65p-2,   0x1.747a513dbef32p+0,
+    0x1.8ffffffffffd5p-2,   0x1.7a57ede9ea22ep+0,
+    0x1.9ffffffffff6ep-2,   0x1.804d30347b50fp+0,
+    0x1.affffffffffc3p-2,   0x1.865a7772164aep+0,
+    0x1.c000000000053p-2,   0x1.8c802477b0030p+0,
+    0x1.d00000000004dp-2,   0x1.92be99a09bf1ep+0,
+    0x1.e000000000096p-2,   0x1.99163ad4b1e08p+0,
+    0x1.efffffffffefap-2,   0x1.9f876d8e8c4fcp+0,
+    0x1.fffffffffffd0p-2,   0x1.a61298e1e0688p+0,
+    0x1.0800000000002p-1,   0x1.acb82581eee56p+0,
+    0x1.100000000001fp-1,   0x1.b3787dc80f979p+0,
+    0x1.17ffffffffff8p-1,   0x1.ba540dba56e4fp+0,
+    0x1.1fffffffffffap-1,   0x1.c14b431256441p+0,
+    0x1.27fffffffffc4p-1,   0x1.c85e8d43f7c9bp+0,
+    0x1.2fffffffffffdp-1,   0x1.cf8e5d84758a6p+0,
+    0x1.380000000001fp-1,   0x1.d6db26d16cd84p+0,
+    0x1.3ffffffffffd8p-1,   0x1.de455df80e39bp+0,
+    0x1.4800000000052p-1,   0x1.e5cd799c6a59cp+0,
+    0x1.4ffffffffffc8p-1,   0x1.ed73f240dc10cp+0,
+    0x1.5800000000013p-1,   0x1.f539424d90f71p+0,
+    0x1.5ffffffffffbcp-1,   0x1.fd1de6182f885p+0,
+    0x1.680000000002dp-1,   0x1.02912df5ce741p+1,
+    0x1.7000000000040p-1,   0x1.06a39207f0a2ap+1,
+    0x1.780000000004fp-1,   0x1.0ac660691652ap+1,
+    0x1.7ffffffffff6fp-1,   0x1.0ef9db467dcabp+1,
+    0x1.87fffffffffe5p-1,   0x1.133e45d82e943p+1,
+    0x1.9000000000035p-1,   0x1.1793e4652cc6dp+1,
+    0x1.97fffffffffb3p-1,   0x1.1bfafc47bda48p+1,
+    0x1.a000000000000p-1,   0x1.2073d3f1bd518p+1,
+    0x1.a80000000004ap-1,   0x1.24feb2f105ce2p+1,
+    0x1.affffffffffedp-1,   0x1.299be1f3e7f11p+1,
+    0x1.b7ffffffffffbp-1,   0x1.2e4baacdb6611p+1,
+    0x1.c00000000001dp-1,   0x1.330e587b62b39p+1,
+    0x1.c800000000079p-1,   0x1.37e437282d538p+1,
+    0x1.cffffffffff51p-1,   0x1.3ccd943268248p+1,
+    0x1.d7fffffffff74p-1,   0x1.41cabe304cadcp+1,
+    0x1.e000000000011p-1,   0x1.46dc04f4e5343p+1,
+    0x1.e80000000001ep-1,   0x1.4c01b9950a124p+1,
+    0x1.effffffffff9ep-1,   0x1.513c2e6c73196p+1,
+    0x1.f7fffffffffedp-1,   0x1.568bb722dd586p+1,
+    0x1.0000000000034p+0,   0x1.5bf0a8b1457b0p+1,
+    0x1.03fffffffffe2p+0,   0x1.616b5967376dfp+1,
+    0x1.07fffffffff4bp+0,   0x1.66fc20f0337a9p+1,
+    0x1.0bffffffffffdp+0,   0x1.6ca35859290f5p+1,
+   -0x1.fffffffffffe4p-7,   0x1.f80feabfeefa5p-1,
+   -0x1.ffffffffffb0bp-6,   0x1.f03f56a88b5fep-1,
+   -0x1.7ffffffffffa7p-5,   0x1.e88dc6afecfc5p-1,
+   -0x1.ffffffffffea8p-5,   0x1.e0fabfbc702b8p-1,
+   -0x1.3ffffffffffb3p-4,   0x1.d985c89d041acp-1,
+   -0x1.7ffffffffffe3p-4,   0x1.d22e6a0197c06p-1,
+   -0x1.bffffffffff9ap-4,   0x1.caf42e73a4c89p-1,
+   -0x1.fffffffffff98p-4,   0x1.c3d6a24ed822dp-1,
+   -0x1.1ffffffffffe9p-3,   0x1.bcd553b9d7b67p-1,
+   -0x1.3ffffffffffe0p-3,   0x1.b5efd29f24c2dp-1,
+   -0x1.5fffffffff553p-3,   0x1.af25b0a61a9f4p-1,
+   -0x1.7ffffffffff8bp-3,   0x1.a876812c08794p-1,
+   -0x1.9fffffffffe51p-3,   0x1.a1e1d93d68828p-1,
+   -0x1.bffffffffff6ep-3,   0x1.9b674f8f2f3f5p-1,
+   -0x1.dffffffffff7fp-3,   0x1.95067c7837a0cp-1,
+   -0x1.fffffffffff7ap-3,   0x1.8ebef9eac8225p-1,
+   -0x1.0fffffffffffep-2,   0x1.8890636e31f55p-1,
+   -0x1.1ffffffffff41p-2,   0x1.827a56188975ep-1,
+   -0x1.2ffffffffffbap-2,   0x1.7c7c708877656p-1,
+   -0x1.3fffffffffff8p-2,   0x1.769652df22f81p-1,
+   -0x1.4ffffffffff90p-2,   0x1.70c79eba33c2fp-1,
+   -0x1.5ffffffffffdbp-2,   0x1.6b0ff72deb8aap-1,
+   -0x1.6ffffffffff9ap-2,   0x1.656f00bf5798ep-1,
+   -0x1.7ffffffffff9fp-2,   0x1.5fe4615e98eb0p-1,
+   -0x1.8ffffffffffeep-2,   0x1.5a6fc061433cep-1,
+   -0x1.9fffffffffc4ap-2,   0x1.5510c67cd26cdp-1,
+   -0x1.affffffffff30p-2,   0x1.4fc71dc13566bp-1,
+   -0x1.bfffffffffff0p-2,   0x1.4a9271936fd0ep-1,
+   -0x1.cfffffffffff3p-2,   0x1.45726ea84fb8cp-1,
+   -0x1.dfffffffffff3p-2,   0x1.4066c2ff3912bp-1,
+   -0x1.effffffffff80p-2,   0x1.3b6f1ddd05ab9p-1,
+   -0x1.fffffffffffdfp-2,   0x1.368b2fc6f9614p-1,
+   -0x1.0800000000000p-1,   0x1.31baaa7dca843p-1,
+   -0x1.0ffffffffffa4p-1,   0x1.2cfd40f8bdce4p-1,
+   -0x1.17fffffffff0ap-1,   0x1.2852a760d5ce7p-1,
+   -0x1.2000000000000p-1,   0x1.23ba930c1568bp-1,
+   -0x1.27fffffffffbbp-1,   0x1.1f34ba78d568dp-1,
+   -0x1.2fffffffffe32p-1,   0x1.1ac0d5492c1dbp-1,
+   -0x1.37ffffffff042p-1,   0x1.165e9c3e67ef2p-1,
+   -0x1.3ffffffffff77p-1,   0x1.120dc93499431p-1,
+   -0x1.47fffffffff6bp-1,   0x1.0dce171e34ecep-1,
+   -0x1.4fffffffffff1p-1,   0x1.099f41ffbe588p-1,
+   -0x1.57ffffffffe02p-1,   0x1.058106eb8a7aep-1,
+   -0x1.5ffffffffffe5p-1,   0x1.017323fd9002ep-1,
+   -0x1.67fffffffffb0p-1,   0x1.faeab0ae9386cp-2,
+   -0x1.6ffffffffffb2p-1,   0x1.f30ec837503d7p-2,
+   -0x1.77fffffffff7fp-1,   0x1.eb5210d627133p-2,
+   -0x1.7ffffffffffe8p-1,   0x1.e3b40ebefcd95p-2,
+   -0x1.87fffffffffc8p-1,   0x1.dc3448110dae2p-2,
+   -0x1.8fffffffffb30p-1,   0x1.d4d244cf4ef06p-2,
+   -0x1.97fffffffffefp-1,   0x1.cd8d8ed8ee395p-2,
+   -0x1.9ffffffffffa7p-1,   0x1.c665b1e1f1e5cp-2,
+   -0x1.a7fffffffffdcp-1,   0x1.bf5a3b6bf18d6p-2,
+   -0x1.affffffffff95p-1,   0x1.b86ababeef93bp-2,
+   -0x1.b7fffffffffcbp-1,   0x1.b196c0e24d256p-2,
+   -0x1.bffffffffff32p-1,   0x1.aadde095dadf7p-2,
+   -0x1.c7fffffffff6ap-1,   0x1.a43fae4b047c9p-2,
+   -0x1.cffffffffffb6p-1,   0x1.9dbbc01e182a4p-2,
+   -0x1.d7fffffffffcap-1,   0x1.9751adcfa81ecp-2,
+   -0x1.dffffffffffcdp-1,   0x1.910110be0699ep-2,
+   -0x1.e7ffffffffffbp-1,   0x1.8ac983dedbc69p-2,
+   -0x1.effffffffff88p-1,   0x1.84aaa3b8d51a9p-2,
+   -0x1.f7fffffffffbbp-1,   0x1.7ea40e5d6d92ep-2,
+   -0x1.fffffffffffdbp-1,   0x1.78b56362cef53p-2,
+   -0x1.03fffffffff00p+0,   0x1.72de43ddcb1f2p-2,
+   -0x1.07ffffffffe6fp+0,   0x1.6d1e525bed085p-2,
+   -0x1.0bfffffffffd6p+0,   0x1.677532dda1c57p-2};
+
+static const double
+/* invln2_64 = 64/ln2 - used to scale x to primary range. */
+  invln2_64 = 0x1.71547652b82fep+6,
+/* ln2_64hi = high 32 bits of log(2.)/64. */
+  ln2_64hi = 0x1.62e42fee00000p-7, 
+/* ln2_64lo = remainder bits for log(2.)/64 - ln2_64hi. */
+  ln2_64lo = 0x1.a39ef35793c76p-39,
+/* t2-t5 terms used for polynomial computation.  */
+  t2 = 0x1.5555555555555p-3, /* 1.6666666666526086527e-1 */
+  t3 = 0x1.5555555555555p-5, /* 4.1666666666226079285e-2 */
+  t4 = 0x1.1111111111111p-7, /* 8.3333679843421958056e-3 */
+  t5 = 0x1.6c16c16c16c17p-10, /* 1.3888949086377719040e-3 */
+/* Maximum value for x to not overflow.  */
+  threshold1 = 0x1.62e42fefa39efp+9, /* 7.09782712893383973096e+02 */
+/* Maximum value for -x to not underflow to zero in FE_TONEAREST mode.  */
+  threshold2 = 0x1.74910d52d3051p+9, /* 7.45133219101941108420e+02 */
+/* Scaling factor used when result near zero.  */
+  twom54 = 0x1.0000000000000p-54; /* 5.55111512312578270212e-17 */
diff --git a/sysdeps/sparc/fpu/libm-test-ulps b/sysdeps/sparc/fpu/libm-test-ulps
index b2fe15d..34a1e90 100644
--- a/sysdeps/sparc/fpu/libm-test-ulps
+++ b/sysdeps/sparc/fpu/libm-test-ulps
@@ -1508,7 +1508,9 @@ ildouble: 5
 ldouble: 5
 
 Function: "exp":
+double: 1
 float: 1
+idouble: 1
 ifloat: 1
 ildouble: 1
 ldouble: 1
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 48e53f7..c82cd4f 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1902,7 +1902,9 @@ ildouble: 5
 ldouble: 5
 
 Function: "exp":
+double: 1
 float128: 1
+idouble: 1
 ifloat128: 1
 ildouble: 1
 ldouble: 1
-- 
1.7.1
Follow-Ups:
- Re: [PATCH] v12 Improves __ieee754_exp() performance by 6-11% on aarch64/sparc/x86.
  - From: Szabolcs Nagy
Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]