This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] Don't spin around multiplying zeroes in __mul


Hi,

The current implementation of __mul unconditionally multiplies X[i]
and Y[j] regardless of whether they're zero.  The sequence of X[i] *
Y[j] is such that i increments up to the initial value of j and j
decrements to the initial value of i.  Hence, if both X and Y have
only zeroes after IP digits, the terms at the extreme ends,
i.e. 1..ip-1 and ip+1..p are always zero (P being the precision we're
calculating in).  It is possible to skip those terms altogether and
just compute the non-zero terms.

Attached patch only bothers performing multiplication of mantissa
digits up to the point where either X or Y have non-zero digits and
skips the rest.  I have verified that this passes the testsuite on
x86_64.  The patch applies on top of the earlier patch that added the
zk variable.

Performance improvement:

For 100000 iterations of pow with input as (1.0000000000000020, 1.5):

Without the patch:

Total:42972814009, Fastest:416278, Slowest:1242152, Avg:429728.140090

With the patch:

Total:29527884109, Fastest:285682, Slowest:842066, Avg:295278.841090

That means an improvement in fastest run by ~31.4% and average
performance by ~31.3%!


Siddhesh

	* sysdeps/ieee754/dbl-64/mpa.c (__mul): Don't bother with zero
	values in the mantissa.

diff --git a/sysdeps/ieee754/dbl-64/mpa.c b/sysdeps/ieee754/dbl-64/mpa.c
index 98a9b36..68f5240 100644
--- a/sysdeps/ieee754/dbl-64/mpa.c
+++ b/sysdeps/ieee754/dbl-64/mpa.c
@@ -383,7 +383,7 @@ void
 SECTION
 __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
 
-  int i, j, k, k2;
+  int i, j, k, k2, ip = p;
   double u, zk;
 
   /* Is z=0?  */
@@ -393,13 +393,21 @@ __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
       return;
     }
 
+  /* We need not iterate through all X's and Y's since it's pointless to
+     multiply zeroes.  */
+  for (i = p; i > 0; i--)
+    if (X[i] == 0 && Y[i] == 0)
+      ip--;
+    else
+      break;
+
   /* Multiply, add and carry.  */
   k2 = (__glibc_unlikely (p < 3)) ? p + p : p + 3;
   zk = Z[k2] = ZERO;
 
   for (k = k2; k > p; k--)
     {
-      for (i = k - p, j = p; i < p + 1; i++, j--)
+      for (i = k - ip, j = ip; i < ip + 1; i++, j--)
 	zk += X[i] * Y[j];
 
       u = (zk + CUTTER) - CUTTER;
@@ -408,6 +416,8 @@ __mul(const mp_no *x, const mp_no *y, mp_no *z, int p) {
       Z[k] = zk - u;
       zk = u * RADIXI;
     }
+  Z[k] = zk;
+  k = MIN(k, 2 * ip);
 
   while (k > 1)
     {


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]