Bug 602

Summary: powerpc rint() function is buggy in the rounding toward -inf and +inf modes
Product: glibc Reporter: Vincent Lefèvre <vincent-srcware>
Component: mathAssignee: Andreas Jaeger <aj>
Status: RESOLVED FIXED    
Severity: normal CC: glibc-bugs, sjmunroe
Priority: P2 Flags: fweimer: security-
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on:    
Bug Blocks: 724    
Attachments: [PATCH] for rounding functions
[PATCH] Updated with test cases

Description Vincent Lefèvre 2004-12-08 16:33:10 UTC
The rint function for the powerpc gives wrong values for negative arguments in
the rounding toward -inf and +inf modes, and probably an incorrect sign in the
rounding to -inf mode for values between 0 and 1 (-0.0 instead of +0.0). The
(generic) nearbyint function is correct however.

For instance, rint(-0.5) gives 0.0 instead of -1.0 in the rounding toward -inf mode.

sysdeps/powerpc/fpu/s_rint.c (from CVS) currently contains:

  if (fabs (x) < TWO52)
    {
      if (x > 0.0)
        {
          x += TWO52;
          x -= TWO52;
        }
      else if (x < 0.0)
        {
          x = TWO52 - x;
          x = -(x - TWO52);
        }
    }

I suggest the following code:

  if (fabs (x) < TWO52)
    {
      if (x > 0.0)
        {
          x += TWO52;
          x -= TWO52;
          if (x == 0.0)
            x = 0.0;
        }
      else if (x < 0.0)
        {
          x -= TWO52;
          x += TWO52;
          if (x == 0.0)
            x = -0.0;
        }
    }

The tests with 0.0 correct the sign of the result (I don't know if this is
standardized, but this is more consistent like that).

I have a testing program and the above implementation here:

    http://www.vinc17.org/software/nearestint.c

For instance, try: ./nearestint 4.5 1

Note: I reported this bug (with the proposed patch) to the Debian BTS more than
a year ago[*], but it received no comments.

[*] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=216800
Comment 1 Vincent Lefèvre 2004-12-15 00:37:34 UTC
Some additional information... Here's a part of the output of my program (on a
PowerPC machine):

ay:~/wd/src/fp> ./nearestint 4.5 1
[...]
Rounding toward -oo
             -4.5  -3.5  -2.5  -1.5  -0.5   0.5   1.5   2.5   3.5   4.5
casttoint    -4    -3    -2    -1     0     0     1     2     3     4  
trunc        -4.0  -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0   4.0
floor        -5.0  -4.0  -3.0  -2.0  -1.0   0.0   1.0   2.0   3.0   4.0
ceil         -4.0  -3.0  -2.0  -1.0  -0.0   1.0   2.0   3.0   4.0   5.0
round        -5.0  -4.0  -3.0  -2.0  -1.0   1.0   2.0   3.0   4.0   5.0
nearbyint    -5.0  -4.0  -3.0  -2.0  -1.0   0.0   1.0   2.0   3.0   4.0
myrint       -5.0  -4.0  -3.0  -2.0  -1.0   0.0   1.0   2.0   3.0   4.0
rint         -4.0  -3.0  -2.0  -1.0   0.0  -0.0   1.0   2.0   3.0   4.0
[...]

"myrint" corresponds to my code. As expected, it agrees with the nearbyint glibc
function (which is correct). In short, the current rint code for negative
numbers doesn't work in the rounding modes toward -oo and toward +oo, because it
does the first subtraction in the wrong way, so that the result is rounded in
the wrong direction.

Note: do not try the code on Linux/x86, or any other processor that runs in
extended precision; it will not work (in part due to a gcc bug). Well, if you
really want to try on such a machine, you can still use the -ffloat-store
compiler switch, but the results are not guaranteed for any input.
Comment 2 Andreas Jaeger 2004-12-18 15:24:48 UTC
Steven, could you look at this bug, please?
Comment 3 Steven Munroe 2004-12-20 22:55:53 UTC
Fixed in cvs since June 11 2004. The C implementation mentioned above was
replace With optimized asm implementations:

2004-06-11  Dwayne Grant McConnell  <dgm69@us.ibm.com>

	* sysdeps/powerpc/fpu/s_lround.c: Removed.
	* sysdeps/powerpc/fpu/s_lroundf.c: Removed.
	* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_floor.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_lrint.c: Removed.
	* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_lroundf.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_lround.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_rint.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_round.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: New file.
	* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: New file.

looks like the sysdeps/powerpc/fpu/s_rint.c is redundant and should be removed.
The equivalent PPC64 changes when in April 29 2004. 

Anyway on SLES9 and RHEL4 we get:


Rounding toward -oo
             -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0
casttoint    -3    -2    -1     0     0     1     2     3  
trunc        -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0
floor        -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0
ceil         -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0
round        -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0
nearbyint    -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0
myrint       -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0
rint         -3.0  -2.0  -1.0  -0.0   0.0   1.0   2.0   3.0

However it look like there is another problem with fractional values that round
to zero:

Rounding toward -oo
             -1.5  -1.0  -0.5  -0.0   0.0   0.5   1.0   1.5
casttoint    -1    -1     0     0     0     0     1     1  
trunc        -1.0  -1.0  -0.0  -0.0   0.0   0.0   1.0   1.0
floor        -2.0  -1.0  -1.0  -0.0   0.0   0.0   1.0   1.0
ceil         -1.0  -1.0  -0.0  -0.0   0.0   1.0   1.0   2.0
round        -2.0  -1.0  -1.0  -0.0   0.0   1.0   1.0   2.0
nearbyint    -2.0  -1.0  -1.0  -0.0   0.0   0.0   1.0   1.0
myrint       -2.0  -1.0  -1.0  -0.0   0.0   0.0   1.0   1.0
rint         -1.0  -1.0   0.0  -0.0   0.0  -0.0   1.0   1.0

I'll look into this tommorrow.
Comment 4 Steven Munroe 2004-12-22 21:24:29 UTC
Created attachment 320 [details]
[PATCH] for rounding functions
Comment 5 Steven Munroe 2004-12-22 21:27:30 UTC
Ok, I found the problem in rint and floor and realized that it would be
simpler/faster to force the sign bit to the correct value (using fasb/fnabs
instructions) then to compare/branch for x == 0.0 case. The fabs/fnabs has a
latency of 6 cycles, while the fcmpu/bc has a latency of 8+2.

This simplies the the code and allows the ellimination of the -0.0 constant used
in some of the rounding functions. Finally I noticed that some of the float
(i.e. roundf, rintf, ...) functions were still using double constant values. so
I decides to clean up the whole of rounding functions (ceil, floor, rint, round,
trunc) for float and double. 

This gives a nice performance boost for POWER4, 970 (AKA G5) and POWER5
(10-135%). But I noticed anomalies in the round and trunc results. This depends
on the final alignment of the code and in some cases changes the dispatch
groupings resulting in branch miss-predictions. So for PPC64 I replace the ENTER
macro with EALIGN (?, 4, 0) which force quadword alignment. This improved the
results. I did not do this for powerpc32 because I don't have access to
powerpc32 machines to test on.
Comment 6 Steven Munroe 2004-12-22 21:31:14 UTC
Vincent doe this patch resolve your concerns?
Comment 7 Jakub Jelinek 2004-12-22 21:33:48 UTC
Can you please also write a testcase for the things current CVS ppc rint computes
incorrectly, so that we don't regress and make sure all platforms behave the
same?
Comment 8 Steven Munroe 2004-12-23 00:21:17 UTC
Created attachment 321 [details]
[PATCH] Updated with test cases

This patch adds 4 fesetround() variant test cases to math/libm-test.inc so it
will cover all platforms. 

Also fixed a typo in powerpc32/s_rintf.S that was causing test failures.
Comment 9 Vincent Lefèvre 2004-12-23 15:52:56 UTC
Re comment #6, unfortunately I can't test on my machine (I have a binary Debian
package, and I don't want to try a patch + recompilation now, as I won't be able
to recover if something goes really wrong). I assume that a test for the various
main cases like what my nearestint.c does is sufficient.
Comment 10 Steven Munroe 2005-01-05 20:53:03 UTC
I have verified that the additional tests add to math/libm-test.inc (from 
patch attachment id=321) reports the error in rint with the current CVS for 
powerpc32 and powerpc64. These errors are resolved by applying the remainder 
of patch from attachment id=321.

This should answer Jakub's requestion from comment #7 and address the testing 
requested in comments # 8,9.

Can we move this to FIXED and get the patch committed to CVS?
Comment 11 Ulrich Drepper 2005-01-06 21:58:12 UTC
Applied to cvs.
Comment 12 Sourceware Commits 2005-02-16 10:07:25 UTC
Subject: Bug 602

CVSROOT:	/cvs/glibc
Module name:	libc
Branch: 	glibc-2_3-branch
Changes by:	roland@sources.redhat.com	2005-02-16 10:07:17

Modified files:
	sysdeps/powerpc/powerpc32/fpu: s_ceil.S s_roundf.S s_trunc.S 
	                               s_rint.S s_floorf.S s_ceilf.S 
	                               s_floor.S s_truncf.S s_round.S 
	                               s_rintf.S 
	sysdeps/powerpc/powerpc64/fpu: s_trunc.S s_floorf.S s_floor.S 
	                               s_ceilf.S s_truncf.S s_ceil.S 
	                               s_rintf.S s_roundf.S s_round.S 
	                               s_rint.S 
	math           : libm-test.inc 

Log message:
	2004-12-22  Steven Munroe  <sjmunroe@us.ibm.com>
	
	[BZ #602]
	* math/libm-test.inc (rint_test_tonearest): New test.
	(rint_test_towardzero): New test.
	(rint_test_downward): New test.
	(rint_test_upward): New test.
	* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Fix -0.0 case.
	Remove redundant const values.
	* sysdeps/powerpc/powerpc32/fpu/s_ceilf.S: Fix -0.0 case.
	Remove redundant const values.  Use float const.
	* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Fix -0.0 case.
	* sysdeps/powerpc/powerpc32/fpu/s_floorf.S: Fix -0.0 case.
	Use float const.
	* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Fix -0.0 case.
	* sysdeps/powerpc/powerpc32/fpu/s_rintf.S: Fix -0.0 case.
	Use float const.
	* sysdeps/powerpc/powerpc32/fpu/s_round.S: Fix -0.0 case.
	Remove redundant const values.
	* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Fix -0.0 case.
	Remove redundant const values.  Use float const.
	* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Fix -0.0 case.
	Remove redundant const values.
	* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Fix -0.0 case.
	Remove redundant const values.  Use float const.
	* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.  Remove redundant const values.
	* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.  Remove redundant const values.
	Use float const.
	* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.
	* sysdeps/powerpc/powerpc64/fpu/s_floorf.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.  Use float const.
	* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.
	* sysdeps/powerpc/powerpc64/fpu/s_rintf.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.  Use float const.
	* sysdeps/powerpc/powerpc64/fpu/s_round.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.  Remove redundant const values.
	* sysdeps/powerpc/powerpc64/fpu/s_roundf.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.  Remove redundant const values.
	Use float const.
	* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.
	* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Use EALIGN for Quadword
	alignment.  Fix -0.0 case.  Remove redundant const values.
	Use float const.

Patches:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_ceil.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_roundf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_trunc.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_rint.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_floorf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_ceilf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_floor.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_truncf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_round.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc32/fpu/s_rintf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.1&r2=1.1.4.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_trunc.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_floorf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_floor.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_ceilf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_truncf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_ceil.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_rintf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_roundf.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_round.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/sysdeps/powerpc/powerpc64/fpu/s_rint.S.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.2&r2=1.2.2.1
http://sources.redhat.com/cgi-bin/cvsweb.cgi/libc/math/libm-test.inc.diff?cvsroot=glibc&only_with_tag=glibc-2_3-branch&r1=1.64&r2=1.64.2.1