5781 – Slow dbl-64 sin/cos/sincos for special values

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	math
Version:	unspecified

Importance:	P2 enhancement
Target Milestone:	2.34
Assignee:	Siddhesh Poyarekar

URL:
Keywords:

Duplicates (2):	14412 16531
Depends on:
Blocks:

Reported:	2008-02-21 09:10 UTC by Petr Cervenka
Modified:	2021-03-15 03:00 UTC
CC List:	7 users

See Also:
Host:	x86_64-unknown-linux-gnu
Target:
Build:
Last reconfirmed:

Flags:	fweimer: security-

Description Petr Cervenka 2008-02-21 09:10:38 UTC

I would like to repost my bug, which was previously deleted by (lazy IMHO)
carlos@codesoucery.com. The math sin function is at least 1000x slower on 64-bit
distributions for special numbers (and carlos doesn't care about it).
I can't try it with CVS head, because I cannot connect to CVS through our firewall.
Even when I tried the latest snapshot, I couldn't build it (maybe another bug):
a - elf/dl-vdso.os
: /home/inova/projects/glibc/build/libc_pic.a
gcc   -nostdlib -nostartfiles -r -o
/home/inova/projects/glibc/build/elf/librtld.map.o '-Wl,-('
/home/inova/projects/glibc/build/elf/dl-allobjs.os
/home/inova/projects/glibc/build/libc_pic.a -lgcc '-Wl,-)'
-Wl,-Map,/home/inova/projects/glibc/build/elf/librtld.mapT
/home/inova/projects/glibc/build/libc_pic.a(init-first.os):(.data+0x0): multiple
definition of `__libc_multiple_libcs'
/home/inova/projects/glibc/build/elf/dl-allobjs.os:/home/inova/projects/glibc/src/glibc-20080218/elf/rtld.c:641:
first defined here
/home/inova/projects/glibc/build/libc_pic.a(dl-addr.os): In function
`_dl_addr_inside_object':
/home/inova/projects/glibc/src/glibc-20080218/elf/dl-addr.c:158: multiple
definition of `_dl_addr_inside_object'
/home/inova/projects/glibc/build/elf/dl-allobjs.os:/home/inova/projects/glibc/src/glibc-20080218/elf/dl-open.c:700:
first defined here
collect2: ld returned 1 exit status
make[2]: *** [/home/inova/projects/glibc/build/elf/librtld.map] Error 1
make[2]: Leaving directory `/home/inova/projects/glibc/src/glibc-20080218/elf'
make[1]: *** [elf/subdir_lib] Error 2
make[1]: Leaving directory `/home/inova/projects/glibc/src/glibc-20080218'
make: *** [all] Error 2 

Could anyone with a 64-bit distribution and glibc CVS head please try the
attached example and post its timing results? (Or help me build the
snapshot...)
Thank you

====== Original bug report ======================================
The math sin(double) function is unreasonably slow on 64-bit distributions
(Kubuntu 7.10 AMD64 and Fedora, unknown version) for some special values
(~400 microseconds on an Athlon64 X2 4800+!!!). On 32-bit distributions
everything is fine.
I captured some of those values:
0.93340582292648832662962377071381  0x3fedde75e36bb000
2.3328432680770916363144351635128   0x4002a9a9bb38add0
3.7439477503636453548097051680088   0x400df39ae0cdf500
3.9225160069792437411706487182528   0x400f615012801950
4.0711651639931289992091478779912   0x401048df854fdc20
4.7858438478542097982426639646292   0x401324b43fe92fc0
5.9840767662578002727968851104379   0x4017efb1d1df52a0
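
The hexadecimal column above is the raw IEEE 754 binary64 bit pattern of each value. A minimal sketch of how such a pattern can be turned back into a double (and the other way round) to reproduce an input exactly; the helper names are hypothetical and not part of glibc:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as an IEEE 754 binary64 value.
   memcpy avoids the aliasing problems of a pointer cast. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Inverse helper: the raw bit pattern of a double. */
uint64_t bits_from_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Print a pattern next to its decimal value. 17 significant digits
   are enough to round-trip any binary64 value. */
void print_pattern(uint64_t bits)
{
    printf("0x%016llx  %.17g\n", (unsigned long long) bits,
           double_from_bits(bits));
}
```

Passing 0x3fedde75e36bb000 to double_from_bits gives the first input in the list, which avoids any doubt about how a long decimal literal is parsed.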

Example:
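#include <math.h>
int main(int argc, char** argv) {
    /* volatile keeps the compiler from folding sin() into a constant
       or hoisting the call out of the loop. */
    volatile double value = 0.93340582292648832662962377071381;
    volatile double out;
    int i;
    /* 20000 calls on one slow input; time the whole run externally. */
    for (i=0; i < 20000; i++)
        out = sin(value);
    return 0;
}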

Comment 1 Jakub Jelinek 2008-02-21 09:39:00 UTC

Most of the double routines in libm come from the IBM Accurate Mathematical Library,
which ensures <= 0.5ulp error.  Trigonometric etc. functions are computed using
floating point computations, but if the possible error from that is too high, the
library falls back to a slower multiprecision computation to guarantee a correctly
rounded result.
I guess you just picked some worst-case values.
i386 uses the less precise hardware instructions instead, so it doesn't guarantee
the <= 0.5ulp precision.
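
The fast-path/slow-path decision described above can be illustrated with a scaled-down sketch. It is not taken from glibc: it assumes double as the working precision and float as the target type, and the function name and the error bound passed in are hypothetical.

```c
/* Returns 1 and stores the rounded float when both ends of the
   interval [approx - err, approx + err] round to the same float.
   Rounding is monotonic, so every value in between rounds to that
   float too and the fast result is already correctly rounded.
   Returns 0 when the interval straddles a rounding boundary; a real
   implementation would then recompute with higher precision. */
int round_if_safe(double approx, double err, float *out)
{
    float lo = (float) (approx - err);
    float hi = (float) (approx + err);

    if (lo != hi)
        return 0;   /* too close to a boundary: slow path needed */
    *out = lo;
    return 1;
}
```

The slow inputs reported in this bug are the ones for which such a test fails: the approximation lands so close to the midpoint between two representable results that the error bound cannot rule either of them out.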

Comment 2 jsm-csl@polyomino.org.uk 2008-02-21 17:09:30 UTC

Subject: Re: Slow sine function for special values on AMD64 - second attempt

On Thu, 21 Feb 2008, jakub at redhat dot com wrote:

> which ensures <= 0.5ulp error.  Trigonometric etc. functions are 
> computed using floating point computations, but if the possible error 
> from that is too high, it uses slower multiprecision computation to 
> guarantee ultimate precise result. Guess you just picked some worst-case 
> values.

Note that the crlibm developers were willing to contribute their code, an 
advantage of which is *much* better worst-case performance.

Comment 3 Jakub Jelinek 2008-02-21 17:42:51 UTC

Yeah, I'm aware of crlibm. If it proves to be not much slower on average, with
the same ultimate precision guarantees and faster worst cases, I don't see a
reason why it can't be integrated.  It will be a lot of work to integrate it,
though.

Comment 4 Petr Cervenka 2008-02-22 09:00:56 UTC

Is there any compile flag or #define that disables the <= 0.5 ulp precision,
so that the math sin function uses only the fast built-in FP instructions?
For our real-time software it is necessary to be "quick"; the ultra precision
has low priority.
For now we are using a workaround: I put the original argument into a long double
variable and call the sinl function with a long double result. Both the new argument
and the result have to be volatile to keep the compiler from optimizing this away
(otherwise it probably uses the "fast" sin instead).

Results of sin(0.93340582292648832662962377071381)
------------------------------------------------------------------------------
distr  function  value                               result_type  printf_format
------------------------------------------------------------------------------
32     sin       0.80365140438773496889268699305831  double       "%.32g"
32     sinl      0.80365140438773496889268699305831  double       "%.32g"
32     sinl      0.80365140438773491338153576180048  long double  "%.32Lg"

64     sin       0.80365140438773485787038453054265  double       "%.32g"
 (~ -5.5511151231257827021181583404541e-17 difference from 80bit value)
64     sinl      0.80365140438773496889268699305831  double       "%.32g"
 (~ +5.5511151231257827021181583404541e-17 difference from 80bit value)
64     sinl      0.80365140438773491338153576180048  long double  "%.32Lg"
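
The two differences quoted above are +/- 2^-54, which is half the spacing (2^-53) between doubles in [0.5, 1). In other words, the 80-bit result sits very close to the midpoint between two neighbouring doubles, and the two double results are one unit in the last place apart. A minimal sketch for checking such a distance, assuming IEEE 754 binary64 doubles and finite inputs; the function names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* Map a double to an integer whose ordering matches the ordering of
   the doubles, so that neighbouring doubles map to neighbouring
   integers. Negative values are mirrored below zero. */
int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    return i < 0 ? INT64_MIN - i : i;
}

/* Number of representable doubles between a and b (0 if equal).
   Intended for finite values that are reasonably close together;
   the subtraction can overflow for values of huge opposite sign. */
int64_t ulp_distance(double a, double b)
{
    int64_t d = ordered_bits(a) - ordered_bits(b);
    return d < 0 ? -d : d;
}
```

Calling ulp_distance on the 32-bit and 64-bit sin results from the table shows how far apart the two implementations are in units of the last place.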

Comment 5 Petr Cervenka 2008-07-14 14:16:52 UTC

I'm not the only one with such problems:
http://sources.redhat.com/bugzilla/show_bug.cgi?id=5997
I assume that the 64-bit distribution (x86_64) should use sin and sinf
from the i386 arch (sysdeps\i386\fpu\s_sin.S and sysdeps\i386\fpu\s_sinf.S) and that
only the sinl implementation is specific to x86_64. But sin and sinf now use the
software versions (IBM library), which are usually a bit slower and sometimes MUCH
slower (1000x).
The IBM library is perhaps only a fallback implementation (for when there is no hw
support) and isn't used for "better" (<= 0.5ULP) precision.
"The First Step is to Admit You Have a Problem!"

Comment 6 Joseph Myers 2012-02-29 20:19:32 UTC

Confirmed with current sources.  Suspending until a faster correctly rounding implementation (such as that proposed in http://gcc.gnu.org/ml/gcc/2012-02/msg00298.html ) is available as this is probably not amenable to a simple local fix.

Comment 7 Siddhesh Poyarekar 2013-04-01 12:54:41 UTC

FWIW, the function now runs much faster after the multiple precision improvements.  The worst case is now only about 100 times slower instead of 1000 times.

I've not looked yet, but I think there is a case for capping maximum precision for worst case computation for sin (and all trigonometric functions) as well, so this could get even better.

Comment 8 Siddhesh Poyarekar 2013-04-10 04:25:05 UTC

Opening this since I've been working on improvements to the multiple precision bits that should have a positive effect here.  In fact, as I mentioned in comment 7, improvements are already evident.

Since optimization patches can go on forever, I'm going to put a cap on it for the resolution of this bug.  The cap is to implement findings of [1] if applicable.

[1] http://perso.ens-lyon.fr/jean-michel.muller/TMDworstcases.pdf

Comment 9 John Wilkinson 2014-02-05 21:07:14 UTC

I have also come across a very similar issue on i7 Intel platforms, please see bug 16531. Calls to cos can take around 0.15 ms, 1000 times their normal time, which is a serious problem for the real-time system we are developing.

Comment 10 John Wilkinson 2014-02-06 07:40:52 UTC

*** Bug 16531 has been marked as a duplicate of this bug. ***

Comment 11 Carlos O'Donell 2014-02-06 15:48:07 UTC

(In reply to John Wilkinson from comment #9)
> I have also come across a very similar issue on i7 Intel platforms, please
> see bug 16531. Calls to cos can take around 0.15 ms, 1000 times their normal
> time, which is a serious problem for the real-time system we are developing.

The default libm functions never guarantee constant runtime. You will have this same problem for many of the functions provided by the library.

However, we are working on enhancing libm to include something like what you're looking for. Please have a look and comment:
https://sourceware.org/glibc/wiki/libm

Comment 12 Joseph Myers 2014-02-06 18:29:20 UTC

Really the issue for sin/cos/sincos is the same, so retitling the bug.

Comment 13 Joseph Myers 2014-02-06 18:31:12 UTC

*** Bug 14412 has been marked as a duplicate of this bug. ***

Comment 14 John Wilkinson 2014-02-07 08:21:55 UTC

(In reply to Carlos O'Donell from comment #11)
> However we are working on enhancing libm to include something like what
> you're looking for. Please have a look and comment:
> https://sourceware.org/glibc/wiki/libm

Thanks, that looks useful. Is there a release schedule?

Comment 15 Carlos O'Donell 2014-02-07 14:57:07 UTC

(In reply to John Wilkinson from comment #14)
> (In reply to Carlos O'Donell from comment #11)
> > However we are working on enhancing libm to include something like what
> > you're looking for. Please have a look and comment:
> > https://sourceware.org/glibc/wiki/libm
> 
> Thanks, that looks useful. Is there a release schedule?

Not yet. I'll update the wiki when I can commit resources. That doesn't stop others from joining in the discussion, or adding notes to the wiki like use cases and requirements.

Comment 16 Petr Cervenka 2014-02-14 09:51:05 UTC

A simple workaround to get fast computation is to use functions from a special header similar to the following:

#ifndef FAST_MATH_H
#define	FAST_MATH_H

#include <cmath>

inline double fast_sin(long double x) {
    return sinl(x);
}

inline double fast_cos(long double x) {
    return cosl(x);
}

inline double fast_tan(long double x) {
    return tanl(x);
}

inline double fast_asin(long double x) {
    return asinl(x);
}

inline double fast_acos(long double x) {
    return acosl(x);
}

inline double fast_atan(long double x) {
    return atanl(x);
}

// Parameter order follows atan2: y first, then x.
inline double fast_atan2(long double y, long double x) {
    return atan2l(y, x);
}

inline double fast_sinh(long double x) {
    return sinhl(x);
}

inline double fast_cosh(long double x) {
    return coshl(x);
}

inline double fast_asinh(long double x) {
    return asinhl(x);
}

inline double fast_acosh(long double x) {
    return acoshl(x);
}

inline double fast_pow(long double x, long double y) {
    return powl(x, y);
}

inline double fast_sqrt(long double x) {
    return sqrtl(x);
}

inline double fast_exp(long double x) {
    return expl(x);
}

inline double fast_log(long double x) {
    return logl(x);
}

inline double fast_log10(long double x) {
    return log10l(x);
}

#endif	/* FAST_MATH_H */

Comment 17 Vincent Lefèvre 2015-04-16 13:15:01 UTC

(In reply to Petr Cervenka from comment #16)
> A simple workaround to get fast computation is to use functions from a
> special header similar to the following:
[long double versions: sinl, etc.]

This is not quite correct. These long double versions currently have a lower worst-case time (this might change in the future), but on average they are slower than the current double versions, by a factor of around 5 to 6 in some tests on my machine. So, use this workaround only if you want a better worst-case time, e.g. for a real-time system (this is your case[1], isn't it?).

[1] http://www.xenomai.org/pipermail/xenomai/2008-February/012416.html
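
The average-versus-worst-case trade-off above can be measured rather than guessed. A minimal sketch of such a harness, assuming standard C clock() is accurate enough once each input is repeated many times; the struct, the function names, and the stand-in workload are hypothetical:

```c
#include <stddef.h>
#include <time.h>

struct latency {
    double mean_ns;    /* average time per call over all inputs */
    double worst_ns;   /* slowest per-input average */
};

/* Hypothetical stand-in workload so the harness can be tried without
   libm; replace it with sin or with a wrapper around sinl. */
double stand_in(double x)
{
    return x * 0.5;
}

/* Times fn on each input separately, repeating each call reps times
   (reps must be positive) so that clock() can resolve the interval. */
struct latency measure_latency(double (*fn)(double),
                               const double *inputs, size_t n, int reps)
{
    struct latency result = { 0.0, 0.0 };
    volatile double sink = 0.0;   /* keeps the calls from being removed */
    double total_ns = 0.0;

    for (size_t i = 0; i < n; i++) {
        volatile double x = inputs[i];
        clock_t start = clock();
        for (int r = 0; r < reps; r++)
            sink = fn(x);
        clock_t stop = clock();

        double per_call_ns =
            (double) (stop - start) * 1e9 / CLOCKS_PER_SEC / reps;
        total_ns += per_call_ns;
        if (per_call_ns > result.worst_ns)
            result.worst_ns = per_call_ns;
    }
    (void) sink;
    if (n > 0)
        result.mean_ns = total_ns / (double) n;
    return result;
}
```

Running it once with sin and once with a double-returning wrapper around sinl, over a set of inputs that includes the slow values from the original report, gives both numbers for each variant. A real-time application would look at worst_ns, a throughput-bound one at mean_ns.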

The current library is slow, and libraries such as CRlibm could greatly improve things, but hard-to-round cases would always be slower than the average cases. So, the best implementation depends on the user's application. I suppose that most users would be happy with a *good* correctly rounded implementation, since the loss due to correct rounding should hardly be noticeable *on average* compared to an implementation with just a very good accuracy (something close to 0.5 ulp). Then there are users who are willing to sacrifice correct rounding and accuracy for faster functions. IMHO, the question is whether the GNU libc should implement such variants and provide a way for the user to select them (but this could mean two or more other variants[2]) or the user should build his own library based on his own requirements, e.g. with tools like MetaLibm[3] or other future tools.

[2] This should cover the accuracy for small arguments, but users may also have different requirements concerning the range reduction (large arguments).

[3] http://www.metalibm.org/

Comment 18 jsm-csl@polyomino.org.uk 2015-04-24 20:35:55 UTC

As I outlined in 
<https://sourceware.org/ml/libc-alpha/2013-07/msg00444.html>, and as 
described in the discussion of accuracy goals in the glibc manual, I don't 
think functions such as sin and cos should try to be correctly rounded, 
only functions such as crsin and crcos (if added).

Comment 19 Wilco 2021-03-11 14:44:55 UTC

This was fixed by a series of commits starting from 19a8b9a300f2f1f0012aff0f2b70b09430f50d9e

All slow paths have now been removed from all math functions.

Comment 20 jsm-csl@polyomino.org.uk 2021-03-11 21:12:20 UTC

If you log in to Bugzilla with your @gcc.gnu.org address you should have 
access to mark bugs RESOLVED / FIXED with target milestone set to the 
first release that will have the fix, rather than just commenting that 
they are fixed.

Comment 21 Siddhesh Poyarekar 2021-03-12 04:54:25 UTC

The last commit to remove slow paths appears to be this one:

commit f67f9c9af228f6b84579cb8c86312d3a7a206a55
Author: Anssi Hannula <anssi.hannula@bitwise.fi>
Date:   Mon Jan 27 12:45:10 2020 +0200

    ieee754: Remove slow paths from asin and acos

So this was finished in 2.33.

Comment 22 Siddhesh Poyarekar 2021-03-15 03:00:17 UTC

I just noticed that the last slow path removal patches went in with Wilco's comment 19.  Adjusting target milestone.