14412
2012-07-26 21:07:44 +0000
Removal of sysdeps/x86_64/fpu/s_sincos.S causes regressions
2014-06-17 18:55:17 +0000
1
1
1
Unclassified
glibc
math
2.16
All
All
RESOLVED
DUPLICATE
5781
https://bugs.freedesktop.org/show_bug.cgi?id=52551
P2
enhancement
2.18
1
markus
unassigned
aj
bugdal
carlos
davem
faltet
jakub
liubov.dmitrieva
nshmyrev
siddhesh
wbrana
oldest_to_newest
56592
0
markus
2012-07-26 21:07:44 +0000
The removal of sysdeps/x86_64/fpu/s_sincos.S more than doubles
the time it takes to render a particular pdf testpage with Okular.
commit b35fe25ed9b3b406754fe681b1785f330d9faf62
Author: Andreas Jaeger <aj@suse.de>
Date: Wed Mar 7 14:51:39 2012 +0100
[BZ #13658]
* sysdeps/x86_64/fpu/s_sincos.S: Delete.
* math/libm-test.inc (sincos_test): Add test for large input.
See the following poppler bug:
https://bugs.freedesktop.org/show_bug.cgi?id=52551
With glibc-2.16 it takes ~1 minute to open that page:
# Overhead Command Shared Object Symbol
35.90% okular libm-2.16.so [.] feraiseexcept
12.30% okular libm-2.16.so [.] __sin
9.03% okular libm-2.16.so [.] __cos
5.01% okular libpoppler.so.26.0.0 [.] void std::__introsort_loop<SplashIntersect*, long, cmpIntersectFunctor>(SplashIntersect*, SplashIntersect*, long, cmpIntersectFunctor)
2.80% okular libpoppler.so.26.0.0 [.] SplashXPathScanner::computeIntersections()
Before commit b35fe25ed9b3b40 it only takes 24 seconds:
# Overhead Command Shared Object Symbol
24.09% okular libm.so [.] __sincos
10.42% okular libpoppler.so.26.0.0 [.] SplashIntersect* std::__unguarded_partition<SplashIntersect*, SplashIntersect, cmpIntersectFunctor>(SplashIntersect*, SplashIntersect*, SplashIntersect const&, cmpIntersectFunctor)
5.73% okular libpoppler.so.26.0.0 [.] SplashXPathScanner::computeIntersections()
56600
1
aj
2012-07-27 06:44:27 +0000
Please have a look at the gcc -ffast-math option if you want fast sincos and can live with the limitations.
56601
2
markus
2012-07-27 07:18:37 +0000
(In reply to comment #1)
> Please have a look at the gcc -ffast-math option if you want fast sincos and
> can live with the limitations.
Yes. I've already tried this, but unfortunately it doesn't seem to help
in this case. Libpoppler behaves exactly the same when compiled with or without -ffast-math (or -Ofast).
56604
3
markus
2012-07-27 12:33:24 +0000
Actually, the current glibc behavior is equivalent to using
"-fno-builtin-sin -fno-builtin-cos" when building poppler.
56622
4
bugdal
2012-07-28 22:15:13 +0000
Judging from the profiling figures in the bug report, it looks like the problem is the cost of feraiseexcept, not the cost of correct sin/cos computation. Would dropping use of feraiseexcept and replacing it with simple arithmetic that generates the desired exception flags be a viable solution?
56722
5
carlos_odonell
2012-08-03 03:51:32 +0000
Marking this as an enhancement. We should try to fix this for 2.17 since it's a large performance regression for this function.
56802
6
markus
2012-08-09 10:14:25 +0000
Using Intel's SSE2 sinf and cosf http://sourceware.org/ml/libc-alpha/2012-06/msg00692.html solves the issue.
It only takes ~26 seconds to open the testcase:
# Overhead Command Shared Object Symbol
11.16% okular libpoppler.so.26.0.0 [.] SplashXPathScanner::computeIntersections()
11.04% okular libpoppler.so.26.0.0 [.] void std::__introsort_loop<SplashIntersect*, long, cmpIntersectFunctor>(SplashIntersect*, SplashIntersect*, long, cmpIntersectFunctor) [clone .isra.15]
8.27% okular libpoppler.so.26.0.0 [.] Gfx::doRadialShFill(GfxRadialShading*)
5.43% okular libc-2.16.90.so [.] _int_malloc
5.31% okular libm-2.16.90.so [.] __cosf
5.16% okular libpoppler.so.26.0.0 [.] SplashXPathScanner::addIntersection(double, double, unsigned int, int, int, int)
5.01% okular libm-2.16.90.so [.] __sinf
2.98% okular libc-2.16.90.so [.] _int_free
2.83% okular libpoppler.so.26.0.0 [.] Lexer::getObj(Object*, int)
I had to manually hook up the new functions with:
#undef sin
#define sin sinf
#undef cos
#define cos cosf
57013
7
markus
2012-08-19 09:31:02 +0000
*** Bug 14496 has been marked as a duplicate of this bug. ***
58894
8
davem
2012-12-03 22:13:26 +0000
This needs more work and discussion, so retargeting this bug to 2.18.
61159
9
wbrana
2013-04-25 21:48:11 +0000
This is a critical bug; it shouldn't be marked as an enhancement.
61161
10
carlos
2013-04-25 23:35:51 +0000
(In reply to comment #9)
> this is critical bug
> it shouldn't be marked as enhancement
If we fix sinf to be fast again it will be imprecise, which is a critical bug for another class of users. There is no implementation that can satisfy all requirements from all users. At the very least the precise implementation is correct, if slow. Therefore this is an enhancement request to make the implementation fast.
We are working on the possibility of having 3 libm variants: fast, default, and precise. At that point -ffast-math will select the fast variant and provide a fast but imprecise implementation.
Until then we need help looking into these performance issues to see if we can relax the feraiseexcept in some cases or optimize the code to call it fewer times.
61162
11
bugdal
2013-04-25 23:58:19 +0000
Carlos, as I noted above, the horrible slowness seems to be coming not from the correct algorithm but from the silly use of feraiseexcept to raise exceptions. Fixing this should get the performance back at least close to what it was before when the function was broken.
61163
12
bugdal
2013-04-26 00:14:13 +0000
Sorry, I see that you did acknowledge that already. I just skimmed through the code, but I'm not familiar enough with how glibc's math library is laid out to figure out where feraiseexcept is getting called from... Once it's found, fixing the problem should just be a matter of replacing the call with code that would generate the exception naturally. However, since the only exception that should be needed is the inexact exception, it should already be getting raised by the code that computes sin/cos.
A quick test to see if this is the source of the problem would be to make a no-op implementation of feraiseexcept and link it in (might need to use static linking to get libm to use it) then measure the performance.
61164
13
siddhesh
2013-04-26 02:54:59 +0000
This should have been fixed with the patch for bug 14496. Markus, can you confirm that?
61165
14
bugdal
2013-04-26 03:08:51 +0000
I can see how the fix to #14496 would help, but it's still a bug if feraiseexcept is being called at all. Certainly the code that was fixed in that patch should not even be taken as part of a call to sin or cos. There's simply no reason to be calling feraiseexcept.
61166
15
siddhesh
2013-04-26 03:50:20 +0000
That's the point of the fix to bug 14496. The root of the problem is that we save and restore rounding modes in functions for which we don't have non-round-to-nearest modes implemented. This calls feupdateenv (IIRC), which in turn calls feraiseexcept if there was an exception to be raised. The exception mask check on x86_64 was incorrect, causing feraiseexcept to be called all the time. With the fix, it is only called when there actually is an exception.
61167
16
markus
2013-04-26 06:29:48 +0000
(In reply to comment #13)
> This should have been fixed with the patch for bug 14496. Markus, can you
> confirm that?
It now takes ~40 seconds to render the test-page on trunk.
Better than 60 seconds but still 40% slower than s_sincos.S.
25.82% libm-2.17.90.so [.] __cos
22.06% libm-2.17.90.so [.] __sin
6.33% libm-2.17.90.so [.] __dubsin
4.59% libpoppler.so.36.0.0 [.] GfxPath::lineTo(double, double)
4.50% libpoppler.so.36.0.0 [.] Gfx::doRadialShFill(GfxRadialShading*)
61168
17
jakub
2013-04-26 06:40:03 +0000
With the split into 3 libm variants, fast alternative definitely doesn't have to support rounding other than round to even, but I'd say that even the default version shouldn't support it, only the precise one. After all, until very recently, those rounding modes were not supported in transcendentals in glibc, and -frounding-math isn't the default in gcc either.
Perhaps we should add a predefined macro for -frounding-math in gcc, and the precise variant of libm could be selected using that macro, if the correct rounding precise variants would be suffixed differently (like the *_finite entrypoints are for -ffast-math), then all the 3 libm variants could live in the same libm.so.6 and people would just choose what they want using -ffast-math vs. default vs. -frounding-math.
61171
18
siddhesh
2013-04-26 10:00:09 +0000
(In reply to comment #16)
> (In reply to comment #13)
> > This should have been fixed with the patch for bug 14496. Markus, can you
> > confirm that?
>
> It now takes ~40 seconds to render the test-page on trunk.
> Better than 60 seconds but still 40% slower than s_sincos.S.
That's expected. The assembly implementation used the fsincos instruction, while the current implementation calls the sin() and cos() functions. The rounding mode changes have nothing to do with this part of the performance regression. Maybe we could implement a sincos_finite that uses the fsincos instruction and actually gets used with -ffinite-math-only.
(In reply to comment #17)
> Perhaps we should add a predefined macro for -frounding-math in gcc, and the
> precise variant of libm could be selected using that macro, if the correct
> rounding precise variants would be suffixed differently (like the *_finite
> entrypoints are for -ffast-math), then all the 3 libm variants could live in
> the same libm.so.6 and people would just choose what they want using
> -ffast-math vs. default vs. -frounding-math.
I like this idea.
61172
19
joseph
2013-04-26 11:31:29 +0000
On Fri, 26 Apr 2013, jakub at redhat dot com wrote:
> Perhaps we should add a predefined macro for -frounding-math in gcc, and the
> precise variant of libm could be selected using that macro, if the correct
> rounding precise variants would be suffixed differently (like the *_finite
Basing things on a predefined macro like that won't work with any future
GCC support for #pragma STDC FENV_ACCESS on (within the scope of that
pragma, function versions supporting rounding modes should be called).
So since you'd need a new compiler feature anyway, maybe something like
__builtin_rounding_math () would be better as it would allow FENV_ACCESS
support without further library changes (the safe default for older
compilers without -ffast-math being to assume -frounding-math may be in
effect). So you could have e.g.
#define sin(x) (__builtin_rounding_math () ? sin (x) : __sin_noround (x))
(I'm using plain sin as the version supporting rounding modes for safety
for programs declaring functions themselves rather than including the
system header.)
DTS 18661-1 constant rounding modes (see
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1676.pdf>) would require
a further compiler feature to say "call this function with the dynamic
rounding mode in effect being the constant mode for the current scope".
61173
20
joseph
2013-04-26 11:32:58 +0000
Note that having extra variants like this would definitely need
libm-test.inc tests to be run for the extra variants (there's an existing
issue on the wiki todo list to run the tests not involving infinities and
NaNs with -ffinite-math-only to provide coverage of the *_finite
versions).
61175
21
bugdal
2013-04-26 13:52:36 +0000
Siddhesh, still none of the fenv stuff should be called at all if the initial rounding mode was nearest (except of course checking the initial mode). This should be special-cased since it's true basically all the time in real-world applications.
61176
22
bugdal
2013-04-26 13:55:49 +0000
Jakub, the default libm should conform to IEEE requirements and should not cause correct programs to crash or go into infinite loops (both of which happen if you just remove the check for rounding mode due to poor design in the IBM math library). This is A LOT weaker than the current aim of glibc, which is to provide correct rounding for all transcendental functions despite the fact that IEEE explicitly chose not to require it (just <1ulp error). In my opinion, if there are 3 versions, they should be:
1. Fast: Whatever the gamer/graphics folks want.
2. Default: IEEE-conforming
3. Best: Correctly-rounded results for all functions
61178
23
joseph
2013-04-26 15:07:02 +0000
It's only a limited subset of transcendental functions for which glibc is
trying to be correctly rounding, and only for double for those functions;
for most functions the aim is generally errors within "a few" ulps (and
ISO C doesn't specify particular accuracy requirements), though correctly
rounding would be some sort of ideal.
Implementing the proposed TS 18661 bindings to IEEE 754-2008 would be a
huge amount of work (and require compiler work as well), but if anyone
gets the time for that work then I believe part 4 of the draft TS will
reserve names such as crsin for correctly rounding functions (and
crsinf128 for correctly rounding sine for _Float128, etc.), without
requiring those functions, so it might make sense eventually to provide
those names for correctly-rounding functions where available without
making them the default versions. So far I'm still working on going
through the accumulated libm bugs that don't require such huge amounts of
work as DTS 18661 or ISO 24747 implementation - where one reported bug
often shows up lots of related bugs not previously in Bugzilla, that also
need to be fixed - rather than trying to find time for larger libm
projects (and I'm also mostly leaving aside libm bugs in areas where
others have shown interest in fixing them, such as performance or errno
handling).
(The present functions, producing correctly-rounded results only for
round-to-nearest, would not of course meet the requirements for functions
such as crsin, although it's probably possible to adapt them by just doing
a little bit of final arithmetic in the original rounding mode, having
done the intermediate computations to produce a higher-precision result in
round-to-nearest.)
61179
24
wbrana
2013-04-26 15:20:54 +0000
(In reply to comment #10)
> If we fix sinf to be fast again it will be imprecise which is a critical bug
> for another class of users. There is no implementation that can satisfy all
> requirements from all users. At the very least the precise implementation is at
> least correct if slow. Therefore this is an enhancement request to make the
> implementation fast.
>
> We are working on the possibility of having 3 libm variants, fast, default, and
> precise. At that point -ffast-math will select the fast variant and provide a
> fast but imprecise implementation.
>
http://sourceware.org/bugzilla/show_bug.cgi?id=13658
The impreciseness of the fast sincos can't be a critical bug, because it took 15 years (1997-2012) to report it, which means s_sincos.S is precise enough for 99% of apps.
s_sincos.S should be restored, because a slow sincos can break many apps:
most open source apps aren't properly tested, and app devs aren't aware
that sincos has been 50% slower since 2.16.
Only users will notice that e.g. a new Ubuntu is slow, and users usually can't
submit a proper bug report like this one to app devs.
App devs won't know that they should add -ffast-math to restore performance.
Very few apps use -ffast-math at present.
-ffast-math can add new bugs to apps which aren't broken by the fast sincos.
A new function named e.g. sincos_precise should be added to glibc for the 1% of apps
which are affected by the impreciseness of the fast sincos.
61219
25
siddhesh
2013-04-29 10:29:46 +0000
I've posted a patch to implement __sincos_finite:
http://sourceware.org/ml/libc-alpha/2013-04/msg00720.html
This should allow you to compile with -ffast-math or -ffinite-math-only to use this fast implementation of sincos.
61221
26
wbrana
2013-04-29 10:50:24 +0000
(In reply to comment #25)
> I've posted a patch to implement __sincos_finite:
>
> http://sourceware.org/ml/libc-alpha/2013-04/msg00720.html
>
> This should allow you to compile with -ffast-math or -ffinite-math-only to use
> this fast implementation of sincos.
Will you also test about 35 000 Linux apps and libs if they are affected by slow sincos and not affected by -ffinite-math-only and submit bug reports to app/lib devs with request for addition of -ffinite-math-only to environment defined CFLAGS/CXXFLAGS?
61223
27
bugdal
2013-04-29 11:55:09 +0000
wbrana, "getting the right answer is slow" is not an excuse for giving the wrong answer. The "optimized" version of sincos has ZERO bits of precision for certain inputs.
With that said, there are not "35000" apps and libraries to check and send bug reports to. A quick check with nm(1) will show you which ones even call trig functions, and it's a tiny fraction of that. Moreover, of those that do, a good number (anything except gaming and graphics) actually want the correct answer, not the fast-but-wrong answer, and had bugs that had gone unnoticed with the old, wrong sincos.
Getting fast-but-wrong math results is easy; you don't need the standard library for this. The purpose of the C language providing math functions (and recommending that they conform to IEEE requirements) in the standard library is to give applications something that would be very difficult to achieve themselves. Being wrong-by-default but providing a special option to get correct answers is not a solution because portable programs that were not written specifically to glibc will not be able to get the right answers. They would have to encode glibc-specific knowledge to override the incorrect default. Requiring platform-specific knowledge to get the optimal performance on a given platform (glibc) is unfortunate but at least justifiable. Requiring platform-specific knowledge to get the correct results is not justifiable.
With all that said, I still question the cause of the huge performance drop. The only cost of the correct sin/cos versus the incorrect one should be argument reduction, which should be a no-op if the argument is already in the range [-pi,pi] or so and cheap even a good ways outside of that range...
61224
28
wbrana
2013-04-29 12:32:34 +0000
According to http://sourceware.org/bugzilla/show_bug.cgi?id=13658#c2,
the fast functions can be used when the source operand is in a certain range;
the slow functions should be used only outside that interval.
"The FPTAN, FSIN, FCOS, and FSINCOS instructions set the C2 flag to 1 to
indicate that the source operand is beyond the allowable range of ±2^63 and
clear the C2 flag if the source operand is within the allowable range."
So, outside the interval [-2^63,+2^63] ("allowable range"), these instructions
must not be used (or they can be used, but with a fallback if the C2 flag is
set to 1). But note that the glibc implementation is more accurate, even with
(very probably) correct rounding, so that it is better to use it anyway.
61225
29
siddhesh
2013-04-29 13:06:40 +0000
(In reply to comment #28)
> according to http://sourceware.org/bugzilla/show_bug.cgi?id=13658#c2
> fast functions can be used when source operand is in certain range.
> Slow functions should be used only outside interval.
>
> "The FPTAN, FSIN, FCOS, and FSINCOS instructions set the C2 flag to 1 to
> indicate that the source operand is beyond the allowable range of ±2^63 and
> clear the C2 flag if the source operand is within the allowable range."
>
> So, outside the interval [-2^63,+2^63] ("allowable range"), these instructions
> must not be used (or they can be used, but with a fallback if the C2 flag is
> set to 1). But note that the glibc implementation is more accurate, even with
> (very probably) correct rounding, so that it is better to use it anyway.
The problem is not just about range reduction. fsincos is not always accurate within the allowable range either.
(In reply to comment #27)
> With all that said, I still question the cause of the huge performance drop.
> The only cost of the correct sin/cos versus the incorrect one should be
> argument reduction, which should be a no-op if the argument is already in the
> range [-pi,pi] or so and cheap even a good ways outside of that range...
I considered the possibility of keeping the assembly implementation as the default and falling back to the generic implementation if the exception flags indicate that the result was not accurate enough, but that doesn't catch all cases. In fact, the limited internal precision of PI (66-bit mantissa) guarantees that there will be cases where fsincos thinks it has hit the target, but it hasn't.
So until there's a way to flag inaccurate results reliably, I think the __sincos_finite way is the best option. There is definitely some scope to look at the implementation of __sin and __cos functions, or even figure out some clever way to get results faster in sincos than just calling __sin and __cos, but that will need more work.
61226
30
markus
2013-04-29 13:15:02 +0000
(In reply to comment #25)
> I've posted a patch to implement __sincos_finite:
>
> http://sourceware.org/ml/libc-alpha/2013-04/msg00720.html
>
> This should allow you to compile with -ffast-math or -ffinite-math-only to use
> this fast implementation of sincos.
Your patch doesn't work for me. I've rebuilt glibc and then
recompiled poppler with "-ffinite-math-only". But I still
only see calls to __sin and __cos with "perf top".
The poppler code in question can be found here:
http://cgit.freedesktop.org/poppler/poppler/tree/poppler/Gfx.cc#n3112
When I just add the following file to glibc:
markus@x4 glibc % cat sysdeps/x86_64/fpu/s_sincos.S
#include <machine/asm.h>
.text
ENTRY (__sincos)
movsd %xmm0, -8(%rsp)
fldl -8(%rsp)
fsincos
fnstsw %ax
testl $0x400,%eax
jnz 1f
fstpl (%rsi)
fstpl (%rdi)
retq
1: fldpi
fadd %st(0)
fxch %st(1)
2: fprem1
fnstsw %ax
testl $0x400,%eax
jnz 2b
fstp %st(1)
fsincos
fstpl (%rsi)
fstpl (%rdi)
retq
END (__sincos)
weak_alias (__sincos,sincos)
poppler gets fast again and uses __sincos.
61227
31
neleai
2013-04-29 13:21:18 +0000
On Mon, Apr 29, 2013 at 12:32:34PM +0000, wbrana at gmail dot com wrote:
> according to http://sourceware.org/bugzilla/show_bug.cgi?id=13658#c2
> fast functions can be used when source operand is in certain range.
> Slow functions should be used only outside interval.
>
> "The FPTAN, FSIN, FCOS, and FSINCOS instructions set the C2 flag to 1 to
> indicate that the source operand is beyond the allowable range of ±2^63 and
> clear the C2 flag if the source operand is within the allowable range."
>
> So, outside the interval [-2^63,+2^63] ("allowable range"), these instructions
> must not be used (or they can be used, but with a fallback if the C2 flag is
> set to 1). But note that the glibc implementation is more accurate, even with
> (very probably) correct rounding, so that it is better to use it anyway.
>
Well, the results of sin, cos, etc. with large numbers are pure garbage no matter
what you try. As cos(1.0e18*PI) = cos(1.0e18*PI + PI), you can basically print a random
number between -1.0 and 1.0 and be correct.
61228
32
bugdal
2013-04-29 13:25:12 +0000
wbrana, unfortunately that information is mistaken; fsin DOES NOT work on the interval [-2^63,+2^63]. A quick test reveals:
#include <math.h>
#include <stdio.h>
double x87_sin(double x)
{
__asm__("fsin" : "+t"(x));
return x;
}
int main()
{
double x = 0x1p63;
printf("%.15g %.15g\n", sin(x), x87_sin(x));
}
$ ./a.out
0.999930376673442 9.22337203685478e+18
I suspect this is just a boundary issue (the [] should be () in the range) since slightly smaller values give more reasonable, but still badly wrong, results. For example if x is 0x1p62,
-0.702922443619209 -0.707132927452779
which is wrong in the third decimal place.
Siddhesh, could you explain the motivation for "overloading" the meaning of "_finite" with "fast but wrong"? I'm guessing the idea is that you're thinking finite math implies not just lack of infinities but also "unreasonably large" inputs? This is definitely the glibc team's call, since there's no reasonable basis for assuming you'll get correct results from non-default performance-oriented math settings, but I think some justification would be nice.
61229
33
bugdal
2013-04-29 13:28:10 +0000
Ondrej, the results are not pure garbage. There are only two correct results sin(0x1p1000) can give, either of the nearest representable neighbors (less than 1ulp error) of the number 2**1000. Your fallacy is writing PI in your example. There is no such floating point number as PI, and this is part of why implementing correct trig functions is nontrivial; implementing ones that work in degree units, or a base-2 division of the unit circle, would be much easier.
61230
34
bugdal
2013-04-29 13:32:45 +0000
Markus, your implementation has incorrect argument reduction. For inputs outside of a small range (somewhere around [-2pi,2pi] but maybe slightly larger), and near multiples of pi/2 (where either sin or cos has a near-zero result and thus needs extra precision) you really need to fall back to the C range-reduction code (or good luck reimplementing THAT in asm...). You could perhaps use the x87 instruction in limited cases, though.
61231
35
wbrana
2013-04-29 13:36:24 +0000
(In reply to comment #32)
> wbrana, unfortunately that information is mistaken; fsin DOES NOT work on the
> interval [-2^63,+2^63]. A quick test reveals:
What about a smaller interval, [-2^31,+2^31]?
61232
36
siddhesh
2013-04-29 13:44:08 +0000
(In reply to comment #30)
> Your patch doesn't work for me. I've re-build glibc and then
> re-compiled poppler with "-ffinite-math-only". But I still
> only see calls to __sin and __cos with "perf top".
Looks like your code does not call sincos directly. sin and cos calls on the
same input seem to get optimized into sincos, so I'm not sure if
-ffinite-math-only can detect that. The patch also changes an installed header, so you need to somehow include that during your compilation.
61234
37
bugdal
2013-04-29 13:51:50 +0000
At the outer edges of that range, the error is "only" around 1000 ulps. This isn't correct, but at least it's better than just having 4 or 5 bits of precision (or none at all). Since the x87 internally uses a 66-bit representation of pi, and doubles are 52-bit, I suspect precision starts to trail off around 16384*pi, but it will be bad even for much smaller values when the result is close to zero.
Experimentally, 0x1p18 seems to be the first power of two for which the result is wrong within 15 decimal places, but 0x1p14*M_PI is already badly wrong (only the first 4 decimal places are correct).
61235
38
neleai
2013-04-29 14:04:37 +0000
On Mon, Apr 29, 2013 at 01:28:10PM +0000, bugdal at aerifal dot cx wrote:
> Ondrej, the results are not pure garbage. There are only two correct results
> sin(0x1p1000) can give, either of the nearest representable neighbors (less
> than 1ulp error) of the number 2**1000. Your fallacy is writing PI in your
> example. There is no such floating point number as PI, and this is part of why
> implementing correct trig functions is nontrivial; implementing ones that work
> in degree units, or a base-2 division of the unit circle, would be much easier.
>
Ok, assume you wrote a function cosdeg that takes its input in degrees.
You still have
cosdeg(360*(2**60)) == cosdeg(360*(2**60)+180)
In short, when you have inputs with zero significant digits you can
expect only garbage as output.
61236
39
siddhesh
2013-04-29 14:19:35 +0000
(In reply to comment #36)
> (In reply to comment #30)
> > Your patch doesn't work for me. I've re-build glibc and then
> > re-compiled poppler with "-ffinite-math-only". But I still
> > only see calls to __sin and __cos with "perf top".
>
> Looks like your code does not call sincos directly. sin and cos calls on the
> same input seem to get optimized into sincos, so I'm not sure if
> ffinite-math-only can detect that. Also, the patch also changes an installed
> header, so you need to somehow include that during your compilation.
I've just verified locally that gcc's transformation of sin + cos into sincos works fine with the further transformation into __sincos_finite under -ffinite-math-only, so it's most likely that your code is not using the header file changes. The easiest way to test this is to add the patch on top of your distribution glibc, then build and install it.
BTW, the question of whether -ffinite-math-only is the right option to do this kind of thing is still open, so it's likely that the resulting fix will be different. The crux would be similar though, i.e. a fast function assembly that is not always accurate.
61237
40
markus
2013-04-29 14:29:29 +0000
(In reply to comment #39)
> (In reply to comment #36)
> > (In reply to comment #30)
> > > Your patch doesn't work for me. I've re-build glibc and then
> > > re-compiled poppler with "-ffinite-math-only". But I still
> > > only see calls to __sin and __cos with "perf top".
> >
> > Looks like your code does not call sincos directly. sin and cos calls on the
> > same input seem to get optimized into sincos, so I'm not sure if
> > ffinite-math-only can detect that. Also, the patch also changes an installed
> > header, so you need to somehow include that during your compilation.
>
> I've just verified locally that if gcc's transformation of sin + cos into
> sincos works fine with the further transformation into __sincos_finite with
> -ffinite-math-only, so it's most likely that your code is not using the header
> file changes. Easiest way to test this is to add the patch on top of your
> distribution glibc, build and install it.
Hmm, that is exactly how I tested your patch. And <math.h> is included
in Gfx.cc, so bits/math-finite.h should be included when
-ffinite-math-only is used.
I will double-check later.
61240
41
bugdal
2013-04-29 14:48:43 +0000
Ondrej, the input always has 52 significant bits unless it's denormal. If, due to the way you computed the input value, none of them are significant to you, then that's your problem. sin() and cos() are still required to return the correct result assuming all bits of input are significant.
61245
42
neleai
2013-04-29 16:29:40 +0000
On Mon, Apr 29, 2013 at 02:48:43PM +0000, bugdal at aerifal dot cx wrote:
> Ondrej, the input always has 52 significant bits unless it's denormal. If, due
> to the way you computed the input value, none of them are significant to you,
> then that's your problem.
> sin() and cos() are still required to return the
> correct result assuming all bits of input are significant.
>
Citation needed.
61250
43
bugdal
2013-04-29 17:30:14 +0000
I don't have a copy of the latest IEEE 754 on hand (there's no free online version), but the consensus among implementors seems to be that a few special functions (such as sqrt) require correct rounding, and that the remaining functions (mostly transcendental) are required to give results with less than 1ulp of error. If sin(0x1p1000) is computed with incorrect argument reduction, it may have up to approximately 1e16 ulp error.
Moreover, the C standard specifies:
"2 The sin functions compute the sine of x (measured in radians).
Returns
3 The sin functions return sin x."
Without annex F, the C standard places no requirements on the precision or accuracy of floating point results (e.g. 2.0+2.0==5.0 is legal), but annex F recommends IEEE conforming behavior.
If anyone else can provide some better citations, they'd be welcome.
61263
44
joseph
2013-04-29 20:58:15 +0000
On Mon, 29 Apr 2013, bugdal at aerifal dot cx wrote:
> I don't have a copy of the latest IEEE 754 on hand (there's no free online
> version), but the consensus among implementors seems to be that a few special
> functions (such as sqrt) require correct rounding, and that the remaining
> functions (mostly transcendental) are required to give results with less than
> 1ulp of error. If sin(0x1p1000) is computed with incorrect argument reduction,
> it may have up to approximately 1e16 ulp error.
IEEE 754-2008, subclause 5.4 "formatOf general-computational operations",
specifies squareRoot and fusedMultiplyAdd on the same basis as addition,
subtraction, multiplication, division, convertFromInt, various operations
converting to integer formats and a range of other operations. As of the
2008 revision, these are "formatOf" operations (so including e.g. fused
multiply-add of long double values, rounded once to float as the
destination format), which do not necessarily have C bindings as of C11 -
for the formatOf bindings you need DTS 18661-1.
Those operations are fully defined. C11 Annex F for sqrt says "sqrt is
fully specified as a basic arithmetic operation in IEC 60559. The returned
value is dependent on the current rounding direction mode." (F.10.4.5) and
similarly for fma (F.10.10.1) says "fma(x, y, z) computes xy + z,
correctly rounded once.".
For most functions, not fully defined in IEEE 754 or C11, no specific
accuracy requirements are imposed, although there are requirements on
particular special cases in Annex F. That is, Annex F leaves the
implementation-defined accuracy from 5.2.4.2.2 paragraph 6 for most of the
functions.
IEEE 754-2008 clause 9 "Recommended operations" lists a range of other
functions with the requirement "A conforming function shall return results
correctly rounded for the applicable rounding direction for all operands
in its domain.". These include, for example, both sin and sinPi (sine of
pi times the argument). There are no ISO C bindings for these operations
at present. I believe the intent is that TS 18661-4 will specify bindings
to them, using names such as crsin for the correctly rounding functions
(while probably adding names such as sinpi for non-correctly-rounding
versions). I believe the intent is that the correctly-rounding versions
will be optional for implementations of TS 18661-4. (The exhaustive
searches required to prove correctly-rounding, bounded-resource-use
implementations of various functions are as far as I know still not
feasible for binary128, for example, and correct rounding without
arbitrary-precision computation is particularly hard for functions of
multiple arguments.)
I don't know where the idea of a 1ulp limit on errors being some sort of
IEEE requirement came from. As far as I'm concerned, sin(0x1p1000) giving
a result that is "reasonably close" (say a few ulp) to the actual sine of
0x1p1000 is a matter of quality of implementation, and giving a result
that isn't wildly different is a matter of meeting the requirement that
sin implements the sine function rather than some other function that
returns random numbers for large inputs.
(For some of the additional functions in ISO/IEC 24747, the results for large
values of certain inputs are specified as implementation-defined because
of perceived difficulty of implementing the functions accurately for large
inputs. But none of the functions in C11 itself have such latitude to do
something arbitrary for large inputs.)
61265
45
siddhesh
2013-04-30 05:41:27 +0000
(In reply to comment #32)
> I suspect this is just a boundary issue (the [] should be () in the range)
> since slightly smaller values give more reasonable, but still badly wrong,
> results. For example if x is 0x1p62,
Yes, it is () and not [], according to the manual.
>
> -0.702922443619209 -0.707132927452779
>
> which is wrong in the third decimal place.
>
This is not restricted to numbers on the boundary. A quick Google search on the accuracy of fsincos will show that even some numbers close to pi/2 are badly inaccurate.
> Siddhesh, could you explain the motivation for "overloading" the meaning of
> "_finite" with "fast but wrong"? I'm guessing the idea is that you're thinking
> finite math implies not just lack of infinities but also "unreasonably large"
> inputs? This is definitely the glibc team's call, since there's no reasonable
> basis for assuming you'll get correct results from non-default
> performance-oriented math settings, but I think some justification would be
> nice.
That was a mistake - I somehow muddled it up with -ffast-math in my head.
I'm going to first try to see if the default sincos can be made faster. Jakub pointed out offline that the argument reduction code is duplicated in the calls to __sin and __cos and consolidating that in sincos should definitely be a good first step.
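The consolidation Jakub suggests can be sketched in a few lines (a toy illustration only; `sincos_once` is a hypothetical helper, `math.remainder` is a stand-in for glibc's much more careful multi-precision reduction, and none of this reflects the real __sin/__cos kernels):

```python
import math

def sincos_once(x):
    """Toy sketch of the consolidation idea: reduce the argument once,
    then evaluate both functions on the already-reduced value, instead
    of letting the sin and cos paths each repeat the reduction.

    Real libm code would use a careful multi-precision reduction for
    large |x|; math.remainder here is only a stand-in.
    """
    r = math.remainder(x, math.tau)  # single reduction shared by both
    return math.sin(r), math.cos(r)

s, c = sincos_once(1.0)
assert abs(s - math.sin(1.0)) < 1e-15
assert abs(c - math.cos(1.0)) < 1e-15
assert abs(s * s + c * c - 1.0) < 1e-15
```

The saving is that the expensive part for large arguments is the reduction itself, so doing it once rather than twice roughly halves that cost.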
61282
46
bugdal
2013-04-30 21:37:51 +0000
Ondrej, I think the citation we were looking for is C99 7.12.1 Treatment of error conditions:
"1 The behavior of each of the functions in <math.h> is specified for all representable values of its input arguments, except where stated otherwise."
61291
47
bugdal
2013-05-01 22:25:14 +0000
Another citation, from the C99 Rationale (http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf), supporting the option to give wrong results for trig functions:
Page 27, lines 9-13:
"Some math functions such as those that do argument reduction modulo an approximation of π have good accuracy for small arguments, but poor accuracy for large arguments. It is not unusual for an implementation of the trigonometric functions to have zero bits correct in the computed result for large arguments. For cases like this, an implementation might break the domain of the function into disjoint regions and specify the accuracy in each region."
Note that an implementation should still document the accuracy, and in the case of wrong argument reduction, the error could theoretically be as large as 2**1074 ULPs for some inputs of sin/cos (if the correct result were denormal but the function returned 1.0 or -1.0) and much worse (even possibly infinite) for tan.
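The 2**1074-ulp figure above can be checked with a few lines of exact arithmetic (illustrative only):

```python
import math
from fractions import Fraction

# The smallest positive denormal double is 2**-1074.
tiny = math.ldexp(1.0, -1074)
assert tiny > 0.0 and tiny / 2.0 == 0.0  # halving it underflows to zero

# At that scale, one ulp *is* the denormal itself, so answering 1.0
# when the true result is `tiny` is an error of 1.0/tiny ulps.
# Fraction avoids the overflow to infinity that 1.0/tiny would hit
# in double precision.
err_ulps = Fraction(1) / Fraction(tiny)
assert err_ulps == 2 ** 1074
```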
63377
48
carlos
2013-09-07 06:55:07 +0000
*** Bug 15936 has been marked as a duplicate of this bug. ***
66204
49
jsm28
2014-02-06 18:31:12 +0000
Although, formally, the regression came from removing that file, the substance (that the IBM libm functions are slow in special cases, and that correctly-rounded functions should be provided under different names rather than as the default sin, cos and sincos) is the same as bug 5781.
*** This bug has been marked as a duplicate of bug 5781 ***