This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: IEEE128 binary float to decimal float conversion routines
- From: Joseph Myers <joseph@codesourcery.com>
- To: Steven Munroe <munroesj@linux.vnet.ibm.com>
- Cc: Steve Munroe <sjmunroe@us.ibm.com>, "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>, Michael R Meissner <mrmeissn@us.ibm.com>, "Paul E. Murphy" <murphyp@linux.vnet.ibm.com>, Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
- Date: Wed, 16 Dec 2015 00:06:58 +0000
- Subject: Re: IEEE128 binary float to decimal float conversion routines
On Tue, 15 Dec 2015, Steven Munroe wrote:
> Ok, let me try with the simpler case of _Decimal128 to _Float128, where the
> significand conversion is exact (log2(10^34) -> 112.9 -> <= 113 bits).
> So you mention "continued fraction analysis", which was not part of my
> formal education (40+ years ago), but I will try.
The theory of rational approximations (determining how close a/b can be to
given x for integer a and b, given bounds on b) applies here, because you
want to determine how close a*10^m and b*2^n can be, for a and b in the
ranges of mantissa values and m and n corresponding exponents, which is
essentially equivalent to determining how close a/b can be to 2^n/10^m.
Earlier in this thread, Christoph Lauter referred to a paper
<http://www.computer.org/csdl/trans/tc/preprint/07271015-abs.html> about
comparison between binary and decimal floating-point values - conversions
are essentially the same issue, and that paper references previous work on
conversions. Section 4.2 of that paper discusses how the classical theory
of continued fractions can be applied to find the closest cases (the exact
details of the relevant analysis depend on the details of the code you use
to implement the conversions).
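As a toy illustration of that machinery (a Python sketch, not conversion code): the convergents of the continued fraction of x = 2^n/10^m are its best rational approximations with bounded denominator, which is exactly what bounds how close a*10^m can come to b*2^n.

```python
from fractions import Fraction

def continued_fraction(x, max_terms=32):
    """Continued-fraction expansion [a0; a1, a2, ...] of a rational x > 0."""
    terms = []
    for _ in range(max_terms):
        a = x.numerator // x.denominator
        terms.append(a)
        x -= a
        if x == 0:
            break
        x = 1 / x
    return terms

def convergents(terms):
    """Yield the convergents p/q of a continued fraction."""
    p_prev, q_prev, p, q = 1, 0, terms[0], 1
    yield Fraction(p, q)
    for a in terms[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        yield Fraction(p, q)

# Toy case: x = 2^10 / 10^3 = 128/125.  Its continued fraction terminates.
x = Fraction(2**10, 10**3)
cf = continued_fraction(x)            # [1, 41, 1, 2]
best = list(convergents(cf))          # 1, 42/41, 43/42, 128/125
# The convergent 43/42 beats every a/b with a smaller denominator:
err = abs(x - Fraction(43, 42))
assert all(abs(x - Fraction(round(x * b), b)) > err for b in range(1, 42))
```

The real analysis runs the same recurrence on 2^n/10^m for the exponent ranges of the two formats to locate the hardest-to-round inputs.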
> So any _Decimal128 < 9999999999999999999999999999999999e48 (~1.0e82) can
> be converted with one _Float128 multiply of 2 exact values, giving a
> result rounded to within 1 ulp.
Those cases are indeed straightforward (and if you use round-to-odd, that
also gives conversions of numbers within that range to narrower binary
types; similarly, you can divide instead of multiplying in cases of
integer * 1e-n, n <= 48).
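The exactness claims behind that single-multiply path can be checked mechanically. A small Python sketch (assuming only that binary128 carries a 113-bit significand):

```python
SIG_BITS = 113  # IEEE binary128 significand width, including the implicit bit

def bits_needed(n):
    """Significant bits in n once trailing zero bits (free in binary FP) drop."""
    while n % 2 == 0:
        n //= 2
    return n.bit_length()

# The largest _Decimal128 significand (34 nines) fits in 113 bits exactly:
assert bits_needed(10**34 - 1) == 113
# Powers of ten up to 10**48 are exact too (odd part 5**48 needs 112 bits):
assert all(bits_needed(10**e) <= SIG_BITS for e in range(49))
assert bits_needed(10**48) == 112
# 10**49 is the first power that cannot be stored exactly:
assert bits_needed(10**49) == 114
```

So for inputs below 10^34 * 10^48, both factors of the multiply are exact and a single correctly rounded multiplication gives the result the quoted text describes.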
> Now as the exponent of the _Decimal128 input exceeds 48, the table of
> _Float128 powers of 10 will contain values that have been rounded. Now I
> assume that some additional exponent range can be covered by ensuring
> that the table of _Float128 powers of 10 has been pre-rounded to odd?
But pre-rounding the larger powers to odd doesn't help; rounding to odd
generally only helps when it's the very last operation before rounding to
a narrower type that gets rounded to odd, with all previous operations
being exact. In this case, you'd have an inexact power followed by a
multiplication, and because one of the arguments to the multiplication is
inexact (and the multiplication is not by a power of 2), the result of the
multiplication may not be correctly rounded.
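A toy model makes the failure concrete. The sketch below (Python, with a hypothetical 4-bit significand standing in for binary128's 113 bits) pre-rounds an inexact "power" to odd, multiplies, and rounds to nearest; the result differs from correctly rounding the exact product.

```python
from fractions import Fraction

P = 4  # toy significand width; the same effect occurs at 113 bits

def scale(x):
    """Scale x > 0 into [2^(P-1), 2^P); return (scaled, exponent)."""
    e = 0
    while x >= 2**P:
        x /= 2; e += 1
    while x < 2**(P - 1):
        x *= 2; e -= 1
    return x, e

def round_nearest(x):
    """Round to nearest, ties to even, P-bit significand."""
    x, e = scale(x)
    m = int(x)                      # floor, since x > 0
    r = x - m
    if r > Fraction(1, 2) or (r == Fraction(1, 2) and m % 2 == 1):
        m += 1
    return Fraction(m) * Fraction(2) ** e

def round_odd(x):
    """Round to odd: on any inexact result, force the low significand bit to 1."""
    x, e = scale(x)
    m = int(x)
    if x != m and m % 2 == 0:
        m += 1
    return Fraction(m) * Fraction(2) ** e

power = Fraction(17, 16)            # needs 5 bits: inexact at P = 4
a = Fraction(13)                    # exact 4-bit significand
correct = round_nearest(a * power)              # exact product, one rounding
via_odd = round_nearest(a * round_odd(power))   # pre-rounded-to-odd power
assert correct == 14 and via_odd == 15          # off by one ulp
```

Round-to-odd is sound only as the final operation before the narrowing rounding, with everything earlier exact; here the inexact power sits one multiplication too early.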
Instead, the larger powers need to be stored with extra precision, and
extra precision used for the multiplication. (You don't need to store all
the powers in the range of _Float128; there is a speed/space trade-off:
you could store fewer powers and compute the one you need at runtime by
doing an extra-precision multiply of two or more stored powers, or you
could store them all if that's the right trade-off for the processors in
question.)
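One common way to hold the extra precision is an unevaluated hi/lo pair, as in double-double arithmetic. A Python sketch using binary64 pairs for illustration (a real table would use _Float128 pairs; the splitting scheme here is an assumption for the example, not glibc's actual layout):

```python
from fractions import Fraction

def pow10_pair(k):
    """Store 10**k as a (hi, lo) pair of binary64 floats: hi is the rounded
    power and lo the rounding error, so hi + lo carries ~twice the bits."""
    exact = Fraction(10**k)
    hi = float(exact)
    lo = float(exact - Fraction(hi))
    return hi, lo

k = 30                       # 10**30 needs 70 significant bits: inexact alone
hi, lo = pow10_pair(k)
err_hi = abs(Fraction(10**k) - Fraction(hi))
err_pair = abs(Fraction(10**k) - (Fraction(hi) + Fraction(lo)))
assert err_hi > 0            # hi alone is wrong
assert err_pair == 0         # here the pair happens to recover 10**30 exactly
```

Products of such pairs can then be formed with error-free transforms (Dekker/two-product style multiplies), so a missing power computed from stored ones keeps enough precision for the final rounding.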
The amount of extra precision needed depends both on how close the closest
cases for correct rounding are and on how much error can accumulate from
the multiplications: there needs to be enough precision that the
inaccuracy in the stored values and subsequent computations is small
enough not to affect the rounding of the final result.
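In a toy setting that analysis can even be done by exhaustive search. The sketch below (Python, hypothetical 2-digit decimal inputs into a 5-bit binary format) measures how close any exact input comes to a round-to-nearest midpoint; the negated log2 of that worst-case gap bounds the guard bits the intermediate computation must keep.

```python
from fractions import Fraction

P = 5  # toy binary significand bits

def gap_to_midpoint(x):
    """Distance (in ulps) from x > 0 to the nearest round-to-nearest midpoint."""
    while x >= 2**P:
        x /= 2
    while x < 2**(P - 1):
        x *= 2
    f = x - int(x)                      # fractional part of the significand
    return abs(f - Fraction(1, 2))

# Exhaustively scan 2-digit significands times 10^m; drop exact ties (gap 0),
# which round-to-nearest-even handles directly rather than by extra precision.
gaps = (gap_to_midpoint(Fraction(d) * Fraction(10) ** m)
        for d in range(10, 100) for m in range(-5, 6))
worst = min(g for g in gaps if g > 0)
assert 0 < worst < Fraction(1, 2)
```

The real formats are far too large for this brute force, which is why the continued-fraction analysis above is used to locate the worst cases instead.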
--
Joseph S. Myers
joseph@codesourcery.com