This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Remove alpha specific fmax, fmin to fix sNaN handling [BZ #20947]
On 02/01/2018 12:04, Joseph Myers wrote:
> On Mon, 1 Jan 2018, Adhemerval Zanella wrote:
>
>>> In the case of ceil, inexact should never be generated. Since the alpha
>>> ceil implementations work entirely with asm which does not use /i to
>>> enable inexact exceptions, I'm not sure why they should generate such
>>> exceptions spuriously. What failures are you seeing exactly - every case
>>> of noninteger arguments to ceil / ceilf, or only some such cases, or even
>>> cases of integer arguments?
>>
>> The ceil/ceilf issues are in attachments (ran with s_ceil{f} built with
>> -mieee-with-inexact).
>
> ceil / ceilf should *not* be built with -mieee-with-inexact (since they
> should never raise inexact). But also that option shouldn't make any
> difference to those functions.
>
> This is systematically raising spurious inexact for noninteger ceil /
> ceilf arguments. I don't see why these arguments would trap to the
> kernel, but maybe (a) confirm in a debugger exactly which instruction
> results in inexact being raised; (b) maybe instrument the kernel to report
> when that instruction is being emulated so you can see if the emulation is
> involved here at all? If the emulation is involved, the kernel should be
> fixed to check TRP to see if inexact should be raised.
It is the 'cvttq/svm' instruction that changes the FPCR and sets the INE bit:
(gdb) i r fpcr
fpcr 0x680e000000000000 7497930429618454528
(gdb) ni
0x000002000009a194 38 __asm (
(gdb) i r fpcr
fpcr 0xe90e000000200000 -1653384013196296192
(0x000002000009a194 is the cvttq/svm from s_ceil.S).
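To read the two FPCR dumps above, a small decoder helps. This is a sketch: the bit positions are assumed to match the FPCR_* macros in the Linux kernel's arch/alpha/include/asm/fpu.h and should be checked against the Alpha Architecture Handbook. The helper name fpcr_newly_raised is hypothetical.

```c
#include <stdint.h>

/* Alpha FPCR exception-status bits. Positions assumed from the Linux
   kernel's arch/alpha/include/asm/fpu.h; verify against the handbook. */
#define FPCR_INV (UINT64_C(1) << 52) /* invalid operation */
#define FPCR_DZE (UINT64_C(1) << 53) /* division by zero */
#define FPCR_OVF (UINT64_C(1) << 54) /* overflow */
#define FPCR_UNF (UINT64_C(1) << 55) /* underflow */
#define FPCR_INE (UINT64_C(1) << 56) /* inexact */
#define FPCR_IOV (UINT64_C(1) << 57) /* integer overflow */
#define FPCR_SUM (UINT64_C(1) << 63) /* summary bit */

/* Hypothetical helper: the status bits set in 'after' that were clear
   in 'before'. Trap-disable, rounding-mode and other bits are ignored. */
static uint64_t fpcr_newly_raised(uint64_t before, uint64_t after)
{
    const uint64_t status = FPCR_INV | FPCR_DZE | FPCR_OVF | FPCR_UNF
                            | FPCR_INE | FPCR_IOV | FPCR_SUM;
    return after & ~before & status;
}
```

Applied to the two values from the gdb session, fpcr_newly_raised(0x680e000000000000, 0xe90e000000200000) yields FPCR_SUM | FPCR_INE: under these assumed bit positions, the only exception raised by the single step over cvttq/svm is inexact.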
A comment in alpha's divq.S (also present in other assembly implementations)
states:
  The FPCR save/restore is due to the fact that the EV6 _will_ set FPCR_INE
  for cvttq/c even without /sui being set. It will not, however, properly
  raise the exception, so we don't have to worry about FPCR_INED being clear
  and so dying by SIGFPE. */
This leads me to believe that the same applies to /m as well. Also, the comments
on the QEMU patch at [1] indicate that CVTTQ does set inexact for
(1) denormals, which convert to 0, and (2) values outside the 64-bit integer
range, which convert to the lower 64 bits of the value.
So I am not sure whether it is a hardware issue or the expected semantics (the
Alpha Architecture Handbook I have access to does indicate that cvttq sets the
INE bit for some operations).
I haven't tested whether the instruction is being emulated (I currently do not
have access to rebuild and reinstall a new kernel on the machine), but since I am
testing on an EV68CB, I guess it is not.
In any case, I think we have two options here: either adjust the implementation
to clear the FPCR_INE bit after cvttq/svm (which requires an mf_fpcr followed
by an mt_fpcr), or just remove the optimized implementation. I am more inclined
to the latter, since operating on the FPCR is usually costly: a very naive attempt
to save/restore the FPCR around cvttq for ceil did solve the issues, but it also
showed worse performance than the generic implementation (I used a ceil
benchtest based on the trunc{f} inputs).
[1] https://patchwork.ozlabs.org/patch/363303/
>
>>> That however does not explain issues for fma / fmaf. What do you see
>>> there - spurious inexact, missing inexact, wrong results? The use of
>>> -mieee-with-inexact ought to ensure instructions are generated that set
>>> "inexact" appropriately, and unless it's set appropriately, wrong results
>>> can occur because the round-to-odd implementation relies on correct
>>> setting of inexact. fmaf in particular is very simple, so as long as the
>>> right instructions are used and nothing gets reordered past the libc_fe*
>>> calls, not much should be able to go wrong.
>>
>> The issues I am seeing on alpha for fma/fmaf are also in attachments.
>
> For float, these are all missing underflow exceptions.
>
> Alpha is an architecture with after-rounding tininess detection. Recall
> that after-rounding tininess detection is based on what the result would
> be if rounded to normal precision but with infinite exponent range, so
> it's possible for a result to be rounded to +/- the least normal but still
> result in underflow with after-rounding tininess detection, which appears
> to be the case for the failing tests for float.
>
> Now, the Linux kernel has an old soft-fp version that only supports
> before-rounding tininess detection, but the cases with before-rounding
> underflow are a strict superset of those with after-rounding underflow, so
> that can't explain missing underflow exceptions. (I tried in 2015 to get
> updated soft-fp into the Linux kernel. A patch series was accepted into a
> powerpc tree that was supposed to be pull-requested for Linux 4.4
> <https://lkml.org/lkml/2015/8/26/804> but it never actually got into
> Linus's tree for some reason.)
>
> Maybe there is a hardware bug that means certain underflow cases neither
> raise the underflow flag in hardware nor pass things to software
> emulation, or something like that?
>
> (IEEE 754-1985, unlike IEEE 754-2008, allows for underflow to be raised
> only where there are both tininess and loss of accuracy detected as a
> denormalization loss, as opposed to tininess and inexactness. But the
> Alpha Architecture Handbook says "In the Alpha architecture, tininess is
> detected by hardware after rounding, and loss of accuracy is detected by
> software as an inexact result.", which indicates that option in IEEE
> 754-1985 isn't relevant here.)
>
> For double, there are a few cases of missing underflow exceptions, for
> which the above analysis would apply. But most of the failures there are
> spurious underflow exceptions, which are more mysterious, as they include
> cases where the result is large, nowhere near underflowing. I'd suggest
> finding out exactly which instruction, with what operands, is generating
> the spurious underflow exception (possibly an instruction that generates
> an exact subnormal result, where the underflow flag should not be set?).
> And, again, see whether kernel emulation is involved for that instruction.
>
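The underflow rules discussed above can be checked numerically. The following sketch models the default rule (underflow is signaled for a result that is tiny and inexact) under both tininess definitions. It assumes the exact result of a float operation is available as a double, which holds for the sample values used here, and that the value lies near FLT_MIN. After-rounding tininess is modeled, as described above, by rounding to float precision with an unbounded exponent range. This is done by scaling by 2^100 into the normal float range before converting. All helper names are hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Before-rounding tininess: the exact value is below the least normal. */
static bool tiny_before_rounding(double exact)
{
    return exact != 0.0 && fabs(exact) < FLT_MIN;
}

/* After-rounding tininess: round to 24 bits as if the exponent range were
   unbounded. Scaling by 2^100 (exact) moves the value into the normal float
   range, so the conversion loses no bits to subnormals; scaling back is
   exact. Valid for |exact| roughly within (2^-226, 2^27). */
static bool tiny_after_rounding(double exact)
{
    double r = (double) (float) (exact * 0x1p100) * 0x1p-100;
    return exact != 0.0 && fabs(r) < FLT_MIN;
}

/* Default IEEE 754-2008 rule: underflow iff tiny and inexact, where
   inexactness is judged on the actual (possibly subnormal) float result. */
static bool underflow_expected(double exact, bool after_rounding)
{
    bool tiny = after_rounding ? tiny_after_rounding(exact)
                               : tiny_before_rounding(exact);
    bool inexact = (double) (float) exact != exact;
    return tiny && inexact;
}
```

For example, 2^-126 - 2^-150 + 2^-160 rounds to FLT_MIN yet still underflows with after-rounding tininess (the case described for the float failures), 2^-126 - 2^-152 also rounds to FLT_MIN but underflows only with before-rounding tininess, and the exact subnormal 2^-127 must not raise underflow under either definition.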
I will try to investigate fma{f} as well.