This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fix powerpc fmax, fmin sNaN handling (bug 20947) [committed]



On 16/12/2016 17:40, Steven Munroe wrote:
> On Fri, 2016-12-16 at 16:52 -0200, Adhemerval Zanella wrote:
>>
>> On 16/12/2016 15:41, Joseph Myers wrote:
>>> On Fri, 16 Dec 2016, Steven Munroe wrote:
>>>
>>>>> This patch fixes the powerpc versions of these functions (shared by
>>>>> float and double, 32-bit and 64-bit).  The structure of those versions
>>>>> is that all ordered cases are already handled before anything dealing
>>>>> with the case where the arguments are unordered; thus, this patch
>>>>> causes no change to the code executed in the common case (neither
>>>>> argument a NaN).
>>>>>
>>>> This patch, as is, will cause a sever performance regression for
>>>> power6/7/8 and should be fixed or reverted before 2.25 freezes.
>>>
>>> A general assumption in libm functions is that NaN inputs are an uncommon 
>>> case and performance for NaN inputs is not of much importance; the code 
>>> for NaN inputs should just aim for correctness and simplicity.  I think 
>>> people calling fmax or fmin generally want the maximum or minimum of two 
>>> finite numbers.
>>>
>>> As I said, there is no change to the code executed for the normal case of 
>>> non-NaN inputs, so I see no reason for performance in the common case to 
>>> be affected.  Of course performance improvements for the NaN case are 
>>> still welcome (and that sort of small local fix would seem perfectly 
>>> suitable to go in during the freeze if architecture maintainers think it 
>>> appropriate).
>>>
>>> If the presence of this code for the NaN case somehow affects the 
>>> performance of non-NaN case, please let me know; it wasn't meant to do so.
>>>
>>
>> I am running some tests on a powerpc64 box and using a fmax/fmin benchtests
>> with simple real, infinity and {q,s}NaN inputs and the default C 
>> implementation is showing better latency in all cases than the assembly 
>> ones for powerpc.
>>
>> Also considering that for powerpc64le usually it is target for power8,
>> it allows the compiler to select the best instruction combination for
>> builtin nan/inf/etc, so it allows more rooms for improvement with better
>> compiler support.
>>
>> So my idea for this specific issues would be just to remove the powerpc
>> assembly implementation (I am preparing a patch for it).
>>
> 
> Thanks Adhemerval, but which compiler provided these results?
> 
> Older compilers might not do as well, in which case backing porting your
> suggested fix to older distros and older GCC would be ill advised.

I used ubuntu 16 default one (gcc 5.4) for this experiment.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]