This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Only set rounding mode in SET_RESTORE_ROUND_53BIT
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Siddhesh Poyarekar <siddhesh at redhat dot com>
- Cc: libc-alpha at sourceware dot org, Richard Henderson <rth at twiddle dot net>
- Date: Thu, 30 May 2013 16:04:14 -0400
- Subject: Re: [PATCH] Only set rounding mode in SET_RESTORE_ROUND_53BIT
- References: <20130529140319 dot GL2145 at spoyarek dot pnq dot redhat dot com> <20130530085917 dot GO2145 at spoyarek dot pnq dot redhat dot com> <20130530124908 dot GQ2145 at spoyarek dot pnq dot redhat dot com>
On 05/30/2013 08:49 AM, Siddhesh Poyarekar wrote:
> On Thu, May 30, 2013 at 02:29:17PM +0530, Siddhesh Poyarekar wrote:
>> On Wed, May 29, 2013 at 07:33:19PM +0530, Siddhesh Poyarekar wrote:
>>> I'm trying to understand the code that sets and restores the floating
>>> point state to try and eek out some performance from it. I haven't
>>> been able to figure out the need for SET_RESTORE_ROUND_53BIT vs
>>> SET_RESTORE_ROUND_53 for x86_64 - I can see that it's needed for x87:
>>>
>>> SET_RESTORE_ROUND (used in exp, log) only sets and restores the
>>> rounding mode and does not touch the exception masks. This was
>>> holdexcept_setround, i.e. the same implementation as
>>> SET_RESTORE_ROUND_53BIT before, but I assume the clearing of exception
>>> mask, etc. was removed because it could result in delay in raising
>>> floating point exceptions resulting from some operations within these
>>> functions if exceptions are masked.
>>>
>>> SET_RESTORE_ROUND_53BIT (used in sin/cos) sets the rounding mode,
>>> exception mask and clears all exception flags. Why can't we simply
>>> get away with setting and restoring just the rounding mode like in
>>> exp, log, etc? That opens up a good opportunity for optimization
>>> where I could avoid most of the set/restore code in the default case.
>>> sin and cos run almost twice as fast in some cases.
>>
>> So to put all this in code, the below patch is what I intend to do.
>> Instead of modifying exception flags, SET_RESTORE_ROUND_53BIT only
>> sets rounding mode and floating point mode (in x87) and restores it on
>> exit from the lexical block. I tested x86_64 as well as i386 with
>> --disable-sse to verify that this does not cause any regressions in
>> the testsuite. Would this change be correct?
>
> Sorry, I keep replying to myself, but I forgot to test this for
> possible benchmark improvements. Here are the results for sin, cos
> and tan, which was the functions affected by this change:
>
> Before:
>
> cos(): ITERS:3.1054e+08: TOTAL:28904.2Mcy, MAX:3916.53cy, MIN:69.603cy, 10743.8 calls/Mcy
> sin(): ITERS:2.86419e+08: TOTAL:28911.1Mcy, MAX:333.326cy, MIN:46.76cy, 9906.88 calls/Mcy
> tan(): ITERS:6.7984e+07: TOTAL:28908.9Mcy, MAX:3693.01cy, MIN:420.518cy, 2351.66 calls/Mcy
>
> After:
>
> cos(): ITERS:4.8486e+08: TOTAL:28890.5Mcy, MAX:188.859cy, MIN:57.562cy, 16782.7 calls/Mcy
> sin(): ITERS:4.06791e+08: TOTAL:28904.6Mcy, MAX:210.524cy, MIN:47.592cy, 14073.6 calls/Mcy
> tan(): ITERS:7.3346e+07: TOTAL:28907.1Mcy, MAX:800.115cy, MIN:388.597cy, 2537.3 calls/Mcy
>
> So that amounts to ~56% improvmeent for cos, 42% for sin and 7.8% for
> tan.
>
> So there is a definite performance advantage for just this patch as well.
Yay! This is exactly what the benchmarks are for! :-)
Cheers,
Carlos.