This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Sparc exp(), expf() performance improvement

From: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
To: libc-alpha at sourceware dot org
Date: Mon, 31 Jul 2017 16:06:44 -0500
Subject: Re: [PATCH] Sparc exp(), expf() performance improvement
Authentication-results: sourceware.org; auth=none
References: <1501529969-96949-1-git-send-email-patrick.mcgehearty@oracle.com> <20170731.124719.1163288220939988504.davem@davemloft.net>

Sparc has a significant performance issue with RAW (read after write).

That is, if a value is stored to a particular address and then read from that
address before the store has reached L2 cache, a pipeline hiccup occurs
and a 30+ cycle delay is seen. Most commonly this issue is seen in the case
of register spill/fills, but it also occurs when a value in an integer
register is stored to a temporary in memory and then loaded to a floating
point register.

The int to fp and fp to int operations are common in exp() algorithms due
to cracking the exponent from the mantissa to determine which special
case to use in handling particular input data ranges.
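
As a rough illustration of the exponent cracking described above (a sketch
only, assuming IEEE 754 binary64 doubles; the helper names unbiased_exponent
and pow2_int are made up for this example and are not taken from the patch),
the bits of a double can be moved to an integer and back as shown here. On a
target without direct transfer instructions, a compiler would typically
implement each memcpy as a store followed by a load from the same address,
which is the pattern that triggers the RAW delay.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* fp -> int: copy the bits of x into an integer and extract the
   11-bit exponent field (bits 52..62), removing the bias of 1023.  */
static int
unbiased_exponent (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return (int) ((bits >> 52) & 0x7ff) - 1023;
}

/* int -> fp: build 2^k for a normal-range k by writing the biased
   exponent into an otherwise zero bit pattern.  */
static double
pow2_int (int k)
{
  assert (k >= -1022 && k <= 1023);
  uint64_t bits = (uint64_t) (k + 1023) << 52;
  double result;
  memcpy (&result, &bits, sizeof result);
  return result;
}
```

An exp() implementation might use the first helper to pick an input range
and the second to scale a result by a power of two; whether the ported code
does exactly this is not stated here.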

Starting with Niagara4 (T4), direct int to fp and fp to int transfer
instructions were added, avoiding this performance issue. If we compile
for any Sparc platform instead of T4 and later, we can't use the direct
transfers.
Note that T4 was first introduced in 2011, meaning most current
Sparc/Linux platforms will have this support.
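
Because the direct transfers exist only on T4 and later, the faster code has
to be chosen per platform. The sketch below shows only that decision, under
stated assumptions: SKETCH_HWCAP_T4 is a placeholder bit invented for this
example (real code would test the HWCAP_SPARC_CRYPTO capability from the
platform headers), and the enum and function names are hypothetical. It is
not the selection mechanism used by the patch itself.

```c
/* Placeholder capability bit for this sketch only; it is NOT the
   real value of HWCAP_SPARC_CRYPTO.  */
#define SKETCH_HWCAP_T4 (1UL << 0)

/* Hypothetical labels for the two builds of exp().  */
enum exp_variant
{
  EXP_VARIANT_GENERIC,  /* Built for any Sparc; int<->fp goes via memory.  */
  EXP_VARIANT_T4        /* Built for T4 and later; may use direct transfers.  */
};

/* Pick the T4 build only when the capability mask reports the bit.  */
static enum exp_variant
select_exp_variant (unsigned long hwcap)
{
  return (hwcap & SKETCH_HWCAP_T4) ? EXP_VARIANT_T4 : EXP_VARIANT_GENERIC;
}
```

On Linux the capability mask would normally come from the auxiliary vector,
for example via getauxval (AT_HWCAP); taking it as a parameter keeps the
sketch independent of the platform it is compiled on.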

For comparison, recent x86 chips from Intel have thrown enough HW at
the RAW issue to not have any delays when a read-after-write occurs.

The new algorithm is significantly different from the existing
sysdeps/ieee754 algorithm. The new algorithm matches the one used by the
Solaris/Studio libm exp(), expf() code.

My effort was involved in porting (with Oracle corporate permission), not
algorithm construction.

It seems likely that this code could be faster on other CPUs, but I've
only tested it on Sparc, as those are the machines I have ready access to.
The advantage may be much less on other platforms.


- patrick


On 7/31/2017 2:47 PM, David Miller wrote:

> From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
> Date: Mon, 31 Jul 2017 15:39:29 -0400
>
>> This PATCH is intended to improve exp() and expf() performance on Sparc.
>> These changes will only be active on Sparc platforms and only for
>> those platforms that support HWCAP_SPARC_CRYPTO (niagara4 and later).
>
> Can you explain which instructions exactly help make the compiled
> C code for exp() and expf() faster instead of being vague like
> this?
>
> Wouldn't the new C code you are adding be faster on other CPUs as
> well, even without gcc generating instructions for Niagara 4 and
> later?
>
> Thank you.

Follow-Ups:
- Re: [PATCH] Sparc exp(), expf() performance improvement
  - From: David Miller

References:
- [PATCH] Sparc exp(), expf() performance improvement
  - From: Patrick McGehearty
- Re: [PATCH] Sparc exp(), expf() performance improvement
  - From: David Miller
