This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
- From: Ashwin Sekhar T K <ashwin dot sekhar at caviumnetworks dot com>
- To: libc-alpha at sourceware dot org
- Cc: Ashwin Sekhar T K <ashwin dot sekhar at caviumnetworks dot com>
- Date: Tue, 13 Jun 2017 00:17:05 -0700
This patchset adds optimized ASIMD versions of sinf/cosf
for AArch64. The algorithm and code flow are based on the SSE versions
of the same functions in sysdeps/x86_64/fpu.
The ASIMD versions are used only if the CPU supports the ASIMD feature.
ifuncs and HWCAP are used to detect ASIMD capability.
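
For readers unfamiliar with this kind of dispatch, the selection step can be
sketched roughly as below. This is not the code from the patches: a plain
function pointer stands in for the ifunc mechanism, the two implementations
are placeholder stubs, and the names SKETCH_HWCAP_ASIMD, select_sinf and
dispatch_sinf are hypothetical. The only fact assumed from the platform is
that bit 1 of the AT_HWCAP word is the ASIMD flag on AArch64 Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: bit 1 of AT_HWCAP is the ASIMD flag on AArch64 Linux
   (HWCAP_ASIMD in <asm/hwcap.h>).  It is redefined here under a
   hypothetical name so that the sketch compiles on any host.  */
#define SKETCH_HWCAP_ASIMD (UINT64_C(1) << 1)

typedef float (*sinf_fn) (float);

/* Placeholder stubs, not real sine implementations.  In the patchset
   these roles are played by the existing sinf and the new ASIMD
   assembly version.  */
static float sinf_generic (float x) { return x - (x * x * x) / 6.0f; }
static float sinf_asimd (float x) { return x - (x * x * x) / 6.0f; }

/* Choose an implementation from the hardware capability word.  With a
   real ifunc, a resolver makes this choice once, at relocation time.  */
sinf_fn
select_sinf (uint64_t hwcap)
{
  return (hwcap & SKETCH_HWCAP_ASIMD) ? sinf_asimd : sinf_generic;
}

/* Call through the selected implementation.  */
float
dispatch_sinf (uint64_t hwcap, float x)
{
  sinf_fn impl = select_sinf (hwcap);
  assert (impl != NULL);
  return impl (x);
}
```

On a real system the hwcap argument would come from the auxiliary vector,
for example via getauxval (AT_HWCAP). Taking it as a parameter here keeps
the selection logic independent of the host CPU.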
The patchset was tested by running "make check" for the math subdirectory
on Linux 4.4.0-45-generic on a ThunderX88 platform.
The tables below give the approximate speedups over the existing
implementation, measured on different AArch64 platforms for a range
of input values. Values below 1.00x are slowdowns.

SINF
---------------------------------------------------------
Input           ThunderX88    ThunderX99     CortexA57
---------------------------------------------------------
0.0                  1.88x         1.18x         1.17x
2.0^-28              1.33x         1.12x         1.03x
2.0^-6               1.48x         1.28x         1.27x
0.6*Pi/4             0.94x         1.14x         1.21x
13*Pi/8              1.41x         2.00x         2.16x
17*Pi/8              1.45x         1.93x         2.23x
1000*Pi/4           19.68x        37.46x        27.99x
2.0^51              12.00x        13.58x        13.49x
Inf                  1.04x         1.05x         1.12x
NaN                  0.95x         0.87x         0.82x
---------------------------------------------------------

COSF
---------------------------------------------------------
Input           ThunderX88    ThunderX99     CortexA57
---------------------------------------------------------
0.0                  1.25x         1.14x         1.17x
2.0^-28              1.24x         1.14x         1.13x
2.0^-6               1.38x         1.38x         1.85x
0.6*Pi/4             1.15x         1.38x         1.69x
13*Pi/8              1.65x         1.94x         2.18x
17*Pi/8              1.49x         2.05x         2.09x
1000*Pi/4           18.98x        38.39x        27.52x
2.0^51              11.35x        13.74x        13.47x
Inf                  0.99x         1.02x         1.16x
NaN                  0.88x         0.86x         0.87x
---------------------------------------------------------

Ashwin Sekhar T K (2):
  aarch64: Add optimized ASIMD version of sinf
  aarch64: Add optimized ASIMD version of cosf

 sysdeps/aarch64/fpu/multiarch/Makefile       |   3 +
 sysdeps/aarch64/fpu/multiarch/s_cosf-asimd.S | 367 +++++++++++++++++++++++++
 sysdeps/aarch64/fpu/multiarch/s_cosf.c       |  31 +++
 sysdeps/aarch64/fpu/multiarch/s_sinf-asimd.S | 382 +++++++++++++++++++++++++++
 sysdeps/aarch64/fpu/multiarch/s_sinf.c       |  31 +++
 5 files changed, 814 insertions(+)
 create mode 100644 sysdeps/aarch64/fpu/multiarch/Makefile
 create mode 100644 sysdeps/aarch64/fpu/multiarch/s_cosf-asimd.S
 create mode 100644 sysdeps/aarch64/fpu/multiarch/s_cosf.c
 create mode 100644 sysdeps/aarch64/fpu/multiarch/s_sinf-asimd.S
 create mode 100644 sysdeps/aarch64/fpu/multiarch/s_sinf.c

--
2.12.2