This is the mail archive of the glibc-cvs@sourceware.org mailing list for the glibc project.
GNU C Library master sources branch master updated. glibc-2.21-472-g2a8c2c7
- From: andros at sourceware dot org
- To: glibc-cvs at sourceware dot org
- Date: 15 Jun 2015 12:08:08 -0000
- Subject: GNU C Library master sources branch master updated. glibc-2.21-472-g2a8c2c7
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via 2a8c2c7b335ed07f63c246077fa672d8eaed23e4 (commit)
from bf1435783d5031e54f2f74ba3028db3c225a9da8 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=2a8c2c7b335ed07f63c246077fa672d8eaed23e4
commit 2a8c2c7b335ed07f63c246077fa672d8eaed23e4
Author: Andrew Senkevich <andrew.senkevich@intel.com>
Date: Mon Jun 15 15:06:53 2015 +0300
Vector sinf for x86_64 and tests.
Here is an implementation of vectorized sinf containing SSE, AVX,
AVX2 and AVX512 versions, according to the Vector ABI
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
* sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration for sinf.
* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
* sysdeps/x86_64/fpu/Versions: New versions added.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S: New file.
* sysdeps/x86_64/fpu/svml_s_sinf16_core.S: New file.
* sysdeps/x86_64/fpu/svml_s_sinf4_core.S: New file.
* sysdeps/x86_64/fpu/svml_s_sinf8_core.S: New file.
* sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S: New file.
* sysdeps/x86_64/fpu/svml_s_sinf_data.S: New file.
* sysdeps/x86_64/fpu/svml_s_sinf_data.h: New file.
* sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector sinf tests.
* sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
* NEWS: Mention addition of x86_64 vector sinf.
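[Editorial note: the comment blocks in the new assembly files below describe the underlying algorithm: range reduction to [-Pi/2; +Pi/2] by multiplying with 1/Pi plus a "right shifter" constant, an odd polynomial on the reduced argument, and sign restoration via XOR. The following is an illustrative scalar C sketch of that scheme, not glibc code: the coefficients here are plain Taylor terms and a double-precision subtraction stands in for the split-Pi reduction used in the patch.]

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Scalar sketch of the vectorized sinf algorithm (illustrative only).  */
static float
sinf_sketch (float x)
{
  uint32_t xi, yi, pbits;
  memcpy (&xi, &x, sizeof xi);

  /* a), b) Save the sign, then remove it.  */
  uint32_t sign = xi & 0x80000000u;
  float ax = fabsf (x);

  /* c), d) Octant Y by 1/Pi multiplication plus "Right Shifter" (1.5*2^23):
     adding it rounds ax/Pi to the nearest integer in the low mantissa bits.  */
  const float rshifter = 0x1.8p23f;
  float y = ax * 0.31830988618379067f + rshifter;

  /* e) Treat Y as integer; shift its low bit to the sign position.  */
  memcpy (&yi, &y, sizeof yi);
  uint32_t oct_sign = yi << 31;

  /* g), h) Subtract the shifter back out, then reduce: r = ax - n*Pi.
     (The patch splits Pi into parts; a double multiply stands in here.)  */
  float n = y - rshifter;
  float r = (float) ((double) ax - (double) n * 3.14159265358979323846);

  /* 2) Odd polynomial on [-Pi/2, Pi/2]: r + r*r^2*(A3 + r^2*(A5 + ...)).  */
  float r2 = r * r;
  float p = -1.9841269841e-4f + r2 * 2.7557319224e-6f;   /* A7 + r^2*A9 */
  p = 8.3333333333e-3f + r2 * p;                          /* A5 + ...    */
  p = -1.6666666667e-1f + r2 * p;                         /* A3 + ...    */
  p = r + r * r2 * p;

  /* 3) Restore the octant sign and the source sign using XOR.  */
  memcpy (&pbits, &p, sizeof pbits);
  pbits ^= oct_sign ^ sign;
  memcpy (&p, &pbits, sizeof p);
  return p;
}
```

The real implementations additionally branch to scalar sinf for large and special arguments, which this sketch omits.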
diff --git a/ChangeLog b/ChangeLog
index fdcc2e9..5e93d9e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,34 @@
+2015-06-15 Andrew Senkevich <andrew.senkevich@intel.com>
+
+ * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
+ * sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration for sinf.
+ * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
+ * sysdeps/x86_64/fpu/Versions: New versions added.
+ * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
+ * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
+ build of SSE, AVX2 and AVX512 IFUNC versions.
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S: New file.
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S: New file.
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S: New file.
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S: New file.
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S: New file.
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S: New file.
+ * sysdeps/x86_64/fpu/svml_s_sinf16_core.S: New file.
+ * sysdeps/x86_64/fpu/svml_s_sinf4_core.S: New file.
+ * sysdeps/x86_64/fpu/svml_s_sinf8_core.S: New file.
+ * sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S: New file.
+ * sysdeps/x86_64/fpu/svml_s_sinf_data.S: New file.
+ * sysdeps/x86_64/fpu/svml_s_sinf_data.h: New file.
+ * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Vector sinf tests.
+ * sysdeps/x86_64/fpu/test-float-vlen16.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
+ * NEWS: Mention addition of x86_64 vector sinf.
+
2015-06-14 Joseph Myers <joseph@codesourcery.com>
* conform/list-header-symbols.pl (%extra_syms): Add in6addr_any
diff --git a/NEWS b/NEWS
index 1f81c7d..33cba7b 100644
--- a/NEWS
+++ b/NEWS
@@ -53,7 +53,7 @@ Version 2.22
condition in some applications.
* Added vector math library named libmvec with the following vectorized x86_64
- implementations: cos, cosf, sin.
+ implementations: cos, cosf, sin, sinf.
The library can be disabled with --disable-mathvec. Use of the functions is
enabled with -fopenmp -ffast-math starting from -O1 for GCC version >= 4.9.0.
The library is linked in as needed when using -lm (no need to specify -lmvec
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 1dddacd..dcf9c7d 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -3,12 +3,16 @@ GLIBC_2.22
_ZGVbN2v_cos F
_ZGVbN2v_sin F
_ZGVbN4v_cosf F
+ _ZGVbN4v_sinf F
_ZGVcN4v_cos F
_ZGVcN4v_sin F
_ZGVcN8v_cosf F
+ _ZGVcN8v_sinf F
_ZGVdN4v_cos F
_ZGVdN4v_sin F
_ZGVdN8v_cosf F
+ _ZGVdN8v_sinf F
_ZGVeN16v_cosf F
+ _ZGVeN16v_sinf F
_ZGVeN8v_cos F
_ZGVeN8v_sin F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index 82b7c67..2b739c5 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -34,5 +34,7 @@
# define __DECL_SIMD_cosf __DECL_SIMD_x86_64
# undef __DECL_SIMD_sin
# define __DECL_SIMD_sin __DECL_SIMD_x86_64
+# undef __DECL_SIMD_sinf
+# define __DECL_SIMD_sinf __DECL_SIMD_x86_64
# endif
#endif
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 25f8e33..b6ecbc3 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -5,6 +5,8 @@ libmvec-support += svml_d_cos2_core svml_d_cos4_core_avx \
svml_d_sin4_core svml_d_sin8_core svml_d_sin_data \
svml_s_cosf4_core svml_s_cosf8_core_avx \
svml_s_cosf8_core svml_s_cosf16_core svml_s_cosf_data \
+ svml_s_sinf4_core svml_s_sinf8_core_avx \
+ svml_s_sinf8_core svml_s_sinf16_core svml_s_sinf_data \
init-arch
endif
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index af1769c..3f3b228 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -3,5 +3,6 @@ libmvec {
_ZGVbN2v_cos; _ZGVcN4v_cos; _ZGVdN4v_cos; _ZGVeN8v_cos;
_ZGVbN2v_sin; _ZGVcN4v_sin; _ZGVdN4v_sin; _ZGVeN8v_sin;
_ZGVbN4v_cosf; _ZGVcN8v_cosf; _ZGVdN8v_cosf; _ZGVeN16v_cosf;
+ _ZGVbN4v_sinf; _ZGVcN8v_sinf; _ZGVdN8v_sinf; _ZGVeN16v_sinf;
}
}
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index d7184d8..c2b6c4d 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -1929,17 +1929,25 @@ idouble: 1
ildouble: 3
ldouble: 3
+Function: "sin_vlen16":
+float: 1
+
Function: "sin_vlen2":
double: 2
Function: "sin_vlen4":
double: 2
+float: 1
Function: "sin_vlen4_avx2":
double: 2
Function: "sin_vlen8":
double: 2
+float: 1
+
+Function: "sin_vlen8_avx2":
+float: 1
Function: "sincos":
ildouble: 1
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index 74da4cd..61759b8 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -57,5 +57,6 @@ libmvec-sysdep_routines += svml_d_cos2_core_sse4 svml_d_cos4_core_avx2 \
svml_d_cos8_core_avx512 svml_d_sin2_core_sse4 \
svml_d_sin4_core_avx2 svml_d_sin8_core_avx512 \
svml_s_cosf4_core_sse4 svml_s_cosf8_core_avx2 \
- svml_s_cosf16_core_avx512
+ svml_s_cosf16_core_avx512 svml_s_sinf4_core_sse4 \
+ svml_s_sinf8_core_avx2 svml_s_sinf16_core_avx512
endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S
similarity index 50%
copy from sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S
index cff9941..7ed637b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S
@@ -1,4 +1,4 @@
-/* Wrapper part of tests for AVX2 ISA versions of vector math functions.
+/* Multiple versions of vectorized sinf.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,13 +16,24 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen8.h"
-#include "test-vec-loop.h"
-#include <immintrin.h>
+#include <sysdep.h>
+#include <init-arch.h>
-#undef VEC_SUFF
-#define VEC_SUFF _vlen8_avx2
+ .text
+ENTRY (_ZGVeN16v_sinf)
+ .type _ZGVeN16v_sinf, @gnu_indirect_function
+ cmpl $0, KIND_OFFSET+__cpu_features(%rip)
+ jne 1f
+ call __init_cpu_features
+1: leaq _ZGVeN16v_sinf_skx(%rip), %rax
+ testl $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+ jnz 3f
+2: leaq _ZGVeN16v_sinf_knl(%rip), %rax
+ testl $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+ jnz 3f
+ leaq _ZGVeN16v_sinf_avx2_wrapper(%rip), %rax
+3: ret
+END (_ZGVeN16v_sinf)
-#define VEC_TYPE __m256
-
-VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVdN8v_cosf)
+#define _ZGVeN16v_sinf _ZGVeN16v_sinf_avx2_wrapper
+#include "../svml_s_sinf16_core.S"
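[Editorial note: the ENTRY above is a GNU indirect function (IFUNC) resolver: at relocation time it returns the address of the best implementation for the running CPU, preferring the SKX variant when AVX512DQ is usable, then the KNL variant on plain AVX512F, and falling back to the AVX2 wrapper otherwise. The cascade, sketched as plain C with hypothetical feature probes and stand-in implementations (none of these names are glibc's):]

```c
#include <stdbool.h>

/* Hypothetical CPU-feature probes standing in for the __cpu_features
   bit tests done by the assembly resolver.  */
static bool cpu_has_avx512dq (void) { return false; }
static bool cpu_has_avx512f (void)  { return false; }

/* Stand-ins for the three implementations the resolver can pick.  */
static int sinf16_skx (void)  { return 3; }   /* _ZGVeN16v_sinf_skx */
static int sinf16_knl (void)  { return 2; }   /* _ZGVeN16v_sinf_knl */
static int sinf16_avx2 (void) { return 1; }   /* AVX2 wrapper */

/* Same cascade as the resolver: SKX if AVX512DQ is usable, else KNL if
   AVX512F is usable, else the AVX2 wrapper.  */
static int (*resolve_sinf16 (void)) (void)
{
  if (cpu_has_avx512dq ())
    return sinf16_skx;
  if (cpu_has_avx512f ())
    return sinf16_knl;
  return sinf16_avx2;
}
```

With both probes returning false, as on a pre-AVX-512 machine, the resolver hands back the AVX2 wrapper.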
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S
new file mode 100644
index 0000000..717267e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S
@@ -0,0 +1,479 @@
+/* Function sinf vectorized with AVX-512. KNL and SKX versions.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include "svml_s_sinf_data.h"
+#include "svml_s_wrapper_impl.h"
+
+ .text
+ENTRY(_ZGVeN16v_sinf_knl)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512 _ZGVdN8v_sinf
+#else
+/*
+ ALGORITHM DESCRIPTION:
+
+ 1) Range reduction to [-Pi/2; +Pi/2] interval
+ a) Grab sign from source argument and save it.
+ b) Remove sign using AND operation
+ c) Getting octant Y by 1/Pi multiplication
+ d) Add "Right Shifter" value
+ e) Treat obtained value as integer for destination sign setting.
+ Shift first bit of this value to the last (sign) position
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ g) Subtract "Right Shifter" value
+ h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+ 2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate polynomial:
+ R = X + X * X^2 * (A3 + x^2 * (A5 + ......
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R = XOR( R, S );
+ */
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $1280, %rsp
+ movq __svml_ssin_data@GOTPCREL(%rip), %rax
+
+/* Check for large and special values */
+ movl $-1, %edx
+ vmovups __sAbsMask(%rax), %zmm4
+ vmovups __sInvPI(%rax), %zmm1
+
+/* b) Remove sign using AND operation */
+ vpandd %zmm4, %zmm0, %zmm12
+ vmovups __sPI1_FMA(%rax), %zmm2
+ vmovups __sA9(%rax), %zmm7
+
+/*
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ */
+ vpandnd %zmm0, %zmm4, %zmm11
+
+/*
+ h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3;
+ */
+ vmovaps %zmm12, %zmm3
+
+/*
+ c) Getting octant Y by 1/Pi multiplication
+ d) Add "Right Shifter" value
+ */
+ vfmadd213ps __sRShifter(%rax), %zmm12, %zmm1
+ vcmpps $22, __sRangeReductionVal(%rax), %zmm12, %k1
+ vpbroadcastd %edx, %zmm13{%k1}{z}
+
+/* g) Subtract "Right Shifter" value */
+ vsubps __sRShifter(%rax), %zmm1, %zmm5
+
+/*
+ e) Treat obtained value as integer for destination sign setting.
+ Shift first bit of this value to the last (sign) position
+ */
+ vpslld $31, %zmm1, %zmm6
+ vptestmd %zmm13, %zmm13, %k0
+ vfnmadd231ps %zmm5, %zmm2, %zmm3
+ kmovw %k0, %ecx
+ vfnmadd231ps __sPI2_FMA(%rax), %zmm5, %zmm3
+ vfnmadd132ps __sPI3_FMA(%rax), %zmm3, %zmm5
+
+/*
+ 2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate polynomial:
+ R = X + X * X^2 * (A3 + x^2 * (A5 + ......
+ */
+ vmulps %zmm5, %zmm5, %zmm8
+ vpxord %zmm6, %zmm5, %zmm9
+ vfmadd213ps __sA7(%rax), %zmm8, %zmm7
+ vfmadd213ps __sA5(%rax), %zmm8, %zmm7
+ vfmadd213ps __sA3(%rax), %zmm8, %zmm7
+ vmulps %zmm8, %zmm7, %zmm10
+ vfmadd213ps %zmm9, %zmm9, %zmm10
+
+/*
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R = XOR( R, S );
+ */
+ vpxord %zmm11, %zmm10, %zmm1
+ testl %ecx, %ecx
+ jne .LBL_1_3
+
+.LBL_1_2:
+ cfi_remember_state
+ vmovaps %zmm1, %zmm0
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_1_3:
+ cfi_restore_state
+ vmovups %zmm0, 1152(%rsp)
+ vmovups %zmm1, 1216(%rsp)
+ je .LBL_1_2
+
+ xorb %dl, %dl
+ kmovw %k4, 1048(%rsp)
+ xorl %eax, %eax
+ kmovw %k5, 1040(%rsp)
+ kmovw %k6, 1032(%rsp)
+ kmovw %k7, 1024(%rsp)
+ vmovups %zmm16, 960(%rsp)
+ vmovups %zmm17, 896(%rsp)
+ vmovups %zmm18, 832(%rsp)
+ vmovups %zmm19, 768(%rsp)
+ vmovups %zmm20, 704(%rsp)
+ vmovups %zmm21, 640(%rsp)
+ vmovups %zmm22, 576(%rsp)
+ vmovups %zmm23, 512(%rsp)
+ vmovups %zmm24, 448(%rsp)
+ vmovups %zmm25, 384(%rsp)
+ vmovups %zmm26, 320(%rsp)
+ vmovups %zmm27, 256(%rsp)
+ vmovups %zmm28, 192(%rsp)
+ vmovups %zmm29, 128(%rsp)
+ vmovups %zmm30, 64(%rsp)
+ vmovups %zmm31, (%rsp)
+ movq %rsi, 1064(%rsp)
+ movq %rdi, 1056(%rsp)
+ movq %r12, 1096(%rsp)
+ cfi_offset_rel_rsp (12, 1096)
+ movb %dl, %r12b
+ movq %r13, 1088(%rsp)
+ cfi_offset_rel_rsp (13, 1088)
+ movl %ecx, %r13d
+ movq %r14, 1080(%rsp)
+ cfi_offset_rel_rsp (14, 1080)
+ movl %eax, %r14d
+ movq %r15, 1072(%rsp)
+ cfi_offset_rel_rsp (15, 1072)
+ cfi_remember_state
+
+.LBL_1_6:
+ btl %r14d, %r13d
+ jc .LBL_1_12
+
+.LBL_1_7:
+ lea 1(%r14), %esi
+ btl %esi, %r13d
+ jc .LBL_1_10
+
+.LBL_1_8:
+ addb $1, %r12b
+ addl $2, %r14d
+ cmpb $16, %r12b
+ jb .LBL_1_6
+
+ kmovw 1048(%rsp), %k4
+ movq 1064(%rsp), %rsi
+ kmovw 1040(%rsp), %k5
+ movq 1056(%rsp), %rdi
+ kmovw 1032(%rsp), %k6
+ movq 1096(%rsp), %r12
+ cfi_restore (%r12)
+ movq 1088(%rsp), %r13
+ cfi_restore (%r13)
+ kmovw 1024(%rsp), %k7
+ vmovups 960(%rsp), %zmm16
+ vmovups 896(%rsp), %zmm17
+ vmovups 832(%rsp), %zmm18
+ vmovups 768(%rsp), %zmm19
+ vmovups 704(%rsp), %zmm20
+ vmovups 640(%rsp), %zmm21
+ vmovups 576(%rsp), %zmm22
+ vmovups 512(%rsp), %zmm23
+ vmovups 448(%rsp), %zmm24
+ vmovups 384(%rsp), %zmm25
+ vmovups 320(%rsp), %zmm26
+ vmovups 256(%rsp), %zmm27
+ vmovups 192(%rsp), %zmm28
+ vmovups 128(%rsp), %zmm29
+ vmovups 64(%rsp), %zmm30
+ vmovups (%rsp), %zmm31
+ movq 1080(%rsp), %r14
+ cfi_restore (%r14)
+ movq 1072(%rsp), %r15
+ cfi_restore (%r15)
+ vmovups 1216(%rsp), %zmm1
+ jmp .LBL_1_2
+
+.LBL_1_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ vmovss 1156(%rsp,%r15,8), %xmm0
+ call sinf@PLT
+ vmovss %xmm0, 1220(%rsp,%r15,8)
+ jmp .LBL_1_8
+
+.LBL_1_12:
+ movzbl %r12b, %r15d
+ vmovss 1152(%rsp,%r15,8), %xmm0
+ call sinf@PLT
+ vmovss %xmm0, 1216(%rsp,%r15,8)
+ jmp .LBL_1_7
+#endif
+END(_ZGVeN16v_sinf_knl)
+
+ENTRY (_ZGVeN16v_sinf_skx)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512 _ZGVdN8v_sinf
+#else
+/*
+ ALGORITHM DESCRIPTION:
+
+ 1) Range reduction to [-Pi/2; +Pi/2] interval
+ a) Grab sign from source argument and save it.
+ b) Remove sign using AND operation
+ c) Getting octant Y by 1/Pi multiplication
+ d) Add "Right Shifter" value
+ e) Treat obtained value as integer for destination sign setting.
+ Shift first bit of this value to the last (sign) position
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ g) Subtract "Right Shifter" value
+ h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+ 2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate polynomial:
+ R = X + X * X^2 * (A3 + x^2 * (A5 + ......
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R = XOR( R, S );
+ */
+
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $1280, %rsp
+ movq __svml_ssin_data@GOTPCREL(%rip), %rax
+
+/* Check for large and special values */
+ vmovups .L_2il0floatpacket.11(%rip), %zmm14
+ vmovups __sAbsMask(%rax), %zmm5
+ vmovups __sInvPI(%rax), %zmm1
+ vmovups __sRShifter(%rax), %zmm2
+ vmovups __sPI1_FMA(%rax), %zmm3
+ vmovups __sA9(%rax), %zmm8
+
+/* b) Remove sign using AND operation */
+ vandps %zmm5, %zmm0, %zmm13
+
+/*
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ */
+ vandnps %zmm0, %zmm5, %zmm12
+
+/*
+ c) Getting octant Y by 1/Pi multiplication
+ d) Add "Right Shifter" value
+ */
+ vfmadd213ps %zmm2, %zmm13, %zmm1
+ vcmpps $18, __sRangeReductionVal(%rax), %zmm13, %k1
+
+/*
+ e) Treat obtained value as integer for destination sign setting.
+ Shift first bit of this value to the last (sign) position
+ */
+ vpslld $31, %zmm1, %zmm7
+
+/* g) Subtract "Right Shifter" value */
+ vsubps %zmm2, %zmm1, %zmm6
+
+/*
+ h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3;
+ */
+ vmovaps %zmm13, %zmm4
+ vfnmadd231ps %zmm6, %zmm3, %zmm4
+ vfnmadd231ps __sPI2_FMA(%rax), %zmm6, %zmm4
+ vfnmadd132ps __sPI3_FMA(%rax), %zmm4, %zmm6
+
+/*
+ 2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate polynomial:
+ R = X + X * X^2 * (A3 + x^2 * (A5 + ......
+ */
+ vmulps %zmm6, %zmm6, %zmm9
+ vxorps %zmm7, %zmm6, %zmm10
+ vfmadd213ps __sA7(%rax), %zmm9, %zmm8
+ vfmadd213ps __sA5(%rax), %zmm9, %zmm8
+ vfmadd213ps __sA3(%rax), %zmm9, %zmm8
+ vmulps %zmm9, %zmm8, %zmm11
+ vfmadd213ps %zmm10, %zmm10, %zmm11
+
+/*
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R = XOR( R, S );
+ */
+ vxorps %zmm12, %zmm11, %zmm1
+ vpandnd %zmm13, %zmm13, %zmm14{%k1}
+ vptestmd %zmm14, %zmm14, %k0
+ kmovw %k0, %ecx
+ testl %ecx, %ecx
+ jne .LBL_2_3
+
+.LBL_2_2:
+ cfi_remember_state
+ vmovaps %zmm1, %zmm0
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_2_3:
+ cfi_restore_state
+ vmovups %zmm0, 1152(%rsp)
+ vmovups %zmm1, 1216(%rsp)
+ je .LBL_2_2
+
+ xorb %dl, %dl
+ xorl %eax, %eax
+ kmovw %k4, 1048(%rsp)
+ kmovw %k5, 1040(%rsp)
+ kmovw %k6, 1032(%rsp)
+ kmovw %k7, 1024(%rsp)
+ vmovups %zmm16, 960(%rsp)
+ vmovups %zmm17, 896(%rsp)
+ vmovups %zmm18, 832(%rsp)
+ vmovups %zmm19, 768(%rsp)
+ vmovups %zmm20, 704(%rsp)
+ vmovups %zmm21, 640(%rsp)
+ vmovups %zmm22, 576(%rsp)
+ vmovups %zmm23, 512(%rsp)
+ vmovups %zmm24, 448(%rsp)
+ vmovups %zmm25, 384(%rsp)
+ vmovups %zmm26, 320(%rsp)
+ vmovups %zmm27, 256(%rsp)
+ vmovups %zmm28, 192(%rsp)
+ vmovups %zmm29, 128(%rsp)
+ vmovups %zmm30, 64(%rsp)
+ vmovups %zmm31, (%rsp)
+ movq %rsi, 1064(%rsp)
+ movq %rdi, 1056(%rsp)
+ movq %r12, 1096(%rsp)
+ cfi_offset_rel_rsp (12, 1096)
+ movb %dl, %r12b
+ movq %r13, 1088(%rsp)
+ cfi_offset_rel_rsp (13, 1088)
+ movl %ecx, %r13d
+ movq %r14, 1080(%rsp)
+ cfi_offset_rel_rsp (14, 1080)
+ movl %eax, %r14d
+ movq %r15, 1072(%rsp)
+ cfi_offset_rel_rsp (15, 1072)
+ cfi_remember_state
+
+.LBL_2_6:
+ btl %r14d, %r13d
+ jc .LBL_2_12
+
+.LBL_2_7:
+ lea 1(%r14), %esi
+ btl %esi, %r13d
+ jc .LBL_2_10
+
+.LBL_2_8:
+ incb %r12b
+ addl $2, %r14d
+ cmpb $16, %r12b
+ jb .LBL_2_6
+
+ kmovw 1048(%rsp), %k4
+ kmovw 1040(%rsp), %k5
+ kmovw 1032(%rsp), %k6
+ kmovw 1024(%rsp), %k7
+ vmovups 960(%rsp), %zmm16
+ vmovups 896(%rsp), %zmm17
+ vmovups 832(%rsp), %zmm18
+ vmovups 768(%rsp), %zmm19
+ vmovups 704(%rsp), %zmm20
+ vmovups 640(%rsp), %zmm21
+ vmovups 576(%rsp), %zmm22
+ vmovups 512(%rsp), %zmm23
+ vmovups 448(%rsp), %zmm24
+ vmovups 384(%rsp), %zmm25
+ vmovups 320(%rsp), %zmm26
+ vmovups 256(%rsp), %zmm27
+ vmovups 192(%rsp), %zmm28
+ vmovups 128(%rsp), %zmm29
+ vmovups 64(%rsp), %zmm30
+ vmovups (%rsp), %zmm31
+ vmovups 1216(%rsp), %zmm1
+ movq 1064(%rsp), %rsi
+ movq 1056(%rsp), %rdi
+ movq 1096(%rsp), %r12
+ cfi_restore (%r12)
+ movq 1088(%rsp), %r13
+ cfi_restore (%r13)
+ movq 1080(%rsp), %r14
+ cfi_restore (%r14)
+ movq 1072(%rsp), %r15
+ cfi_restore (%r15)
+ jmp .LBL_2_2
+
+.LBL_2_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ vmovss 1156(%rsp,%r15,8), %xmm0
+ vzeroupper
+ vmovss 1156(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ vmovss %xmm0, 1220(%rsp,%r15,8)
+ jmp .LBL_2_8
+
+.LBL_2_12:
+ movzbl %r12b, %r15d
+ vmovss 1152(%rsp,%r15,8), %xmm0
+ vzeroupper
+ vmovss 1152(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ vmovss %xmm0, 1216(%rsp,%r15,8)
+ jmp .LBL_2_7
+#endif
+END (_ZGVeN16v_sinf_skx)
+
+ .section .rodata, "a"
+.L_2il0floatpacket.11:
+ .long 0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff
+ .type .L_2il0floatpacket.11,@object
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S
similarity index 56%
copy from sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S
index 05d6a40..cf1e4df 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S
@@ -1,4 +1,4 @@
-/* Wrapper part of tests for SSE ISA versions of vector math functions.
+/* Multiple versions of vectorized sinf.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,10 +16,23 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen4.h"
-#include "test-vec-loop.h"
-#include <immintrin.h>
+#include <sysdep.h>
+#include <init-arch.h>
-#define VEC_TYPE __m128
+ .text
+ENTRY (_ZGVbN4v_sinf)
+ .type _ZGVbN4v_sinf, @gnu_indirect_function
+ cmpl $0, KIND_OFFSET+__cpu_features(%rip)
+ jne 1f
+ call __init_cpu_features
+1: leaq _ZGVbN4v_sinf_sse4(%rip), %rax
+ testl $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+ jz 2f
+ ret
+2: leaq _ZGVbN4v_sinf_sse2(%rip), %rax
+ ret
+END (_ZGVbN4v_sinf)
+libmvec_hidden_def (_ZGVbN4v_sinf)
-VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
+#define _ZGVbN4v_sinf _ZGVbN4v_sinf_sse2
+#include "../svml_s_sinf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S
new file mode 100644
index 0000000..746e3ef
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S
@@ -0,0 +1,224 @@
+/* Function sinf vectorized with SSE4.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+
+#include <sysdep.h>
+#include "svml_s_sinf_data.h"
+
+ .text
+ENTRY(_ZGVbN4v_sinf_sse4)
+/*
+ ALGORITHM DESCRIPTION:
+
+ 1) Range reduction to [-Pi/2; +Pi/2] interval
+ a) Grab sign from source argument and save it.
+ b) Remove sign using AND operation
+ c) Getting octant Y by 1/Pi multiplication
+ d) Add "Right Shifter" value
+ e) Treat obtained value as integer for destination sign setting.
+ Shift first bit of this value to the last (sign) position
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ g) Subtract "Right Shifter" value
+ h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+ 2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate polynomial:
+ R = X + X * X^2 * (A3 + x^2 * (A5 + ......
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R = XOR( R, S );
+ */
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $320, %rsp
+ movaps %xmm0, %xmm5
+ movq __svml_ssin_data@GOTPCREL(%rip), %rax
+ movups __sAbsMask(%rax), %xmm2
+
+/* b) Remove sign using AND operation */
+ movaps %xmm2, %xmm4
+
+/*
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ */
+ andnps %xmm5, %xmm2
+ movups __sInvPI(%rax), %xmm1
+ andps %xmm5, %xmm4
+
+/* c) Getting octant Y by 1/Pi multiplication
+ d) Add "Right Shifter" value */
+ mulps %xmm4, %xmm1
+
+/* h) Subtract Y*PI from X argument, where PI divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4 */
+ movaps %xmm4, %xmm0
+
+/* Check for large and special values */
+ cmpnleps __sRangeReductionVal(%rax), %xmm4
+ movups __sRShifter(%rax), %xmm6
+ movups __sPI1(%rax), %xmm7
+ addps %xmm6, %xmm1
+ movmskps %xmm4, %ecx
+
+/* e) Treat obtained value as integer for destination sign setting.
+ Shift first bit of this value to the last (sign) position */
+ movaps %xmm1, %xmm3
+
+/* g) Subtract "Right Shifter" value */
+ subps %xmm6, %xmm1
+ mulps %xmm1, %xmm7
+ pslld $31, %xmm3
+ movups __sPI2(%rax), %xmm6
+ subps %xmm7, %xmm0
+ mulps %xmm1, %xmm6
+ movups __sPI3(%rax), %xmm7
+ subps %xmm6, %xmm0
+ mulps %xmm1, %xmm7
+ movups __sPI4(%rax), %xmm6
+ subps %xmm7, %xmm0
+ mulps %xmm6, %xmm1
+ subps %xmm1, %xmm0
+
+/* 2) Polynomial (minimax for sin within [-Pi/2; +Pi/2] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate polynomial:
+ R = X + X * X^2 * (A3 + x^2 * (A5 + ...... */
+ movaps %xmm0, %xmm1
+ mulps %xmm0, %xmm1
+ xorps %xmm3, %xmm0
+ movups __sA9(%rax), %xmm3
+ mulps %xmm1, %xmm3
+ addps __sA7(%rax), %xmm3
+ mulps %xmm1, %xmm3
+ addps __sA5(%rax), %xmm3
+ mulps %xmm1, %xmm3
+ addps __sA3(%rax), %xmm3
+ mulps %xmm3, %xmm1
+ mulps %xmm0, %xmm1
+ addps %xmm1, %xmm0
+
+/* 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R = XOR( R, S ); */
+ xorps %xmm2, %xmm0
+ testl %ecx, %ecx
+ jne .LBL_1_3
+
+.LBL_1_2:
+ cfi_remember_state
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_1_3:
+ cfi_restore_state
+ movups %xmm5, 192(%rsp)
+ movups %xmm0, 256(%rsp)
+ je .LBL_1_2
+
+ xorb %dl, %dl
+ xorl %eax, %eax
+ movups %xmm8, 112(%rsp)
+ movups %xmm9, 96(%rsp)
+ movups %xmm10, 80(%rsp)
+ movups %xmm11, 64(%rsp)
+ movups %xmm12, 48(%rsp)
+ movups %xmm13, 32(%rsp)
+ movups %xmm14, 16(%rsp)
+ movups %xmm15, (%rsp)
+ movq %rsi, 136(%rsp)
+ movq %rdi, 128(%rsp)
+ movq %r12, 168(%rsp)
+ cfi_offset_rel_rsp (12, 168)
+ movb %dl, %r12b
+ movq %r13, 160(%rsp)
+ cfi_offset_rel_rsp (13, 160)
+ movl %ecx, %r13d
+ movq %r14, 152(%rsp)
+ cfi_offset_rel_rsp (14, 152)
+ movl %eax, %r14d
+ movq %r15, 144(%rsp)
+ cfi_offset_rel_rsp (15, 144)
+ cfi_remember_state
+
+.LBL_1_6:
+ btl %r14d, %r13d
+ jc .LBL_1_12
+
+.LBL_1_7:
+ lea 1(%r14), %esi
+ btl %esi, %r13d
+ jc .LBL_1_10
+
+.LBL_1_8:
+ incb %r12b
+ addl $2, %r14d
+ cmpb $16, %r12b
+ jb .LBL_1_6
+
+ movups 112(%rsp), %xmm8
+ movups 96(%rsp), %xmm9
+ movups 80(%rsp), %xmm10
+ movups 64(%rsp), %xmm11
+ movups 48(%rsp), %xmm12
+ movups 32(%rsp), %xmm13
+ movups 16(%rsp), %xmm14
+ movups (%rsp), %xmm15
+ movq 136(%rsp), %rsi
+ movq 128(%rsp), %rdi
+ movq 168(%rsp), %r12
+ cfi_restore (%r12)
+ movq 160(%rsp), %r13
+ cfi_restore (%r13)
+ movq 152(%rsp), %r14
+ cfi_restore (%r14)
+ movq 144(%rsp), %r15
+ cfi_restore (%r15)
+ movups 256(%rsp), %xmm0
+ jmp .LBL_1_2
+
+.LBL_1_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ movss 196(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ movss %xmm0, 260(%rsp,%r15,8)
+ jmp .LBL_1_8
+
+.LBL_1_12:
+ movzbl %r12b, %r15d
+ movss 192(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ movss %xmm0, 256(%rsp,%r15,8)
+ jmp .LBL_1_7
+
+END(_ZGVbN4v_sinf_sse4)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S
similarity index 50%
copy from sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S
index cff9941..b28bf3c 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S
@@ -1,4 +1,4 @@
-/* Wrapper part of tests for AVX2 ISA versions of vector math functions.
+/* Multiple versions of vectorized sinf, vector length is 8.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -10,19 +10,29 @@
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- Lesser General Public License for more details.
+ Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen8.h"
-#include "test-vec-loop.h"
-#include <immintrin.h>
+#include <sysdep.h>
+#include <init-arch.h>
-#undef VEC_SUFF
-#define VEC_SUFF _vlen8_avx2
+ .text
+ENTRY (_ZGVdN8v_sinf)
+ .type _ZGVdN8v_sinf, @gnu_indirect_function
+ cmpl $0, KIND_OFFSET+__cpu_features(%rip)
+ jne 1f
+ call __init_cpu_features
+1: leaq _ZGVdN8v_sinf_avx2(%rip), %rax
+ testl $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+ jz 2f
+ ret
+2: leaq _ZGVdN8v_sinf_sse_wrapper(%rip), %rax
+ ret
+END (_ZGVdN8v_sinf)
+libmvec_hidden_def (_ZGVdN8v_sinf)
-#define VEC_TYPE __m256
-
-VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVdN8v_cosf)
+#define _ZGVdN8v_sinf _ZGVdN8v_sinf_sse_wrapper
+#include "../svml_s_sinf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S
new file mode 100644
index 0000000..aea4cdd
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S
@@ -0,0 +1,219 @@
+/* Function sinf vectorized with AVX2.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include "svml_s_sinf_data.h"
+
+ .text
+ENTRY(_ZGVdN8v_sinf_avx2)
+/*
+ ALGORITHM DESCRIPTION:
+
+     1) Range reduction to the [-Pi/2; +Pi/2] interval
+        a) Grab the sign of the source argument and save it.
+        b) Remove the sign using an AND operation.
+        c) Get octant Y by multiplying by 1/Pi.
+        d) Add the "Right Shifter" value.
+        e) Treat the obtained value as an integer to set the destination sign:
+           shift the lowest bit of this value into the last (sign) position.
+        f) Flip the destination sign if the source sign is negative,
+           using an XOR operation.
+        g) Subtract the "Right Shifter" value.
+        h) Subtract Y*PI from the X argument, where PI is split into 4 parts:
+           X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+     2) Polynomial (minimax for sin within the [-Pi/2; +Pi/2] interval)
+        a) Calculate X^2 = X * X.
+        b) Evaluate the polynomial:
+           R = X + X * X^2 * (A3 + X^2 * (A5 + ...))
+     3) Destination sign setting
+        a) Apply the shifted destination sign using an XOR operation:
+           R = XOR( R, S );
+ */
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $448, %rsp
+ movq __svml_ssin_data@GOTPCREL(%rip), %rax
+ vmovdqa %ymm0, %ymm5
+ vmovups __sAbsMask(%rax), %ymm3
+ vmovups __sInvPI(%rax), %ymm7
+ vmovups __sRShifter(%rax), %ymm0
+ vmovups __sPI1_FMA(%rax), %ymm1
+
+/* b) Remove sign using AND operation */
+ vandps %ymm3, %ymm5, %ymm4
+
+/*
+  c) Get octant Y by multiplying by 1/Pi.
+  d) Add the "Right Shifter" value.
+ */
+ vfmadd213ps %ymm0, %ymm4, %ymm7
+
+/* g) Subtract "Right Shifter" value */
+ vsubps %ymm0, %ymm7, %ymm2
+
+/*
+  e) Treat the obtained value as an integer to set the destination sign:
+     shift the lowest bit of this value into the last (sign) position.
+ */
+ vpslld $31, %ymm7, %ymm6
+
+/*
+  h) Subtract Y*PI from the X argument, where PI is split into 3 parts
+     on the FMA path:
+     X = X - Y*PI1 - Y*PI2 - Y*PI3;
+ */
+ vmovdqa %ymm4, %ymm0
+ vfnmadd231ps %ymm2, %ymm1, %ymm0
+
+/* Check for large and special values */
+ vcmpnle_uqps __sRangeReductionVal(%rax), %ymm4, %ymm4
+ vfnmadd231ps __sPI2_FMA(%rax), %ymm2, %ymm0
+ vfnmadd132ps __sPI3_FMA(%rax), %ymm0, %ymm2
+
+/*
+  2) Polynomial (minimax for sin within the [-Pi/2; +Pi/2] interval)
+     a) Calculate X^2 = X * X.
+     b) Evaluate the polynomial:
+        R = X + X * X^2 * (A3 + X^2 * (A5 + ...))
+ */
+ vmulps %ymm2, %ymm2, %ymm1
+
+/*
+  f) Flip the destination sign if the source sign is negative,
+     using an XOR operation.
+ */
+ vandnps %ymm5, %ymm3, %ymm0
+ vxorps %ymm6, %ymm2, %ymm3
+ vmovups __sA9(%rax), %ymm2
+ vfmadd213ps __sA7(%rax), %ymm1, %ymm2
+ vfmadd213ps __sA5(%rax), %ymm1, %ymm2
+ vfmadd213ps __sA3(%rax), %ymm1, %ymm2
+ vmulps %ymm1, %ymm2, %ymm6
+ vfmadd213ps %ymm3, %ymm3, %ymm6
+ vmovmskps %ymm4, %ecx
+
+/*
+  3) Destination sign setting
+     a) Apply the shifted destination sign using an XOR operation:
+        R = XOR( R, S );
+ */
+ vxorps %ymm0, %ymm6, %ymm0
+ testl %ecx, %ecx
+ jne .LBL_1_3
+
+.LBL_1_2:
+ cfi_remember_state
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_1_3:
+ cfi_restore_state
+ vmovups %ymm5, 320(%rsp)
+ vmovups %ymm0, 384(%rsp)
+ je .LBL_1_2
+
+ xorb %dl, %dl
+ xorl %eax, %eax
+ vmovups %ymm8, 224(%rsp)
+ vmovups %ymm9, 192(%rsp)
+ vmovups %ymm10, 160(%rsp)
+ vmovups %ymm11, 128(%rsp)
+ vmovups %ymm12, 96(%rsp)
+ vmovups %ymm13, 64(%rsp)
+ vmovups %ymm14, 32(%rsp)
+ vmovups %ymm15, (%rsp)
+ movq %rsi, 264(%rsp)
+ movq %rdi, 256(%rsp)
+ movq %r12, 296(%rsp)
+ cfi_offset_rel_rsp (12, 296)
+ movb %dl, %r12b
+ movq %r13, 288(%rsp)
+ cfi_offset_rel_rsp (13, 288)
+ movl %ecx, %r13d
+ movq %r14, 280(%rsp)
+ cfi_offset_rel_rsp (14, 280)
+ movl %eax, %r14d
+ movq %r15, 272(%rsp)
+ cfi_offset_rel_rsp (15, 272)
+ cfi_remember_state
+
+.LBL_1_6:
+ btl %r14d, %r13d
+ jc .LBL_1_12
+
+.LBL_1_7:
+ lea 1(%r14), %esi
+ btl %esi, %r13d
+ jc .LBL_1_10
+
+.LBL_1_8:
+ incb %r12b
+ addl $2, %r14d
+ cmpb $16, %r12b
+ jb .LBL_1_6
+
+ vmovups 224(%rsp), %ymm8
+ vmovups 192(%rsp), %ymm9
+ vmovups 160(%rsp), %ymm10
+ vmovups 128(%rsp), %ymm11
+ vmovups 96(%rsp), %ymm12
+ vmovups 64(%rsp), %ymm13
+ vmovups 32(%rsp), %ymm14
+ vmovups (%rsp), %ymm15
+ vmovups 384(%rsp), %ymm0
+ movq 264(%rsp), %rsi
+ movq 256(%rsp), %rdi
+ movq 296(%rsp), %r12
+ cfi_restore (%r12)
+ movq 288(%rsp), %r13
+ cfi_restore (%r13)
+ movq 280(%rsp), %r14
+ cfi_restore (%r14)
+ movq 272(%rsp), %r15
+ cfi_restore (%r15)
+ jmp .LBL_1_2
+
+.LBL_1_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ vmovss 324(%rsp,%r15,8), %xmm0
+ vzeroupper
+
+ call sinf@PLT
+
+ vmovss %xmm0, 388(%rsp,%r15,8)
+ jmp .LBL_1_8
+
+.LBL_1_12:
+ movzbl %r12b, %r15d
+ vmovss 320(%rsp,%r15,8), %xmm0
+ vzeroupper
+
+ call sinf@PLT
+
+ vmovss %xmm0, 384(%rsp,%r15,8)
+ jmp .LBL_1_7
+
+END(_ZGVdN8v_sinf_avx2)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/svml_s_sinf16_core.S
similarity index 79%
copy from sysdeps/x86_64/fpu/test-float-vlen8.c
copy to sysdeps/x86_64/fpu/svml_s_sinf16_core.S
index b96dec6..add6e0f 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/svml_s_sinf16_core.S
@@ -1,4 +1,4 @@
-/* Tests for AVX ISA versions of vector math functions.
+/* Function sinf vectorized with AVX-512. Wrapper to AVX2 version.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,8 +16,10 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen8.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#define TEST_VECTOR_cosf 1
-
-#include "libm-test.c"
+ .text
+ENTRY (_ZGVeN16v_sinf)
+WRAPPER_IMPL_AVX512 _ZGVdN8v_sinf
+END (_ZGVeN16v_sinf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c b/sysdeps/x86_64/fpu/svml_s_sinf4_core.S
similarity index 77%
copy from sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
copy to sysdeps/x86_64/fpu/svml_s_sinf4_core.S
index f0ee6f2..2349c7b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
+++ b/sysdeps/x86_64/fpu/svml_s_sinf4_core.S
@@ -1,4 +1,4 @@
-/* Tests for AVX2 ISA versions of vector math functions.
+/* Function sinf vectorized with SSE2.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,13 +16,15 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen8.h"
-#undef VEC_SUFF
-#define VEC_SUFF _vlen8_avx2
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#define TEST_VECTOR_cosf 1
+ .text
+ENTRY (_ZGVbN4v_sinf)
+WRAPPER_IMPL_SSE2 sinf
+END (_ZGVbN4v_sinf)
-#define REQUIRE_AVX2
-
-#include "libm-test.c"
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4v_sinf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/svml_s_sinf8_core.S
similarity index 75%
copy from sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
copy to sysdeps/x86_64/fpu/svml_s_sinf8_core.S
index 05d6a40..fe31e37 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/svml_s_sinf8_core.S
@@ -1,4 +1,4 @@
-/* Wrapper part of tests for SSE ISA versions of vector math functions.
+/* Function sinf vectorized with AVX2, wrapper version.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,10 +16,14 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen4.h"
-#include "test-vec-loop.h"
-#include <immintrin.h>
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#define VEC_TYPE __m128
+ .text
+ENTRY (_ZGVdN8v_sinf)
+WRAPPER_IMPL_AVX _ZGVbN4v_sinf
+END (_ZGVdN8v_sinf)
-VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8v_sinf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S
similarity index 79%
copy from sysdeps/x86_64/fpu/test-float-vlen8.c
copy to sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S
index b96dec6..f54be48 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S
@@ -1,4 +1,4 @@
-/* Tests for AVX ISA versions of vector math functions.
+/* Function sinf vectorized in AVX ISA as wrapper to SSE4 ISA version.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,8 +16,10 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen8.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#define TEST_VECTOR_cosf 1
-
-#include "libm-test.c"
+ .text
+ENTRY(_ZGVcN8v_sinf)
+WRAPPER_IMPL_AVX _ZGVbN4v_sinf
+END(_ZGVcN8v_sinf)
diff --git a/sysdeps/x86_64/fpu/svml_s_sinf_data.S b/sysdeps/x86_64/fpu/svml_s_sinf_data.S
new file mode 100644
index 0000000..3a25e0b
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sinf_data.S
@@ -0,0 +1,1118 @@
+/* Data for function sinf.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "svml_s_sinf_data.h"
+
+ .section .rodata, "a"
+ .align 64
+
+/* Data table for vector implementations of function sinf.
+   The table may contain polynomial, reduction and lookup coefficients, and
+   other constants obtained through different methods of research and
+   experimental work.  */
+
+ .globl __svml_ssin_data
+__svml_ssin_data:
+
+/* Lookup table for high accuracy version (CHL,SHi,SLo,Sigma). */
+.if .-__svml_ssin_data != __dT
+.err
+.endif
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x3f800000
+ .long 0xb99de7df
+ .long 0x3cc90ab0
+ .long 0xb005c998
+ .long 0x3f800000
+ .long 0xba9de1c8
+ .long 0x3d48fb30
+ .long 0xb0ef227f
+ .long 0x3f800000
+ .long 0xbb319298
+ .long 0x3d96a905
+ .long 0xb1531e61
+ .long 0x3f800000
+ .long 0xbb9dc971
+ .long 0x3dc8bd36
+ .long 0xb07592f5
+ .long 0x3f800000
+ .long 0xbbf66e3c
+ .long 0x3dfab273
+ .long 0xb11568cf
+ .long 0x3f800000
+ .long 0xbc315502
+ .long 0x3e164083
+ .long 0x31e8e614
+ .long 0x3f800000
+ .long 0xbc71360b
+ .long 0x3e2f10a2
+ .long 0x311167f9
+ .long 0x3f800000
+ .long 0xbc9d6830
+ .long 0x3e47c5c2
+ .long 0xb0e5967d
+ .long 0x3f800000
+ .long 0xbcc70c54
+ .long 0x3e605c13
+ .long 0x31a7e4f6
+ .long 0x3f800000
+ .long 0xbcf58104
+ .long 0x3e78cfcc
+ .long 0xb11bd41d
+ .long 0x3f800000
+ .long 0xbd145f8c
+ .long 0x3e888e93
+ .long 0x312c7d9e
+ .long 0x3f800000
+ .long 0xbd305f55
+ .long 0x3e94a031
+ .long 0x326d59f0
+ .long 0x3f800000
+ .long 0xbd4ebb8a
+ .long 0x3ea09ae5
+ .long 0xb23e89a0
+ .long 0x3f800000
+ .long 0xbd6f6f7e
+ .long 0x3eac7cd4
+ .long 0xb2254e02
+ .long 0x3f800000
+ .long 0xbd893b12
+ .long 0x3eb8442a
+ .long 0xb2705ba6
+ .long 0x3f800000
+ .long 0xbd9be50c
+ .long 0x3ec3ef15
+ .long 0x31d5d52c
+ .long 0x3f800000
+ .long 0xbdafb2cc
+ .long 0x3ecf7bca
+ .long 0x316a3b63
+ .long 0x3f800000
+ .long 0xbdc4a143
+ .long 0x3edae880
+ .long 0x321e15cc
+ .long 0x3f800000
+ .long 0xbddaad38
+ .long 0x3ee63375
+ .long 0xb1d9c774
+ .long 0x3f800000
+ .long 0xbdf1d344
+ .long 0x3ef15aea
+ .long 0xb1ff2139
+ .long 0x3f800000
+ .long 0xbe0507ea
+ .long 0x3efc5d27
+ .long 0xb180eca9
+ .long 0x3f800000
+ .long 0xbe11af97
+ .long 0x3f039c3d
+ .long 0xb25ba002
+ .long 0x3f800000
+ .long 0xbe1edeb5
+ .long 0x3f08f59b
+ .long 0xb2be4b4e
+ .long 0x3f800000
+ .long 0xbe2c933b
+ .long 0x3f0e39da
+ .long 0xb24a32e7
+ .long 0x3f800000
+ .long 0xbe3acb0c
+ .long 0x3f13682a
+ .long 0x32cdd12e
+ .long 0x3f800000
+ .long 0xbe4983f7
+ .long 0x3f187fc0
+ .long 0xb1c7a3f3
+ .long 0x3f800000
+ .long 0xbe58bbb7
+ .long 0x3f1d7fd1
+ .long 0x3292050c
+ .long 0x3f800000
+ .long 0xbe686ff3
+ .long 0x3f226799
+ .long 0x322123bb
+ .long 0x3f800000
+ .long 0xbe789e3f
+ .long 0x3f273656
+ .long 0xb2038343
+ .long 0x3f800000
+ .long 0xbe84a20e
+ .long 0x3f2beb4a
+ .long 0xb2b73136
+ .long 0x3f800000
+ .long 0xbe8d2f7d
+ .long 0x3f3085bb
+ .long 0xb2ae2d32
+ .long 0x3f800000
+ .long 0xbe95f61a
+ .long 0x3f3504f3
+ .long 0x324fe77a
+ .long 0x3f800000
+ .long 0x3e4216eb
+ .long 0x3f396842
+ .long 0xb2810007
+ .long 0x3f000000
+ .long 0x3e2fad27
+ .long 0x3f3daef9
+ .long 0x319aabec
+ .long 0x3f000000
+ .long 0x3e1cd957
+ .long 0x3f41d870
+ .long 0x32bff977
+ .long 0x3f000000
+ .long 0x3e099e65
+ .long 0x3f45e403
+ .long 0x32b15174
+ .long 0x3f000000
+ .long 0x3debfe8a
+ .long 0x3f49d112
+ .long 0x32992640
+ .long 0x3f000000
+ .long 0x3dc3fdff
+ .long 0x3f4d9f02
+ .long 0x327e70e8
+ .long 0x3f000000
+ .long 0x3d9b4153
+ .long 0x3f514d3d
+ .long 0x300c4f04
+ .long 0x3f000000
+ .long 0x3d639d9d
+ .long 0x3f54db31
+ .long 0x3290ea1a
+ .long 0x3f000000
+ .long 0x3d0f59aa
+ .long 0x3f584853
+ .long 0xb27d5fc0
+ .long 0x3f000000
+ .long 0x3c670f32
+ .long 0x3f5b941a
+ .long 0x32232dc8
+ .long 0x3f000000
+ .long 0xbbe8b648
+ .long 0x3f5ebe05
+ .long 0x32c6f953
+ .long 0x3f000000
+ .long 0xbcea5164
+ .long 0x3f61c598
+ .long 0xb2e7f425
+ .long 0x3f000000
+ .long 0xbd4e645a
+ .long 0x3f64aa59
+ .long 0x311a08fa
+ .long 0x3f000000
+ .long 0xbd945dff
+ .long 0x3f676bd8
+ .long 0xb2bc3389
+ .long 0x3f000000
+ .long 0xbdc210d8
+ .long 0x3f6a09a7
+ .long 0xb2eb236c
+ .long 0x3f000000
+ .long 0xbdf043ab
+ .long 0x3f6c835e
+ .long 0x32f328d4
+ .long 0x3f000000
+ .long 0xbe0f77ad
+ .long 0x3f6ed89e
+ .long 0xb29333dc
+ .long 0x3f000000
+ .long 0x3db1f34f
+ .long 0x3f710908
+ .long 0x321ed0dd
+ .long 0x3e800000
+ .long 0x3d826b93
+ .long 0x3f731447
+ .long 0x32c48e11
+ .long 0x3e800000
+ .long 0x3d25018c
+ .long 0x3f74fa0b
+ .long 0xb2939d22
+ .long 0x3e800000
+ .long 0x3c88e931
+ .long 0x3f76ba07
+ .long 0x326d092c
+ .long 0x3e800000
+ .long 0xbbe60685
+ .long 0x3f7853f8
+ .long 0xb20db9e5
+ .long 0x3e800000
+ .long 0xbcfd1f65
+ .long 0x3f79c79d
+ .long 0x32c64e59
+ .long 0x3e800000
+ .long 0xbd60e8f8
+ .long 0x3f7b14be
+ .long 0x32ff75cb
+ .long 0x3e800000
+ .long 0x3d3c4289
+ .long 0x3f7c3b28
+ .long 0xb231d68b
+ .long 0x3e000000
+ .long 0x3cb2041c
+ .long 0x3f7d3aac
+ .long 0xb0f75ae9
+ .long 0x3e000000
+ .long 0xbb29b1a9
+ .long 0x3f7e1324
+ .long 0xb2f1e603
+ .long 0x3e000000
+ .long 0xbcdd0b28
+ .long 0x3f7ec46d
+ .long 0x31f44949
+ .long 0x3e000000
+ .long 0x3c354825
+ .long 0x3f7f4e6d
+ .long 0x32d01884
+ .long 0x3d800000
+ .long 0xbc5c1342
+ .long 0x3f7fb10f
+ .long 0x31de5b5f
+ .long 0x3d800000
+ .long 0xbbdbd541
+ .long 0x3f7fec43
+ .long 0x3084cd0d
+ .long 0x3d000000
+ .long 0x00000000
+ .long 0x3f800000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x3bdbd541
+ .long 0x3f7fec43
+ .long 0x3084cd0d
+ .long 0xbd000000
+ .long 0x3c5c1342
+ .long 0x3f7fb10f
+ .long 0x31de5b5f
+ .long 0xbd800000
+ .long 0xbc354825
+ .long 0x3f7f4e6d
+ .long 0x32d01884
+ .long 0xbd800000
+ .long 0x3cdd0b28
+ .long 0x3f7ec46d
+ .long 0x31f44949
+ .long 0xbe000000
+ .long 0x3b29b1a9
+ .long 0x3f7e1324
+ .long 0xb2f1e603
+ .long 0xbe000000
+ .long 0xbcb2041c
+ .long 0x3f7d3aac
+ .long 0xb0f75ae9
+ .long 0xbe000000
+ .long 0xbd3c4289
+ .long 0x3f7c3b28
+ .long 0xb231d68b
+ .long 0xbe000000
+ .long 0x3d60e8f8
+ .long 0x3f7b14be
+ .long 0x32ff75cb
+ .long 0xbe800000
+ .long 0x3cfd1f65
+ .long 0x3f79c79d
+ .long 0x32c64e59
+ .long 0xbe800000
+ .long 0x3be60685
+ .long 0x3f7853f8
+ .long 0xb20db9e5
+ .long 0xbe800000
+ .long 0xbc88e931
+ .long 0x3f76ba07
+ .long 0x326d092c
+ .long 0xbe800000
+ .long 0xbd25018c
+ .long 0x3f74fa0b
+ .long 0xb2939d22
+ .long 0xbe800000
+ .long 0xbd826b93
+ .long 0x3f731447
+ .long 0x32c48e11
+ .long 0xbe800000
+ .long 0xbdb1f34f
+ .long 0x3f710908
+ .long 0x321ed0dd
+ .long 0xbe800000
+ .long 0x3e0f77ad
+ .long 0x3f6ed89e
+ .long 0xb29333dc
+ .long 0xbf000000
+ .long 0x3df043ab
+ .long 0x3f6c835e
+ .long 0x32f328d4
+ .long 0xbf000000
+ .long 0x3dc210d8
+ .long 0x3f6a09a7
+ .long 0xb2eb236c
+ .long 0xbf000000
+ .long 0x3d945dff
+ .long 0x3f676bd8
+ .long 0xb2bc3389
+ .long 0xbf000000
+ .long 0x3d4e645a
+ .long 0x3f64aa59
+ .long 0x311a08fa
+ .long 0xbf000000
+ .long 0x3cea5164
+ .long 0x3f61c598
+ .long 0xb2e7f425
+ .long 0xbf000000
+ .long 0x3be8b648
+ .long 0x3f5ebe05
+ .long 0x32c6f953
+ .long 0xbf000000
+ .long 0xbc670f32
+ .long 0x3f5b941a
+ .long 0x32232dc8
+ .long 0xbf000000
+ .long 0xbd0f59aa
+ .long 0x3f584853
+ .long 0xb27d5fc0
+ .long 0xbf000000
+ .long 0xbd639d9d
+ .long 0x3f54db31
+ .long 0x3290ea1a
+ .long 0xbf000000
+ .long 0xbd9b4153
+ .long 0x3f514d3d
+ .long 0x300c4f04
+ .long 0xbf000000
+ .long 0xbdc3fdff
+ .long 0x3f4d9f02
+ .long 0x327e70e8
+ .long 0xbf000000
+ .long 0xbdebfe8a
+ .long 0x3f49d112
+ .long 0x32992640
+ .long 0xbf000000
+ .long 0xbe099e65
+ .long 0x3f45e403
+ .long 0x32b15174
+ .long 0xbf000000
+ .long 0xbe1cd957
+ .long 0x3f41d870
+ .long 0x32bff977
+ .long 0xbf000000
+ .long 0xbe2fad27
+ .long 0x3f3daef9
+ .long 0x319aabec
+ .long 0xbf000000
+ .long 0xbe4216eb
+ .long 0x3f396842
+ .long 0xb2810007
+ .long 0xbf000000
+ .long 0x3e95f61a
+ .long 0x3f3504f3
+ .long 0x324fe77a
+ .long 0xbf800000
+ .long 0x3e8d2f7d
+ .long 0x3f3085bb
+ .long 0xb2ae2d32
+ .long 0xbf800000
+ .long 0x3e84a20e
+ .long 0x3f2beb4a
+ .long 0xb2b73136
+ .long 0xbf800000
+ .long 0x3e789e3f
+ .long 0x3f273656
+ .long 0xb2038343
+ .long 0xbf800000
+ .long 0x3e686ff3
+ .long 0x3f226799
+ .long 0x322123bb
+ .long 0xbf800000
+ .long 0x3e58bbb7
+ .long 0x3f1d7fd1
+ .long 0x3292050c
+ .long 0xbf800000
+ .long 0x3e4983f7
+ .long 0x3f187fc0
+ .long 0xb1c7a3f3
+ .long 0xbf800000
+ .long 0x3e3acb0c
+ .long 0x3f13682a
+ .long 0x32cdd12e
+ .long 0xbf800000
+ .long 0x3e2c933b
+ .long 0x3f0e39da
+ .long 0xb24a32e7
+ .long 0xbf800000
+ .long 0x3e1edeb5
+ .long 0x3f08f59b
+ .long 0xb2be4b4e
+ .long 0xbf800000
+ .long 0x3e11af97
+ .long 0x3f039c3d
+ .long 0xb25ba002
+ .long 0xbf800000
+ .long 0x3e0507ea
+ .long 0x3efc5d27
+ .long 0xb180eca9
+ .long 0xbf800000
+ .long 0x3df1d344
+ .long 0x3ef15aea
+ .long 0xb1ff2139
+ .long 0xbf800000
+ .long 0x3ddaad38
+ .long 0x3ee63375
+ .long 0xb1d9c774
+ .long 0xbf800000
+ .long 0x3dc4a143
+ .long 0x3edae880
+ .long 0x321e15cc
+ .long 0xbf800000
+ .long 0x3dafb2cc
+ .long 0x3ecf7bca
+ .long 0x316a3b63
+ .long 0xbf800000
+ .long 0x3d9be50c
+ .long 0x3ec3ef15
+ .long 0x31d5d52c
+ .long 0xbf800000
+ .long 0x3d893b12
+ .long 0x3eb8442a
+ .long 0xb2705ba6
+ .long 0xbf800000
+ .long 0x3d6f6f7e
+ .long 0x3eac7cd4
+ .long 0xb2254e02
+ .long 0xbf800000
+ .long 0x3d4ebb8a
+ .long 0x3ea09ae5
+ .long 0xb23e89a0
+ .long 0xbf800000
+ .long 0x3d305f55
+ .long 0x3e94a031
+ .long 0x326d59f0
+ .long 0xbf800000
+ .long 0x3d145f8c
+ .long 0x3e888e93
+ .long 0x312c7d9e
+ .long 0xbf800000
+ .long 0x3cf58104
+ .long 0x3e78cfcc
+ .long 0xb11bd41d
+ .long 0xbf800000
+ .long 0x3cc70c54
+ .long 0x3e605c13
+ .long 0x31a7e4f6
+ .long 0xbf800000
+ .long 0x3c9d6830
+ .long 0x3e47c5c2
+ .long 0xb0e5967d
+ .long 0xbf800000
+ .long 0x3c71360b
+ .long 0x3e2f10a2
+ .long 0x311167f9
+ .long 0xbf800000
+ .long 0x3c315502
+ .long 0x3e164083
+ .long 0x31e8e614
+ .long 0xbf800000
+ .long 0x3bf66e3c
+ .long 0x3dfab273
+ .long 0xb11568cf
+ .long 0xbf800000
+ .long 0x3b9dc971
+ .long 0x3dc8bd36
+ .long 0xb07592f5
+ .long 0xbf800000
+ .long 0x3b319298
+ .long 0x3d96a905
+ .long 0xb1531e61
+ .long 0xbf800000
+ .long 0x3a9de1c8
+ .long 0x3d48fb30
+ .long 0xb0ef227f
+ .long 0xbf800000
+ .long 0x399de7df
+ .long 0x3cc90ab0
+ .long 0xb005c998
+ .long 0xbf800000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0xbf800000
+ .long 0x399de7df
+ .long 0xbcc90ab0
+ .long 0x3005c998
+ .long 0xbf800000
+ .long 0x3a9de1c8
+ .long 0xbd48fb30
+ .long 0x30ef227f
+ .long 0xbf800000
+ .long 0x3b319298
+ .long 0xbd96a905
+ .long 0x31531e61
+ .long 0xbf800000
+ .long 0x3b9dc971
+ .long 0xbdc8bd36
+ .long 0x307592f5
+ .long 0xbf800000
+ .long 0x3bf66e3c
+ .long 0xbdfab273
+ .long 0x311568cf
+ .long 0xbf800000
+ .long 0x3c315502
+ .long 0xbe164083
+ .long 0xb1e8e614
+ .long 0xbf800000
+ .long 0x3c71360b
+ .long 0xbe2f10a2
+ .long 0xb11167f9
+ .long 0xbf800000
+ .long 0x3c9d6830
+ .long 0xbe47c5c2
+ .long 0x30e5967d
+ .long 0xbf800000
+ .long 0x3cc70c54
+ .long 0xbe605c13
+ .long 0xb1a7e4f6
+ .long 0xbf800000
+ .long 0x3cf58104
+ .long 0xbe78cfcc
+ .long 0x311bd41d
+ .long 0xbf800000
+ .long 0x3d145f8c
+ .long 0xbe888e93
+ .long 0xb12c7d9e
+ .long 0xbf800000
+ .long 0x3d305f55
+ .long 0xbe94a031
+ .long 0xb26d59f0
+ .long 0xbf800000
+ .long 0x3d4ebb8a
+ .long 0xbea09ae5
+ .long 0x323e89a0
+ .long 0xbf800000
+ .long 0x3d6f6f7e
+ .long 0xbeac7cd4
+ .long 0x32254e02
+ .long 0xbf800000
+ .long 0x3d893b12
+ .long 0xbeb8442a
+ .long 0x32705ba6
+ .long 0xbf800000
+ .long 0x3d9be50c
+ .long 0xbec3ef15
+ .long 0xb1d5d52c
+ .long 0xbf800000
+ .long 0x3dafb2cc
+ .long 0xbecf7bca
+ .long 0xb16a3b63
+ .long 0xbf800000
+ .long 0x3dc4a143
+ .long 0xbedae880
+ .long 0xb21e15cc
+ .long 0xbf800000
+ .long 0x3ddaad38
+ .long 0xbee63375
+ .long 0x31d9c774
+ .long 0xbf800000
+ .long 0x3df1d344
+ .long 0xbef15aea
+ .long 0x31ff2139
+ .long 0xbf800000
+ .long 0x3e0507ea
+ .long 0xbefc5d27
+ .long 0x3180eca9
+ .long 0xbf800000
+ .long 0x3e11af97
+ .long 0xbf039c3d
+ .long 0x325ba002
+ .long 0xbf800000
+ .long 0x3e1edeb5
+ .long 0xbf08f59b
+ .long 0x32be4b4e
+ .long 0xbf800000
+ .long 0x3e2c933b
+ .long 0xbf0e39da
+ .long 0x324a32e7
+ .long 0xbf800000
+ .long 0x3e3acb0c
+ .long 0xbf13682a
+ .long 0xb2cdd12e
+ .long 0xbf800000
+ .long 0x3e4983f7
+ .long 0xbf187fc0
+ .long 0x31c7a3f3
+ .long 0xbf800000
+ .long 0x3e58bbb7
+ .long 0xbf1d7fd1
+ .long 0xb292050c
+ .long 0xbf800000
+ .long 0x3e686ff3
+ .long 0xbf226799
+ .long 0xb22123bb
+ .long 0xbf800000
+ .long 0x3e789e3f
+ .long 0xbf273656
+ .long 0x32038343
+ .long 0xbf800000
+ .long 0x3e84a20e
+ .long 0xbf2beb4a
+ .long 0x32b73136
+ .long 0xbf800000
+ .long 0x3e8d2f7d
+ .long 0xbf3085bb
+ .long 0x32ae2d32
+ .long 0xbf800000
+ .long 0x3e95f61a
+ .long 0xbf3504f3
+ .long 0xb24fe77a
+ .long 0xbf800000
+ .long 0xbe4216eb
+ .long 0xbf396842
+ .long 0x32810007
+ .long 0xbf000000
+ .long 0xbe2fad27
+ .long 0xbf3daef9
+ .long 0xb19aabec
+ .long 0xbf000000
+ .long 0xbe1cd957
+ .long 0xbf41d870
+ .long 0xb2bff977
+ .long 0xbf000000
+ .long 0xbe099e65
+ .long 0xbf45e403
+ .long 0xb2b15174
+ .long 0xbf000000
+ .long 0xbdebfe8a
+ .long 0xbf49d112
+ .long 0xb2992640
+ .long 0xbf000000
+ .long 0xbdc3fdff
+ .long 0xbf4d9f02
+ .long 0xb27e70e8
+ .long 0xbf000000
+ .long 0xbd9b4153
+ .long 0xbf514d3d
+ .long 0xb00c4f04
+ .long 0xbf000000
+ .long 0xbd639d9d
+ .long 0xbf54db31
+ .long 0xb290ea1a
+ .long 0xbf000000
+ .long 0xbd0f59aa
+ .long 0xbf584853
+ .long 0x327d5fc0
+ .long 0xbf000000
+ .long 0xbc670f32
+ .long 0xbf5b941a
+ .long 0xb2232dc8
+ .long 0xbf000000
+ .long 0x3be8b648
+ .long 0xbf5ebe05
+ .long 0xb2c6f953
+ .long 0xbf000000
+ .long 0x3cea5164
+ .long 0xbf61c598
+ .long 0x32e7f425
+ .long 0xbf000000
+ .long 0x3d4e645a
+ .long 0xbf64aa59
+ .long 0xb11a08fa
+ .long 0xbf000000
+ .long 0x3d945dff
+ .long 0xbf676bd8
+ .long 0x32bc3389
+ .long 0xbf000000
+ .long 0x3dc210d8
+ .long 0xbf6a09a7
+ .long 0x32eb236c
+ .long 0xbf000000
+ .long 0x3df043ab
+ .long 0xbf6c835e
+ .long 0xb2f328d4
+ .long 0xbf000000
+ .long 0x3e0f77ad
+ .long 0xbf6ed89e
+ .long 0x329333dc
+ .long 0xbf000000
+ .long 0xbdb1f34f
+ .long 0xbf710908
+ .long 0xb21ed0dd
+ .long 0xbe800000
+ .long 0xbd826b93
+ .long 0xbf731447
+ .long 0xb2c48e11
+ .long 0xbe800000
+ .long 0xbd25018c
+ .long 0xbf74fa0b
+ .long 0x32939d22
+ .long 0xbe800000
+ .long 0xbc88e931
+ .long 0xbf76ba07
+ .long 0xb26d092c
+ .long 0xbe800000
+ .long 0x3be60685
+ .long 0xbf7853f8
+ .long 0x320db9e5
+ .long 0xbe800000
+ .long 0x3cfd1f65
+ .long 0xbf79c79d
+ .long 0xb2c64e59
+ .long 0xbe800000
+ .long 0x3d60e8f8
+ .long 0xbf7b14be
+ .long 0xb2ff75cb
+ .long 0xbe800000
+ .long 0xbd3c4289
+ .long 0xbf7c3b28
+ .long 0x3231d68b
+ .long 0xbe000000
+ .long 0xbcb2041c
+ .long 0xbf7d3aac
+ .long 0x30f75ae9
+ .long 0xbe000000
+ .long 0x3b29b1a9
+ .long 0xbf7e1324
+ .long 0x32f1e603
+ .long 0xbe000000
+ .long 0x3cdd0b28
+ .long 0xbf7ec46d
+ .long 0xb1f44949
+ .long 0xbe000000
+ .long 0xbc354825
+ .long 0xbf7f4e6d
+ .long 0xb2d01884
+ .long 0xbd800000
+ .long 0x3c5c1342
+ .long 0xbf7fb10f
+ .long 0xb1de5b5f
+ .long 0xbd800000
+ .long 0x3bdbd541
+ .long 0xbf7fec43
+ .long 0xb084cd0d
+ .long 0xbd000000
+ .long 0x00000000
+ .long 0xbf800000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0xbbdbd541
+ .long 0xbf7fec43
+ .long 0xb084cd0d
+ .long 0x3d000000
+ .long 0xbc5c1342
+ .long 0xbf7fb10f
+ .long 0xb1de5b5f
+ .long 0x3d800000
+ .long 0x3c354825
+ .long 0xbf7f4e6d
+ .long 0xb2d01884
+ .long 0x3d800000
+ .long 0xbcdd0b28
+ .long 0xbf7ec46d
+ .long 0xb1f44949
+ .long 0x3e000000
+ .long 0xbb29b1a9
+ .long 0xbf7e1324
+ .long 0x32f1e603
+ .long 0x3e000000
+ .long 0x3cb2041c
+ .long 0xbf7d3aac
+ .long 0x30f75ae9
+ .long 0x3e000000
+ .long 0x3d3c4289
+ .long 0xbf7c3b28
+ .long 0x3231d68b
+ .long 0x3e000000
+ .long 0xbd60e8f8
+ .long 0xbf7b14be
+ .long 0xb2ff75cb
+ .long 0x3e800000
+ .long 0xbcfd1f65
+ .long 0xbf79c79d
+ .long 0xb2c64e59
+ .long 0x3e800000
+ .long 0xbbe60685
+ .long 0xbf7853f8
+ .long 0x320db9e5
+ .long 0x3e800000
+ .long 0x3c88e931
+ .long 0xbf76ba07
+ .long 0xb26d092c
+ .long 0x3e800000
+ .long 0x3d25018c
+ .long 0xbf74fa0b
+ .long 0x32939d22
+ .long 0x3e800000
+ .long 0x3d826b93
+ .long 0xbf731447
+ .long 0xb2c48e11
+ .long 0x3e800000
+ .long 0x3db1f34f
+ .long 0xbf710908
+ .long 0xb21ed0dd
+ .long 0x3e800000
+ .long 0xbe0f77ad
+ .long 0xbf6ed89e
+ .long 0x329333dc
+ .long 0x3f000000
+ .long 0xbdf043ab
+ .long 0xbf6c835e
+ .long 0xb2f328d4
+ .long 0x3f000000
+ .long 0xbdc210d8
+ .long 0xbf6a09a7
+ .long 0x32eb236c
+ .long 0x3f000000
+ .long 0xbd945dff
+ .long 0xbf676bd8
+ .long 0x32bc3389
+ .long 0x3f000000
+ .long 0xbd4e645a
+ .long 0xbf64aa59
+ .long 0xb11a08fa
+ .long 0x3f000000
+ .long 0xbcea5164
+ .long 0xbf61c598
+ .long 0x32e7f425
+ .long 0x3f000000
+ .long 0xbbe8b648
+ .long 0xbf5ebe05
+ .long 0xb2c6f953
+ .long 0x3f000000
+ .long 0x3c670f32
+ .long 0xbf5b941a
+ .long 0xb2232dc8
+ .long 0x3f000000
+ .long 0x3d0f59aa
+ .long 0xbf584853
+ .long 0x327d5fc0
+ .long 0x3f000000
+ .long 0x3d639d9d
+ .long 0xbf54db31
+ .long 0xb290ea1a
+ .long 0x3f000000
+ .long 0x3d9b4153
+ .long 0xbf514d3d
+ .long 0xb00c4f04
+ .long 0x3f000000
+ .long 0x3dc3fdff
+ .long 0xbf4d9f02
+ .long 0xb27e70e8
+ .long 0x3f000000
+ .long 0x3debfe8a
+ .long 0xbf49d112
+ .long 0xb2992640
+ .long 0x3f000000
+ .long 0x3e099e65
+ .long 0xbf45e403
+ .long 0xb2b15174
+ .long 0x3f000000
+ .long 0x3e1cd957
+ .long 0xbf41d870
+ .long 0xb2bff977
+ .long 0x3f000000
+ .long 0x3e2fad27
+ .long 0xbf3daef9
+ .long 0xb19aabec
+ .long 0x3f000000
+ .long 0x3e4216eb
+ .long 0xbf396842
+ .long 0x32810007
+ .long 0x3f000000
+ .long 0xbe95f61a
+ .long 0xbf3504f3
+ .long 0xb24fe77a
+ .long 0x3f800000
+ .long 0xbe8d2f7d
+ .long 0xbf3085bb
+ .long 0x32ae2d32
+ .long 0x3f800000
+ .long 0xbe84a20e
+ .long 0xbf2beb4a
+ .long 0x32b73136
+ .long 0x3f800000
+ .long 0xbe789e3f
+ .long 0xbf273656
+ .long 0x32038343
+ .long 0x3f800000
+ .long 0xbe686ff3
+ .long 0xbf226799
+ .long 0xb22123bb
+ .long 0x3f800000
+ .long 0xbe58bbb7
+ .long 0xbf1d7fd1
+ .long 0xb292050c
+ .long 0x3f800000
+ .long 0xbe4983f7
+ .long 0xbf187fc0
+ .long 0x31c7a3f3
+ .long 0x3f800000
+ .long 0xbe3acb0c
+ .long 0xbf13682a
+ .long 0xb2cdd12e
+ .long 0x3f800000
+ .long 0xbe2c933b
+ .long 0xbf0e39da
+ .long 0x324a32e7
+ .long 0x3f800000
+ .long 0xbe1edeb5
+ .long 0xbf08f59b
+ .long 0x32be4b4e
+ .long 0x3f800000
+ .long 0xbe11af97
+ .long 0xbf039c3d
+ .long 0x325ba002
+ .long 0x3f800000
+ .long 0xbe0507ea
+ .long 0xbefc5d27
+ .long 0x3180eca9
+ .long 0x3f800000
+ .long 0xbdf1d344
+ .long 0xbef15aea
+ .long 0x31ff2139
+ .long 0x3f800000
+ .long 0xbddaad38
+ .long 0xbee63375
+ .long 0x31d9c774
+ .long 0x3f800000
+ .long 0xbdc4a143
+ .long 0xbedae880
+ .long 0xb21e15cc
+ .long 0x3f800000
+ .long 0xbdafb2cc
+ .long 0xbecf7bca
+ .long 0xb16a3b63
+ .long 0x3f800000
+ .long 0xbd9be50c
+ .long 0xbec3ef15
+ .long 0xb1d5d52c
+ .long 0x3f800000
+ .long 0xbd893b12
+ .long 0xbeb8442a
+ .long 0x32705ba6
+ .long 0x3f800000
+ .long 0xbd6f6f7e
+ .long 0xbeac7cd4
+ .long 0x32254e02
+ .long 0x3f800000
+ .long 0xbd4ebb8a
+ .long 0xbea09ae5
+ .long 0x323e89a0
+ .long 0x3f800000
+ .long 0xbd305f55
+ .long 0xbe94a031
+ .long 0xb26d59f0
+ .long 0x3f800000
+ .long 0xbd145f8c
+ .long 0xbe888e93
+ .long 0xb12c7d9e
+ .long 0x3f800000
+ .long 0xbcf58104
+ .long 0xbe78cfcc
+ .long 0x311bd41d
+ .long 0x3f800000
+ .long 0xbcc70c54
+ .long 0xbe605c13
+ .long 0xb1a7e4f6
+ .long 0x3f800000
+ .long 0xbc9d6830
+ .long 0xbe47c5c2
+ .long 0x30e5967d
+ .long 0x3f800000
+ .long 0xbc71360b
+ .long 0xbe2f10a2
+ .long 0xb11167f9
+ .long 0x3f800000
+ .long 0xbc315502
+ .long 0xbe164083
+ .long 0xb1e8e614
+ .long 0x3f800000
+ .long 0xbbf66e3c
+ .long 0xbdfab273
+ .long 0x311568cf
+ .long 0x3f800000
+ .long 0xbb9dc971
+ .long 0xbdc8bd36
+ .long 0x307592f5
+ .long 0x3f800000
+ .long 0xbb319298
+ .long 0xbd96a905
+ .long 0x31531e61
+ .long 0x3f800000
+ .long 0xba9de1c8
+ .long 0xbd48fb30
+ .long 0x30ef227f
+ .long 0x3f800000
+ .long 0xb99de7df
+ .long 0xbcc90ab0
+ .long 0x3005c998
+ .long 0x3f800000
+
+/* General purpose constants:
+ * absolute value mask */
+float_vector __sAbsMask 0x7fffffff
+
+/* threshold for out-of-range values */
+float_vector __sRangeReductionVal 0x461c4000
+
+/* +INF */
+float_vector __sRangeVal 0x7f800000
+
+/* High Accuracy version polynomial coefficients:
+ * S1 = -1.66666666664728165763e-01 */
+float_vector __sS1 0xbe2aaaab
+
+/* S2 = 8.33329173045453069014e-03 */
+float_vector __sS2 0x3c08885c
+
+/* C1 = -5.00000000000000000000e-01 */
+float_vector __sC1 0xbf000000
+
+/* C2 = 4.16638942914469202550e-02 */
+float_vector __sC2 0x3d2aaa7c
+
+/* Range reduction PI-based constants:
+ * PI high part */
+float_vector __sPI1 0x40490000
+
+/* PI mid part 1 */
+float_vector __sPI2 0x3a7da000
+
+/* PI mid part 2 */
+float_vector __sPI3 0x34222000
+
+/* PI low part */
+float_vector __sPI4 0x2cb4611a
+
+/* Range reduction PI-based constants if FMA available:
+ * PI high part (when FMA available) */
+float_vector __sPI1_FMA 0x40490fdb
+
+/* PI mid part (when FMA available) */
+float_vector __sPI2_FMA 0xb3bbbd2e
+
+/* PI low part (when FMA available) */
+float_vector __sPI3_FMA 0xa7772ced
+
+/* Polynomial coefficients: */
+float_vector __sA3 0xbe2aaaa6
+float_vector __sA5 0x3c08876a
+float_vector __sA7 0xb94fb7ff
+float_vector __sA9 0x362edef8
+
+/* 1/PI */
+float_vector __sInvPI 0x3ea2f983
+
+/* right-shifter constant */
+float_vector __sRShifter 0x4b400000
+ .type __svml_ssin_data,@object
+ .size __svml_ssin_data,.-__svml_ssin_data
diff --git a/sysdeps/x86_64/fpu/svml_s_sinf_data.h b/sysdeps/x86_64/fpu/svml_s_sinf_data.h
new file mode 100644
index 0000000..d910074
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sinf_data.h
@@ -0,0 +1,54 @@
+/* Offsets for data table for vector sinf.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef S_SINF_DATA_H
+#define S_SINF_DATA_H
+
+/* Offsets for data table */
+#define __dT 0
+#define __sAbsMask 4096
+#define __sRangeReductionVal 4160
+#define __sRangeVal 4224
+#define __sS1 4288
+#define __sS2 4352
+#define __sC1 4416
+#define __sC2 4480
+#define __sPI1 4544
+#define __sPI2 4608
+#define __sPI3 4672
+#define __sPI4 4736
+#define __sPI1_FMA 4800
+#define __sPI2_FMA 4864
+#define __sPI3_FMA 4928
+#define __sA3 4992
+#define __sA5 5056
+#define __sA7 5120
+#define __sA9 5184
+#define __sInvPI 5248
+#define __sRShifter 5312
+
+.macro float_vector offset value
+.if .-__svml_ssin_data != \offset
+.err
+.endif
+.rept 16
+.long \value
+.endr
+.endm
+
+#endif
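The float_vector macro in the new header emits one 32-bit constant replicated 16 times (16 x .long = 64 bytes, the width of one AVX512 zmm register, so the narrower SSE/AVX loads read from the same slot), and the .if/.err guard aborts assembly if an entry does not land at the offset #defined above it. A sketch of the same consistency check (offsets copied from the header; the 64-byte stride is my reading of the macro):

```python
# Each float_vector slot is 16 replicated 32-bit words = 64 bytes;
# __dT is the 4096-byte lookup table at the start of __svml_ssin_data.
FLOAT_VECTOR_BYTES = 16 * 4

offsets = [
    ('__dT', 0),
    ('__sAbsMask', 4096), ('__sRangeReductionVal', 4160),
    ('__sRangeVal', 4224), ('__sS1', 4288), ('__sS2', 4352),
    ('__sC1', 4416), ('__sC2', 4480), ('__sPI1', 4544),
    ('__sPI2', 4608), ('__sPI3', 4672), ('__sPI4', 4736),
    ('__sPI1_FMA', 4800), ('__sPI2_FMA', 4864), ('__sPI3_FMA', 4928),
    ('__sA3', 4992), ('__sA5', 5056), ('__sA7', 5120), ('__sA9', 5184),
    ('__sInvPI', 5248), ('__sRShifter', 5312),
]

# Every slot after the __dT table must be exactly one 64-byte vector apart,
# mirroring what the ".if .-__svml_ssin_data != \offset" guard enforces.
for (_, lo), (_, hi) in zip(offsets[1:], offsets[2:]):
    assert hi - lo == FLOAT_VECTOR_BYTES
print('offsets consistent')
```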
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 2bb155f..801b03c 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -23,3 +23,4 @@
#define VEC_TYPE __m512
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVeN16v_cosf)
+VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVeN16v_sinf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16.c b/sysdeps/x86_64/fpu/test-float-vlen16.c
index a664ad9..8988cdb 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16.c
@@ -19,6 +19,7 @@
#include "test-float-vlen16.h"
#define TEST_VECTOR_cosf 1
+#define TEST_VECTOR_sinf 1
#define REQUIRE_AVX512F
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 05d6a40..3a0fa6a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -23,3 +23,4 @@
#define VEC_TYPE __m128
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
+VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/test-float-vlen4.c
index 8946520..3863787 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4.c
@@ -19,5 +19,6 @@
#include "test-float-vlen4.h"
#define TEST_VECTOR_cosf 1
+#define TEST_VECTOR_sinf 1
#include "libm-test.c"
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index cff9941..a85f588 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -26,3 +26,4 @@
#define VEC_TYPE __m256
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVdN8v_cosf)
+VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVdN8v_sinf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
index f0ee6f2..db0b5e5 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
@@ -22,6 +22,7 @@
#define VEC_SUFF _vlen8_avx2
#define TEST_VECTOR_cosf 1
+#define TEST_VECTOR_sinf 1
#define REQUIRE_AVX2
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index c2305a3..fb7f696 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -23,3 +23,4 @@
#define VEC_TYPE __m256
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVcN8v_cosf)
+VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVcN8v_sinf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
index b96dec6..f893c5b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -19,5 +19,6 @@
#include "test-float-vlen8.h"
#define TEST_VECTOR_cosf 1
+#define TEST_VECTOR_sinf 1
#include "libm-test.c"
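The test wrappers above bind sinf to its vector entry points. Under the x86-64 Vector ABI linked from the commit message, these symbols follow the pattern _ZGV&lt;isa&gt;&lt;mask&gt;&lt;vlen&gt;&lt;params&gt;_&lt;function&gt;; a small sketch of that mangling (my own helper for illustration, not glibc code):

```python
def vector_abi_name(scalar, isa, vlen, params='v'):
    """Sketch of x86-64 Vector ABI name mangling: 'N' marks an unmasked
    variant, 'v' a plain vector argument."""
    return '_ZGV{}N{}{}_{}'.format(isa, vlen, params, scalar)

# ISA letters and float vector lengths as used by the wrappers above:
# 'b' = SSE (xmm, 4 floats), 'c' = AVX (ymm, 8), 'd' = AVX2 (ymm, 8),
# 'e' = AVX512 (zmm, 16).
for isa, vlen in (('b', 4), ('c', 8), ('d', 8), ('e', 16)):
    print(vector_abi_name('sinf', isa, vlen))
```

Running this reproduces the four wrapper symbols in the diff: _ZGVbN4v_sinf, _ZGVcN8v_sinf, _ZGVdN8v_sinf, and _ZGVeN16v_sinf.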
-----------------------------------------------------------------------
Summary of changes:
ChangeLog | 31 +
NEWS | 2 +-
sysdeps/unix/sysv/linux/x86_64/libmvec.abilist | 4 +
sysdeps/x86/fpu/bits/math-vector.h | 2 +
sysdeps/x86_64/fpu/Makefile | 2 +
sysdeps/x86_64/fpu/Versions | 1 +
sysdeps/x86_64/fpu/libm-test-ulps | 8 +
sysdeps/x86_64/fpu/multiarch/Makefile | 3 +-
sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S | 39 +
.../fpu/multiarch/svml_s_sinf16_core_avx512.S | 479 +++++++++
sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S | 38 +
.../x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S | 224 ++++
sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S | 38 +
.../x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S | 219 ++++
sysdeps/x86_64/fpu/svml_s_sinf16_core.S | 25 +
sysdeps/x86_64/fpu/svml_s_sinf4_core.S | 30 +
sysdeps/x86_64/fpu/svml_s_sinf8_core.S | 29 +
sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S | 25 +
sysdeps/x86_64/fpu/svml_s_sinf_data.S | 1118 ++++++++++++++++++++
sysdeps/x86_64/fpu/svml_s_sinf_data.h | 54 +
sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c | 1 +
sysdeps/x86_64/fpu/test-float-vlen16.c | 1 +
sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c | 1 +
sysdeps/x86_64/fpu/test-float-vlen4.c | 1 +
.../x86_64/fpu/test-float-vlen8-avx2-wrappers.c | 1 +
sysdeps/x86_64/fpu/test-float-vlen8-avx2.c | 1 +
sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c | 1 +
sysdeps/x86_64/fpu/test-float-vlen8.c | 1 +
28 files changed, 2377 insertions(+), 2 deletions(-)
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sinf16_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sinf4_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sinf8_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sinf8_core_avx.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sinf_data.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sinf_data.h
hooks/post-receive
--
GNU C Library master sources