This is the mail archive of the glibc-cvs@sourceware.org mailing list for the glibc project.


GNU C Library master sources branch master updated. glibc-2.21-496-ga6336cc


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  a6336cc446a7ed682cb9dbc47cc56ebf9f9a4229 (commit)
      from  c9a8c526acd185176e486bee4624039740f8c435 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=a6336cc446a7ed682cb9dbc47cc56ebf9f9a4229

commit a6336cc446a7ed682cb9dbc47cc56ebf9f9a4229
Author: Andrew Senkevich <andrew.senkevich@intel.com>
Date:   Thu Jun 18 20:11:27 2015 +0300

    Vector sincosf for x86_64 and tests.
    
    This is an implementation of vectorized sincosf with SSE, AVX,
    AVX2 and AVX512 versions, following the Vector ABI
    <https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
    A usage sketch follows the file list below.
    
        * NEWS: Mention addition of x86_64 vector sincosf.
        * math/test-float-vlen16.h: Added wrapper for sincosf tests.
        * math/test-float-vlen4.h: Likewise.
        * math/test-float-vlen8.h: Likewise.
        * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
        * sysdeps/x86/fpu/bits/math-vector.h: Added sincosf SIMD declaration.
        * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
        * sysdeps/x86_64/fpu/Versions: New versions added.
        * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
        * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines):
        Added build of SSE, AVX2 and AVX512 IFUNC versions.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S: New file.
        * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: New file.
        * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: New file.
        * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: New file.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: New file.
        * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: New file.
        * sysdeps/x86_64/fpu/svml_s_sincosf_data.S: New file.
        * sysdeps/x86_64/fpu/svml_s_sincosf_data.h: New file.
        * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 3-argument wrappers.
        * sysdeps/x86_64/fpu/test-float-vlen16.c: Vector sincosf tests.
        * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
        * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
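
    Programs do not normally call the new _ZGV* entry points by name;
    GCC emits calls to them when it vectorizes loops over the scalar
    sincosf.  A minimal usage sketch (compile with
    gcc -O1 -fopenmp -ffast-math per the NEWS entry below; whether the
    loop is actually vectorized depends on compiler version and
    target ISA):

        #define _GNU_SOURCE
        #include <math.h>

        /* GCC >= 4.9 may turn this loop into calls to the
           _ZGV*vvv_sincosf variants added by this commit.  */
        void
        compute (const float *x, float *s, float *c, int n)
        {
          for (int i = 0; i < n; i++)
            sincosf (x[i], &s[i], &c[i]);
        }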

diff --git a/ChangeLog b/ChangeLog
index d07096d..8aeb643 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,38 @@
 2015-06-18  Andrew Senkevich  <andrew.senkevich@intel.com>
 
+	* NEWS: Mention addition of x86_64 vector sincosf.
+	* math/test-float-vlen16.h: Added wrapper for sincosf tests.
+	* math/test-float-vlen4.h: Likewise.
+	* math/test-float-vlen8.h: Likewise.
+	* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
+	* sysdeps/x86/fpu/bits/math-vector.h: Added sincosf SIMD declaration.
+	* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
+	* sysdeps/x86_64/fpu/Versions: New versions added.
+	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
+	* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines):
+	Added build of SSE, AVX2 and AVX512 IFUNC versions.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S: New file.
+	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_sincosf16_core.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_sincosf4_core.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_sincosf8_core.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_sincosf_data.S: New file.
+	* sysdeps/x86_64/fpu/svml_s_sincosf_data.h: New file.
+	* sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 3-argument wrappers.
+	* sysdeps/x86_64/fpu/test-float-vlen16.c: Vector sincosf tests.
+	* sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
+	* sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
+
 	* NEWS: Mention addition of x86_64 vector sincos.
 	* bits/libm-simd-decl-stubs.h: Added stubs for sincos.
 	* math/math.h (__MATHDECL_VEC): New macro.
diff --git a/NEWS b/NEWS
index fedbe24..050522f 100644
--- a/NEWS
+++ b/NEWS
@@ -55,8 +55,8 @@ Version 2.22
   condition in some applications.
 
 * Added vector math library named libmvec with the following vectorized x86_64
-  implementations: cos, cosf, sin, sinf, sincos, log, logf, exp, expf, pow,
-  powf.
+  implementations: cos, cosf, sin, sinf, sincos, sincosf, log, logf, exp, expf,
+  pow, powf.
   The library can be disabled with --disable-mathvec. Use of the functions is
   enabled with -fopenmp -ffast-math starting from -O1 for GCC version >= 4.9.0.
   The library is linked in as needed when using -lm (no need to specify -lmvec
diff --git a/math/test-float-vlen16.h b/math/test-float-vlen16.h
index 802ae7b..b1890f3 100644
--- a/math/test-float-vlen16.h
+++ b/math/test-float-vlen16.h
@@ -44,6 +44,7 @@
 
 #define WRAPPER_DECL(func) extern FLOAT func (FLOAT x);
 #define WRAPPER_DECL_ff(func) extern FLOAT func (FLOAT x, FLOAT y);
+#define WRAPPER_DECL_fFF(function) extern void function (FLOAT, FLOAT *, FLOAT *);
 
 // Wrapper from scalar to vector function with vector length 16.
 #define VECTOR_WRAPPER(scalar_func, vector_func) \
@@ -71,3 +72,19 @@ FLOAT scalar_func (FLOAT x, FLOAT y)		\
   TEST_VEC_LOOP (mr, 16);			\
   return ((FLOAT) mr[0]);			\
 }
+
+// Wrapper from a scalar 3-argument function to a vector one.
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func) 	\
+extern void vector_func (VEC_TYPE, VEC_TYPE *, VEC_TYPE *);	\
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)		\
+{						\
+  int i;					\
+  VEC_TYPE mx, mr, mr1;				\
+  INIT_VEC_LOOP (mx, x, 16);			\
+  vector_func (mx, &mr, &mr1);			\
+  TEST_VEC_LOOP (mr, 16);			\
+  TEST_VEC_LOOP (mr1, 16);			\
+  *r = (FLOAT) mr[0];				\
+  *r1 = (FLOAT) mr1[0];				\
+  return;					\
+}
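
The new VECTOR_WRAPPER_fFF macro broadcasts the scalar argument into all
16 lanes, calls the vector variant, checks that every lane of both
results agrees, and returns lane 0 through the two pointers.  The
per-length wrapper files listed in the log then instantiate it; a sketch
of the expected instantiation (assuming the harness's existing
WRAPPER_NAME convention used for the one- and two-argument wrappers):

    /* In sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c (sketch).  */
    #include "test-float-vlen16.h"

    #define VEC_TYPE __m512

    VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVeN16vvv_sincosf)
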
diff --git a/math/test-float-vlen4.h b/math/test-float-vlen4.h
index f5e530b..213ae78 100644
--- a/math/test-float-vlen4.h
+++ b/math/test-float-vlen4.h
@@ -44,6 +44,7 @@
 
 #define WRAPPER_DECL(function) extern FLOAT function (FLOAT);
 #define WRAPPER_DECL_ff(function) extern FLOAT function (FLOAT, FLOAT);
+#define WRAPPER_DECL_fFF(function) extern void function (FLOAT, FLOAT *, FLOAT *);
 
 // Wrapper from scalar to vector function with vector length 4.
 #define VECTOR_WRAPPER(scalar_func, vector_func) \
@@ -71,3 +72,19 @@ FLOAT scalar_func (FLOAT x, FLOAT y)		\
   TEST_VEC_LOOP (mr, 4);			\
   return ((FLOAT) mr[0]);			\
 }
+
+// Wrapper from a scalar 3-argument function to a vector one.
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func) 	\
+extern void vector_func (VEC_TYPE, VEC_TYPE *, VEC_TYPE *);	\
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)		\
+{						\
+  int i;					\
+  VEC_TYPE mx, mr, mr1;				\
+  INIT_VEC_LOOP (mx, x, 4);			\
+  vector_func (mx, &mr, &mr1);			\
+  TEST_VEC_LOOP (mr, 4);			\
+  TEST_VEC_LOOP (mr1, 4);			\
+  *r = (FLOAT) mr[0];				\
+  *r1 = (FLOAT) mr1[0];				\
+  return;					\
+}
diff --git a/math/test-float-vlen8.h b/math/test-float-vlen8.h
index 697849f..dd2fb28 100644
--- a/math/test-float-vlen8.h
+++ b/math/test-float-vlen8.h
@@ -44,6 +44,7 @@
 
 #define WRAPPER_DECL(function) extern FLOAT function (FLOAT);
 #define WRAPPER_DECL_ff(function) extern FLOAT function (FLOAT, FLOAT);
+#define WRAPPER_DECL_fFF(function) extern void function (FLOAT, FLOAT *, FLOAT *);
 
 // Wrapper from scalar to vector function with vector length 8.
 #define VECTOR_WRAPPER(scalar_func, vector_func) \
@@ -71,3 +72,19 @@ FLOAT scalar_func (FLOAT x, FLOAT y)		\
   TEST_VEC_LOOP (mr, 8);			\
   return ((FLOAT) mr[0]);			\
 }
+
+// Wrapper from a scalar 3-argument function to a vector one.
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func) 	\
+extern void vector_func (VEC_TYPE, VEC_TYPE *, VEC_TYPE *);	\
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1)		\
+{						\
+  int i;					\
+  VEC_TYPE mx, mr, mr1;				\
+  INIT_VEC_LOOP (mx, x, 8);			\
+  vector_func (mx, &mr, &mr1);			\
+  TEST_VEC_LOOP (mr, 8);			\
+  TEST_VEC_LOOP (mr1, 8);			\
+  *r = (FLOAT) mr[0];				\
+  *r1 = (FLOAT) mr1[0];				\
+  return;					\
+}
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 6c45844..b7efeab 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -11,6 +11,7 @@ GLIBC_2.22
  _ZGVbN4v_logf F
  _ZGVbN4v_sinf F
  _ZGVbN4vv_powf F
+ _ZGVbN4vvv_sincosf F
  _ZGVcN4v_cos F
  _ZGVcN4v_exp F
  _ZGVcN4v_log F
@@ -22,6 +23,7 @@ GLIBC_2.22
  _ZGVcN8v_logf F
  _ZGVcN8v_sinf F
  _ZGVcN8vv_powf F
+ _ZGVcN8vvv_sincosf F
  _ZGVdN4v_cos F
  _ZGVdN4v_exp F
  _ZGVdN4v_log F
@@ -33,11 +35,13 @@ GLIBC_2.22
  _ZGVdN8v_logf F
  _ZGVdN8v_sinf F
  _ZGVdN8vv_powf F
+ _ZGVdN8vvv_sincosf F
  _ZGVeN16v_cosf F
  _ZGVeN16v_expf F
  _ZGVeN16v_logf F
  _ZGVeN16v_sinf F
  _ZGVeN16vv_powf F
+ _ZGVeN16vvv_sincosf F
  _ZGVeN8v_cos F
  _ZGVeN8v_exp F
  _ZGVeN8v_log F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index f684ff5..f9e798b 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -38,6 +38,8 @@
 #  define __DECL_SIMD_sinf __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_sincos
 #  define __DECL_SIMD_sincos __DECL_SIMD_x86_64
+#  undef __DECL_SIMD_sincosf
+#  define __DECL_SIMD_sincosf __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_log
 #  define __DECL_SIMD_log __DECL_SIMD_x86_64
 #  undef __DECL_SIMD_logf
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 9c28d62..c6912cb 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -19,7 +19,9 @@ libmvec-support += svml_d_cos2_core svml_d_cos4_core_avx \
 		   svml_d_pow4_core_avx svml_d_pow4_core svml_d_pow8_core \
 		   svml_d_pow_data svml_s_powf4_core svml_s_powf8_core_avx \
 		   svml_s_powf8_core svml_s_powf16_core svml_s_powf_data \
-		   init-arch
+		   svml_s_sincosf4_core svml_s_sincosf8_core_avx \
+		   svml_s_sincosf8_core svml_s_sincosf16_core \
+		   svml_s_sincosf_data init-arch
 endif
 
 # Variables for libmvec tests.
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index d950f58..0813204 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -11,5 +11,6 @@ libmvec {
     _ZGVbN4v_logf; _ZGVcN8v_logf; _ZGVdN8v_logf; _ZGVeN16v_logf;
     _ZGVbN4v_expf; _ZGVcN8v_expf; _ZGVdN8v_expf; _ZGVeN16v_expf;
     _ZGVbN4vv_powf; _ZGVcN8vv_powf; _ZGVdN8vv_powf; _ZGVeN16vv_powf;
+    _ZGVbN4vvv_sincosf; _ZGVcN8vvv_sincosf; _ZGVdN8vvv_sincosf; _ZGVeN16vvv_sincosf;
   }
 }
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 74b1af5..2e2722d 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -2031,17 +2031,25 @@ idouble: 1
 ildouble: 3
 ldouble: 3
 
+Function: "sincos_vlen16":
+float: 1
+
 Function: "sincos_vlen2":
 double: 1
 
 Function: "sincos_vlen4":
 double: 1
+float: 1
 
 Function: "sincos_vlen4_avx2":
 double: 1
 
 Function: "sincos_vlen8":
 double: 1
+float: 1
+
+Function: "sincos_vlen8_avx2":
+float: 1
 
 Function: "sinh":
 double: 2
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index 9e510db..86ea473 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -69,5 +69,6 @@ libmvec-sysdep_routines += svml_d_cos2_core_sse4 svml_d_cos4_core_avx2 \
 			   svml_s_expf16_core_avx512 svml_d_pow2_core_sse4 \
 			   svml_d_pow4_core_avx2 svml_d_pow8_core_avx512 \
 			   svml_s_powf4_core_sse4 svml_s_powf8_core_avx2 \
-			   svml_s_powf16_core_avx512
+			   svml_s_powf16_core_avx512 svml_s_sincosf4_core_sse4 \
+			   svml_s_sincosf8_core_avx2 svml_s_sincosf16_core_avx512
 endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
new file mode 100644
index 0000000..0a1753e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
@@ -0,0 +1,39 @@
+/* Multiple versions of vectorized sincosf.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <init-arch.h>
+
+	.text
+ENTRY (_ZGVeN16vvv_sincosf)
+        .type   _ZGVeN16vvv_sincosf, @gnu_indirect_function
+        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
+        jne     1f
+        call    __init_cpu_features
+1:      leaq    _ZGVeN16vvv_sincosf_skx(%rip), %rax
+        testl   $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+        jnz     3f
+2:      leaq    _ZGVeN16vvv_sincosf_knl(%rip), %rax
+        testl   $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+        jnz     3f
+        leaq    _ZGVeN16vvv_sincosf_avx2_wrapper(%rip), %rax
+3:      ret
+END (_ZGVeN16vvv_sincosf)
+
+#define _ZGVeN16vvv_sincosf _ZGVeN16vvv_sincosf_avx2_wrapper
+#include "../svml_s_sincosf16_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
new file mode 100644
index 0000000..cae49f6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
@@ -0,0 +1,504 @@
+/* Function sincosf vectorized with AVX-512. KNL and SKX versions.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_sincosf_data.h"
+#include "svml_s_wrapper_impl.h"
+
+/*
+   ALGORITHM DESCRIPTION:
+
+     1) Range reduction to [-Pi/4; +Pi/4] interval
+        a) Grab sign from source argument and save it.
+        b) Remove sign using AND operation
+        c) Getting octant Y by 2/Pi multiplication
+        d) Add "Right Shifter" value
+        e) Treat obtained value as integer S for destination sign setting.
+           SS = ((S-S&1)&2)<<30; For sin part
+           SC = ((S+S&1)&2)<<30; For cos part
+        f) Change destination sign if source sign is negative
+           using XOR operation.
+        g) Subtract "Right Shifter" (0x4B000000) value
+        h) Subtract Y*(PI/2) from X argument, where PI/2 is divided into 4 parts:
+           X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+     2) Polynomial (minimax for sin within  [-Pi/4; +Pi/4] interval)
+        a) Calculate X^2 = X * X
+        b) Calculate 2 polynomials for sin and cos:
+           RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+           RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4))));
+        c) Swap RS & RC if if first bit of obtained value after
+           Right Shifting is set to 1. Using And, Andnot & Or operations.
+     3) Destination sign setting
+        a) Set shifted destination sign using XOR operation:
+           R1 = XOR( RS, SS );
+           R2 = XOR( RC, SC ).  */
+
+	.text
+ENTRY (_ZGVeN16vvv_sincosf_knl)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+#else
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $1344, %rsp
+        movq      __svml_ssincos_data@GOTPCREL(%rip), %rax
+        vmovaps   %zmm0, %zmm2
+        movl      $-1, %edx
+        vmovups __sAbsMask(%rax), %zmm0
+        vmovups __sInvPI(%rax), %zmm3
+
+/* Absolute argument computation */
+        vpandd    %zmm0, %zmm2, %zmm1
+        vmovups __sPI1_FMA(%rax), %zmm5
+        vmovups __sSignMask(%rax), %zmm9
+        vpandnd   %zmm2, %zmm0, %zmm0
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 is divided into 3 parts:
+      X = X - Y*PI1 - Y*PI2 - Y*PI3 */
+        vmovaps   %zmm1, %zmm6
+        vmovaps   %zmm1, %zmm8
+
+/* c) Getting octant Y by 2/Pi multiplication
+   d) Add "Right Shifter" value */
+        vfmadd213ps __sRShifter(%rax), %zmm1, %zmm3
+        vmovups __sPI3_FMA(%rax), %zmm7
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+        vsubps __sRShifter(%rax), %zmm3, %zmm12
+
+/* e) Treat obtained value as integer S for destination sign setting */
+        vpslld    $31, %zmm3, %zmm13
+        vmovups __sA7_FMA(%rax), %zmm14
+        vfnmadd231ps %zmm12, %zmm5, %zmm6
+
+/* 2) Polynomial (minimax for sin within  [-Pi/4; +Pi/4] interval)
+      a) Calculate X^2 = X * X
+      b) Calculate 2 polynomials for sin and cos:
+         RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+         RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+        vmovaps   %zmm14, %zmm15
+        vmovups __sA9_FMA(%rax), %zmm3
+        vcmpps    $22, __sRangeReductionVal(%rax), %zmm1, %k1
+        vpbroadcastd %edx, %zmm1{%k1}{z}
+        vfnmadd231ps __sPI2_FMA(%rax), %zmm12, %zmm6
+        vptestmd  %zmm1, %zmm1, %k0
+        vpandd    %zmm6, %zmm9, %zmm11
+        kmovw     %k0, %ecx
+        vpxord __sOneHalf(%rax), %zmm11, %zmm4
+
+/* Result sign calculations */
+        vpternlogd $150, %zmm13, %zmm9, %zmm11
+
+/* Add correction term 0.5 for cos() part */
+        vaddps    %zmm4, %zmm12, %zmm10
+        vfnmadd213ps %zmm6, %zmm7, %zmm12
+        vfnmadd231ps %zmm10, %zmm5, %zmm8
+        vpxord    %zmm13, %zmm12, %zmm13
+        vmulps    %zmm13, %zmm13, %zmm12
+        vfnmadd231ps __sPI2_FMA(%rax), %zmm10, %zmm8
+        vfmadd231ps __sA9_FMA(%rax), %zmm12, %zmm15
+        vfnmadd213ps %zmm8, %zmm7, %zmm10
+        vfmadd213ps __sA5_FMA(%rax), %zmm12, %zmm15
+        vpxord    %zmm11, %zmm10, %zmm5
+        vmulps    %zmm5, %zmm5, %zmm4
+        vfmadd213ps __sA3(%rax), %zmm12, %zmm15
+        vfmadd213ps %zmm14, %zmm4, %zmm3
+        vmulps    %zmm12, %zmm15, %zmm14
+        vfmadd213ps __sA5_FMA(%rax), %zmm4, %zmm3
+        vfmadd213ps %zmm13, %zmm13, %zmm14
+        vfmadd213ps __sA3(%rax), %zmm4, %zmm3
+        vpxord    %zmm0, %zmm14, %zmm0
+        vmulps    %zmm4, %zmm3, %zmm3
+        vfmadd213ps %zmm5, %zmm5, %zmm3
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        cfi_remember_state
+        vmovups   %zmm0, (%rdi)
+        vmovups   %zmm3, (%rsi)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_1_3:
+        cfi_restore_state
+        vmovups   %zmm2, 1152(%rsp)
+        vmovups   %zmm0, 1216(%rsp)
+        vmovups   %zmm3, 1280(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        kmovw     %k4, 1048(%rsp)
+        xorl      %eax, %eax
+        kmovw     %k5, 1040(%rsp)
+        kmovw     %k6, 1032(%rsp)
+        kmovw     %k7, 1024(%rsp)
+        vmovups   %zmm16, 960(%rsp)
+        vmovups   %zmm17, 896(%rsp)
+        vmovups   %zmm18, 832(%rsp)
+        vmovups   %zmm19, 768(%rsp)
+        vmovups   %zmm20, 704(%rsp)
+        vmovups   %zmm21, 640(%rsp)
+        vmovups   %zmm22, 576(%rsp)
+        vmovups   %zmm23, 512(%rsp)
+        vmovups   %zmm24, 448(%rsp)
+        vmovups   %zmm25, 384(%rsp)
+        vmovups   %zmm26, 320(%rsp)
+        vmovups   %zmm27, 256(%rsp)
+        vmovups   %zmm28, 192(%rsp)
+        vmovups   %zmm29, 128(%rsp)
+        vmovups   %zmm30, 64(%rsp)
+        vmovups   %zmm31, (%rsp)
+        movq      %rsi, 1056(%rsp)
+        movq      %r12, 1096(%rsp)
+        cfi_offset_rel_rsp (12, 1096)
+        movb      %dl, %r12b
+        movq      %r13, 1088(%rsp)
+        cfi_offset_rel_rsp (13, 1088)
+        movl      %eax, %r13d
+        movq      %r14, 1080(%rsp)
+        cfi_offset_rel_rsp (14, 1080)
+        movl      %ecx, %r14d
+        movq      %r15, 1072(%rsp)
+        cfi_offset_rel_rsp (15, 1072)
+        movq      %rbx, 1064(%rsp)
+        movq      %rdi, %rbx
+        cfi_remember_state
+
+.LBL_1_6:
+        btl       %r13d, %r14d
+        jc        .LBL_1_13
+
+.LBL_1_7:
+        lea       1(%r13), %esi
+        btl       %esi, %r14d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        addb      $1, %r12b
+        addl      $2, %r13d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        movq      %rbx, %rdi
+        kmovw     1048(%rsp), %k4
+        movq      1056(%rsp), %rsi
+        kmovw     1040(%rsp), %k5
+        movq      1096(%rsp), %r12
+        cfi_restore (%r12)
+        kmovw     1032(%rsp), %k6
+        movq      1088(%rsp), %r13
+        cfi_restore (%r13)
+        kmovw     1024(%rsp), %k7
+        vmovups   960(%rsp), %zmm16
+        vmovups   896(%rsp), %zmm17
+        vmovups   832(%rsp), %zmm18
+        vmovups   768(%rsp), %zmm19
+        vmovups   704(%rsp), %zmm20
+        vmovups   640(%rsp), %zmm21
+        vmovups   576(%rsp), %zmm22
+        vmovups   512(%rsp), %zmm23
+        vmovups   448(%rsp), %zmm24
+        vmovups   384(%rsp), %zmm25
+        vmovups   320(%rsp), %zmm26
+        vmovups   256(%rsp), %zmm27
+        vmovups   192(%rsp), %zmm28
+        vmovups   128(%rsp), %zmm29
+        vmovups   64(%rsp), %zmm30
+        vmovups   (%rsp), %zmm31
+        movq      1080(%rsp), %r14
+        cfi_restore (%r14)
+        movq      1072(%rsp), %r15
+        cfi_restore (%r15)
+        movq      1064(%rsp), %rbx
+        vmovups   1216(%rsp), %zmm0
+        vmovups   1280(%rsp), %zmm3
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        vmovss    1156(%rsp,%r15,8), %xmm0
+
+        call      sinf@PLT
+
+        vmovss    %xmm0, 1220(%rsp,%r15,8)
+        vmovss    1156(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        vmovss    %xmm0, 1284(%rsp,%r15,8)
+        jmp       .LBL_1_8
+
+.LBL_1_13:
+        movzbl    %r12b, %r15d
+        vmovss    1152(%rsp,%r15,8), %xmm0
+
+        call      sinf@PLT
+
+        vmovss    %xmm0, 1216(%rsp,%r15,8)
+        vmovss    1152(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        vmovss    %xmm0, 1280(%rsp,%r15,8)
+        jmp       .LBL_1_7
+#endif
+END (_ZGVeN16vvv_sincosf_knl)
+
+ENTRY (_ZGVeN16vvv_sincosf_skx)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+#else
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $1344, %rsp
+        movq      __svml_ssincos_data@GOTPCREL(%rip), %rax
+        vmovaps   %zmm0, %zmm4
+        vmovups __sAbsMask(%rax), %zmm3
+        vmovups __sInvPI(%rax), %zmm5
+        vmovups __sRShifter(%rax), %zmm6
+        vmovups __sPI1_FMA(%rax), %zmm9
+        vmovups __sPI2_FMA(%rax), %zmm10
+        vmovups __sSignMask(%rax), %zmm14
+        vmovups __sOneHalf(%rax), %zmm7
+        vmovups __sPI3_FMA(%rax), %zmm12
+
+/* Absolute argument computation */
+        vandps    %zmm3, %zmm4, %zmm2
+
+/* c) Getting octant Y by 2/Pi multiplication
+   d) Add "Right Shifter" value */
+        vfmadd213ps %zmm6, %zmm2, %zmm5
+        vcmpps    $18, __sRangeReductionVal(%rax), %zmm2, %k1
+
+/* e) Treat obtained value as integer S for destination sign setting */
+        vpslld    $31, %zmm5, %zmm0
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+        vsubps    %zmm6, %zmm5, %zmm5
+        vmovups __sA3(%rax), %zmm6
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 is divided into 3 parts:
+      X = X - Y*PI1 - Y*PI2 - Y*PI3 */
+        vmovaps   %zmm2, %zmm11
+        vfnmadd231ps %zmm5, %zmm9, %zmm11
+        vfnmadd231ps %zmm5, %zmm10, %zmm11
+        vandps    %zmm11, %zmm14, %zmm1
+        vxorps    %zmm1, %zmm7, %zmm8
+
+/* Result sign calculations */
+        vpternlogd $150, %zmm0, %zmm14, %zmm1
+        vmovups   .L_2il0floatpacket.13(%rip), %zmm14
+
+/* Add correction term 0.5 for cos() part */
+        vaddps    %zmm8, %zmm5, %zmm15
+        vfnmadd213ps %zmm11, %zmm12, %zmm5
+        vandnps   %zmm4, %zmm3, %zmm11
+        vmovups __sA7_FMA(%rax), %zmm3
+        vmovaps   %zmm2, %zmm13
+        vfnmadd231ps %zmm15, %zmm9, %zmm13
+        vxorps    %zmm0, %zmm5, %zmm9
+        vmovups __sA5_FMA(%rax), %zmm0
+        vfnmadd231ps %zmm15, %zmm10, %zmm13
+        vmulps    %zmm9, %zmm9, %zmm8
+        vfnmadd213ps %zmm13, %zmm12, %zmm15
+        vmovups __sA9_FMA(%rax), %zmm12
+        vxorps    %zmm1, %zmm15, %zmm1
+        vmulps    %zmm1, %zmm1, %zmm13
+
+/* 2) Polynomial (minimax for sin within  [-Pi/4; +Pi/4] interval)
+      a) Calculate X^2 = X * X
+      b) Calculate 2 polynomials for sin and cos:
+         RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+         RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+        vmovaps   %zmm12, %zmm7
+        vfmadd213ps %zmm3, %zmm8, %zmm7
+        vfmadd213ps %zmm3, %zmm13, %zmm12
+        vfmadd213ps %zmm0, %zmm8, %zmm7
+        vfmadd213ps %zmm0, %zmm13, %zmm12
+        vfmadd213ps %zmm6, %zmm8, %zmm7
+        vfmadd213ps %zmm6, %zmm13, %zmm12
+        vmulps    %zmm8, %zmm7, %zmm10
+        vmulps    %zmm13, %zmm12, %zmm3
+        vfmadd213ps %zmm9, %zmm9, %zmm10
+        vfmadd213ps %zmm1, %zmm1, %zmm3
+        vxorps    %zmm11, %zmm10, %zmm0
+        vpandnd   %zmm2, %zmm2, %zmm14{%k1}
+        vptestmd  %zmm14, %zmm14, %k0
+        kmovw     %k0, %ecx
+        testl     %ecx, %ecx
+        jne       .LBL_2_3
+
+.LBL_2_2:
+        cfi_remember_state
+        vmovups   %zmm0, (%rdi)
+        vmovups   %zmm3, (%rsi)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_2_3:
+        cfi_restore_state
+        vmovups   %zmm4, 1152(%rsp)
+        vmovups   %zmm0, 1216(%rsp)
+        vmovups   %zmm3, 1280(%rsp)
+        je        .LBL_2_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        kmovw     %k4, 1048(%rsp)
+        kmovw     %k5, 1040(%rsp)
+        kmovw     %k6, 1032(%rsp)
+        kmovw     %k7, 1024(%rsp)
+        vmovups   %zmm16, 960(%rsp)
+        vmovups   %zmm17, 896(%rsp)
+        vmovups   %zmm18, 832(%rsp)
+        vmovups   %zmm19, 768(%rsp)
+        vmovups   %zmm20, 704(%rsp)
+        vmovups   %zmm21, 640(%rsp)
+        vmovups   %zmm22, 576(%rsp)
+        vmovups   %zmm23, 512(%rsp)
+        vmovups   %zmm24, 448(%rsp)
+        vmovups   %zmm25, 384(%rsp)
+        vmovups   %zmm26, 320(%rsp)
+        vmovups   %zmm27, 256(%rsp)
+        vmovups   %zmm28, 192(%rsp)
+        vmovups   %zmm29, 128(%rsp)
+        vmovups   %zmm30, 64(%rsp)
+        vmovups   %zmm31, (%rsp)
+        movq      %rsi, 1056(%rsp)
+        movq      %r12, 1096(%rsp)
+        cfi_offset_rel_rsp (12, 1096)
+        movb      %dl, %r12b
+        movq      %r13, 1088(%rsp)
+        cfi_offset_rel_rsp (13, 1088)
+        movl      %eax, %r13d
+        movq      %r14, 1080(%rsp)
+        cfi_offset_rel_rsp (14, 1080)
+        movl      %ecx, %r14d
+        movq      %r15, 1072(%rsp)
+        cfi_offset_rel_rsp (15, 1072)
+        movq      %rbx, 1064(%rsp)
+        movq      %rdi, %rbx
+        cfi_remember_state
+
+.LBL_2_6:
+        btl       %r13d, %r14d
+        jc        .LBL_2_13
+
+.LBL_2_7:
+        lea       1(%r13), %esi
+        btl       %esi, %r14d
+        jc        .LBL_2_10
+
+.LBL_2_8:
+        incb      %r12b
+        addl      $2, %r13d
+        cmpb      $16, %r12b
+        jb        .LBL_2_6
+
+        kmovw     1048(%rsp), %k4
+        movq      %rbx, %rdi
+        kmovw     1040(%rsp), %k5
+        kmovw     1032(%rsp), %k6
+        kmovw     1024(%rsp), %k7
+        vmovups   960(%rsp), %zmm16
+        vmovups   896(%rsp), %zmm17
+        vmovups   832(%rsp), %zmm18
+        vmovups   768(%rsp), %zmm19
+        vmovups   704(%rsp), %zmm20
+        vmovups   640(%rsp), %zmm21
+        vmovups   576(%rsp), %zmm22
+        vmovups   512(%rsp), %zmm23
+        vmovups   448(%rsp), %zmm24
+        vmovups   384(%rsp), %zmm25
+        vmovups   320(%rsp), %zmm26
+        vmovups   256(%rsp), %zmm27
+        vmovups   192(%rsp), %zmm28
+        vmovups   128(%rsp), %zmm29
+        vmovups   64(%rsp), %zmm30
+        vmovups   (%rsp), %zmm31
+        vmovups   1216(%rsp), %zmm0
+        vmovups   1280(%rsp), %zmm3
+        movq      1056(%rsp), %rsi
+        movq      1096(%rsp), %r12
+        cfi_restore (%r12)
+        movq      1088(%rsp), %r13
+        cfi_restore (%r13)
+        movq      1080(%rsp), %r14
+        cfi_restore (%r14)
+        movq      1072(%rsp), %r15
+        cfi_restore (%r15)
+        movq      1064(%rsp), %rbx
+        jmp       .LBL_2_2
+
+.LBL_2_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        vmovss    1156(%rsp,%r15,8), %xmm0
+        vzeroupper
+        vmovss    1156(%rsp,%r15,8), %xmm0
+
+        call      sinf@PLT
+
+        vmovss    %xmm0, 1220(%rsp,%r15,8)
+        vmovss    1156(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        vmovss    %xmm0, 1284(%rsp,%r15,8)
+        jmp       .LBL_2_8
+
+.LBL_2_13:
+        movzbl    %r12b, %r15d
+        vmovss    1152(%rsp,%r15,8), %xmm0
+        vzeroupper
+        vmovss    1152(%rsp,%r15,8), %xmm0
+
+        call      sinf@PLT
+
+        vmovss    %xmm0, 1216(%rsp,%r15,8)
+        vmovss    1152(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        vmovss    %xmm0, 1280(%rsp,%r15,8)
+        jmp       .LBL_2_7
+#endif
+END (_ZGVeN16vvv_sincosf_skx)
+
+	.section .rodata, "a"
+.L_2il0floatpacket.13:
+	.long	0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff
+	.type	.L_2il0floatpacket.13,@object
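
The ALGORITHM DESCRIPTION comment in this file condenses to the scalar
C sketch below (illustrative only: Taylor terms stand in for the
table's minimax coefficients, the multi-part FMA subtraction of Pi/2 is
done in one step, and the large-argument fallback to scalar sinf/cosf
is omitted):

    #include <math.h>

    void
    sincosf_sketch (float x, float *sinp, float *cosp)
    {
      float ax = fabsf (x);                  /* a), b) strip the sign */
      /* c), d), g) octant by 2/Pi multiplication; nearbyintf stands in
         for the "right shifter" add/subtract rounding trick.  */
      float y = nearbyintf (ax * (float) M_2_PI);
      int s = (int) y;
      float r = ax - y * (float) M_PI_2;     /* h) reduce to [-Pi/4, Pi/4] */
      float r2 = r * r;
      /* 2) two polynomials for sin and cos on the reduced interval.  */
      float rs = r * (1.0f + r2 * (-1.0f / 6 + r2 * (1.0f / 120 - r2 / 5040)));
      float rc = 1.0f + r2 * (-0.5f + r2 * (1.0f / 24 - r2 / 720));
      if (s & 1)                             /* c) swap on odd octants */
        { float t = rs; rs = rc; rc = t; }
      if (s & 2)                             /* 3), e) octant signs */
        rs = -rs;
      if ((s + 1) & 2)
        rc = -rc;
      if (x < 0.0f)                          /* f) restore input sign */
        rs = -rs;
      *sinp = rs;
      *cosp = rc;
    }
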
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
similarity index 54%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
index 3e74118..610046b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Multiple versions of vectorized sincosf.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,23 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include <init-arch.h>
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
+	.text
+ENTRY (_ZGVbN4vvv_sincosf)
+        .type   _ZGVbN4vvv_sincosf, @gnu_indirect_function
+        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
+        jne     1f
+        call    __init_cpu_features
+1:      leaq    _ZGVbN4vvv_sincosf_sse4(%rip), %rax
+        testl   $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+        jz      2f
+        ret
+2:      leaq    _ZGVbN4vvv_sincosf_sse2(%rip), %rax
+        ret
+END (_ZGVbN4vvv_sincosf)
+libmvec_hidden_def (_ZGVbN4vvv_sincosf)
 
-#include "libm-test.c"
+#define _ZGVbN4vvv_sincosf _ZGVbN4vvv_sincosf_sse2
+#include "../svml_s_sincosf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
new file mode 100644
index 0000000..8c51e44
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
@@ -0,0 +1,268 @@
+/* Function sincosf vectorized with SSE4.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_sincosf_data.h"
+
+	.text
+ENTRY (_ZGVbN4vvv_sincosf_sse4)
+/*
+   ALGORITHM DESCRIPTION:
+
+     1) Range reduction to [-Pi/4; +Pi/4] interval
+        a) Grab sign from source argument and save it.
+        b) Remove sign using AND operation
+        c) Getting octant Y by 2/Pi multiplication
+        d) Add "Right Shifter" value
+        e) Treat obtained value as integer S for destination sign setting.
+           SS = ((S-S&1)&2)<<30; For sin part
+           SC = ((S+S&1)&2)<<30; For cos part
+        f) Change destination sign if source sign is negative
+           using XOR operation.
+        g) Subtract "Right Shifter" (0x4B000000) value
+        h) Subtract Y*(PI/2) from X argument, where PI/2 is divided into 4 parts:
+           X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+     2) Polynomial (minimax for sin within  [-Pi/4; +Pi/4] interval)
+        a) Calculate X^2 = X * X
+        b) Calculate 2 polynomials for sin and cos:
+           RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+           RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4))));
+        c) Swap RS & RC if if first bit of obtained value after
+           Right Shifting is set to 1. Using And, Andnot & Or operations.
+     3) Destination sign setting
+        a) Set shifted destination sign using XOR operation:
+           R1 = XOR( RS, SS );
+           R2 = XOR( RC, SC ).  */
+
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $320, %rsp
+        movq      __svml_ssincos_data@GOTPCREL(%rip), %rax
+        movups    %xmm12, 176(%rsp)
+        movups    %xmm9, 160(%rsp)
+        movups __sAbsMask(%rax), %xmm12
+
+/* Absolute argument computation */
+        movaps    %xmm12, %xmm5
+        andnps    %xmm0, %xmm12
+        movups __sInvPI(%rax), %xmm7
+        andps     %xmm0, %xmm5
+
+/* c) Getting octant Y by 2/Pi multiplication
+   d) Add "Right Shifter" value.  */
+        mulps     %xmm5, %xmm7
+        movups    %xmm10, 144(%rsp)
+        movups __sPI1(%rax), %xmm10
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 is divided into 3 parts:
+      X = X - Y*PI1 - Y*PI2 - Y*PI3.  */
+        movaps    %xmm10, %xmm1
+        addps __sRShifter(%rax), %xmm7
+
+/* e) Treat obtained value as integer S for destination sign setting */
+        movaps    %xmm7, %xmm9
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+        subps __sRShifter(%rax), %xmm7
+        mulps     %xmm7, %xmm1
+        pslld     $31, %xmm9
+        movups __sPI2(%rax), %xmm6
+        movups    %xmm13, 112(%rsp)
+        movaps    %xmm5, %xmm13
+        movaps    %xmm6, %xmm2
+        subps     %xmm1, %xmm13
+        mulps     %xmm7, %xmm2
+        movups __sSignMask(%rax), %xmm3
+        movaps    %xmm5, %xmm1
+        movups __sOneHalf(%rax), %xmm4
+        subps     %xmm2, %xmm13
+        cmpnleps __sRangeReductionVal(%rax), %xmm5
+        movaps    %xmm3, %xmm2
+        andps     %xmm13, %xmm2
+        xorps     %xmm2, %xmm4
+
+/* Result sign calculations */
+        xorps     %xmm2, %xmm3
+        xorps     %xmm9, %xmm3
+
+/* Add correction term 0.5 for cos() part */
+        addps     %xmm7, %xmm4
+        movmskps  %xmm5, %ecx
+        mulps     %xmm4, %xmm10
+        mulps     %xmm4, %xmm6
+        subps     %xmm10, %xmm1
+        movups __sPI3(%rax), %xmm10
+        subps     %xmm6, %xmm1
+        movaps    %xmm10, %xmm6
+        mulps     %xmm7, %xmm6
+        mulps     %xmm4, %xmm10
+        subps     %xmm6, %xmm13
+        subps     %xmm10, %xmm1
+        movups __sPI4(%rax), %xmm6
+        mulps     %xmm6, %xmm7
+        mulps     %xmm6, %xmm4
+        subps     %xmm7, %xmm13
+        subps     %xmm4, %xmm1
+        xorps     %xmm9, %xmm13
+        xorps     %xmm3, %xmm1
+        movaps    %xmm13, %xmm4
+        movaps    %xmm1, %xmm2
+        mulps     %xmm13, %xmm4
+        mulps     %xmm1, %xmm2
+        movups __sA9(%rax), %xmm7
+
+/* 2) Polynomial (minimax for sin within  [-Pi/4; +Pi/4] interval)
+      a) Calculate X^2 = X * X
+      b) Calculate 2 polynomials for sin and cos:
+         RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+         RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+        movaps    %xmm7, %xmm3
+        mulps     %xmm4, %xmm3
+        mulps     %xmm2, %xmm7
+        addps __sA7(%rax), %xmm3
+        addps __sA7(%rax), %xmm7
+        mulps     %xmm4, %xmm3
+        mulps     %xmm2, %xmm7
+        addps __sA5(%rax), %xmm3
+        addps __sA5(%rax), %xmm7
+        mulps     %xmm4, %xmm3
+        mulps     %xmm2, %xmm7
+        addps __sA3(%rax), %xmm3
+        addps __sA3(%rax), %xmm7
+        mulps     %xmm3, %xmm4
+        mulps     %xmm7, %xmm2
+        mulps     %xmm13, %xmm4
+        mulps     %xmm1, %xmm2
+        addps     %xmm4, %xmm13
+        addps     %xmm2, %xmm1
+        xorps     %xmm12, %xmm13
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        cfi_remember_state
+        movups    160(%rsp), %xmm9
+        movaps    %xmm13, (%rdi)
+        movups    144(%rsp), %xmm10
+        movups    176(%rsp), %xmm12
+        movups    112(%rsp), %xmm13
+        movups    %xmm1, (%rsi)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_1_3:
+        cfi_restore_state
+        movups    %xmm0, 128(%rsp)
+        movups    %xmm13, 192(%rsp)
+        movups    %xmm1, 256(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        movups    %xmm8, 48(%rsp)
+        movups    %xmm11, 32(%rsp)
+        movups    %xmm14, 16(%rsp)
+        movups    %xmm15, (%rsp)
+        movq      %rsi, 64(%rsp)
+        movq      %r12, 104(%rsp)
+        cfi_offset_rel_rsp (12, 104)
+        movb      %dl, %r12b
+        movq      %r13, 96(%rsp)
+        cfi_offset_rel_rsp (13, 96)
+        movl      %eax, %r13d
+        movq      %r14, 88(%rsp)
+        cfi_offset_rel_rsp (14, 88)
+        movl      %ecx, %r14d
+        movq      %r15, 80(%rsp)
+        cfi_offset_rel_rsp (15, 80)
+        movq      %rbx, 72(%rsp)
+        movq      %rdi, %rbx
+        cfi_remember_state
+
+.LBL_1_6:
+        btl       %r13d, %r14d
+        jc        .LBL_1_13
+
+.LBL_1_7:
+        lea       1(%r13), %esi
+        btl       %esi, %r14d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        incb      %r12b
+        addl      $2, %r13d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        movups    48(%rsp), %xmm8
+        movq      %rbx, %rdi
+        movups    32(%rsp), %xmm11
+        movups    16(%rsp), %xmm14
+        movups    (%rsp), %xmm15
+        movq      64(%rsp), %rsi
+        movq      104(%rsp), %r12
+        cfi_restore (%r12)
+        movq      96(%rsp), %r13
+        cfi_restore (%r13)
+        movq      88(%rsp), %r14
+        cfi_restore (%r14)
+        movq      80(%rsp), %r15
+        cfi_restore (%r15)
+        movq      72(%rsp), %rbx
+        movups    192(%rsp), %xmm13
+        movups    256(%rsp), %xmm1
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        movss     132(%rsp,%r15,8), %xmm0
+
+        call      sinf@PLT
+
+        movss     %xmm0, 196(%rsp,%r15,8)
+        movss     132(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        movss     %xmm0, 260(%rsp,%r15,8)
+        jmp       .LBL_1_8
+
+.LBL_1_13:
+        movzbl    %r12b, %r15d
+        movss     128(%rsp,%r15,8), %xmm0
+
+        call      sinf@PLT
+
+        movss     %xmm0, 192(%rsp,%r15,8)
+        movss     128(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        movss     %xmm0, 256(%rsp,%r15,8)
+        jmp       .LBL_1_7
+
+END (_ZGVbN4vvv_sincosf_sse4)
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
new file mode 100644
index 0000000..9e5be67
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
@@ -0,0 +1,38 @@
+/* Multiple versions of vectorized sincosf.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <init-arch.h>
+
+	.text
+ENTRY (_ZGVdN8vvv_sincosf)
+        .type   _ZGVdN8vvv_sincosf, @gnu_indirect_function
+        cmpl    $0, KIND_OFFSET+__cpu_features(%rip)
+        jne     1f
+        call    __init_cpu_features
+1:      leaq    _ZGVdN8vvv_sincosf_avx2(%rip), %rax
+        testl   $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+        jz      2f
+        ret
+2:      leaq    _ZGVdN8vvv_sincosf_sse_wrapper(%rip), %rax
+        ret
+END (_ZGVdN8vvv_sincosf)
+libmvec_hidden_def (_ZGVdN8vvv_sincosf)
+
+#define _ZGVdN8vvv_sincosf _ZGVdN8vvv_sincosf_sse_wrapper
+#include "../svml_s_sincosf8_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
new file mode 100644
index 0000000..153c315
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
@@ -0,0 +1,241 @@
+/* Function sincosf vectorized with AVX2.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include "svml_s_sincosf_data.h"
+
+	.text
+ENTRY(_ZGVdN8vvv_sincosf_avx2)
+/*
+   ALGORITHM DESCRIPTION:
+
+     1) Range reduction to [-Pi/4; +Pi/4] interval
+        a) Grab sign from source argument and save it.
+        b) Remove sign using AND operation
+        c) Getting octant Y by 2/Pi multiplication
+        d) Add "Right Shifter" value
+        e) Treat obtained value as integer S for destination sign setting.
+           SS = ((S-S&1)&2)<<30; For sin part
+           SC = ((S+S&1)&2)<<30; For cos part
+        f) Change destination sign if source sign is negative
+           using XOR operation.
+        g) Subtract "Right Shifter" (0x4B000000) value
+        h) Subtract Y*(PI/2) from X argument, where PI/2 is divided into 4 parts:
+           X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+     2) Polynomial (minimax for sin within  [-Pi/4; +Pi/4] interval)
+        a) Calculate X^2 = X * X
+        b) Calculate 2 polynomials for sin and cos:
+           RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+           RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4))));
+        c) Swap RS & RC if if first bit of obtained value after
+           Right Shifting is set to 1. Using And, Andnot & Or operations.
+     3) Destination sign setting
+        a) Set shifted destination sign using XOR operation:
+           R1 = XOR( RS, SS );
+           R2 = XOR( RC, SC ).  */
+
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        subq      $448, %rsp
+        movq      __svml_ssincos_data@GOTPCREL(%rip), %rax
+        vmovdqa   %ymm0, %ymm5
+        vmovups   %ymm13, 352(%rsp)
+        vmovups __sAbsMask(%rax), %ymm2
+        vmovups __sInvPI(%rax), %ymm1
+        vmovups __sPI1_FMA(%rax), %ymm13
+        vmovups   %ymm15, 288(%rsp)
+
+/* Absolute argument computation */
+        vandps    %ymm2, %ymm5, %ymm4
+
+/* c) Getting octant Y by 2/Pi multiplication
+   d) Add "Right Shifter" value */
+        vfmadd213ps __sRShifter(%rax), %ymm4, %ymm1
+
+/* e) Treat obtained value as integer S for destination sign setting */
+        vpslld    $31, %ymm1, %ymm0
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+        vsubps __sRShifter(%rax), %ymm1, %ymm1
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 is divided into 3 parts:
+      X = X - Y*PI1 - Y*PI2 - Y*PI3 */
+        vmovdqa   %ymm4, %ymm7
+        vfnmadd231ps %ymm1, %ymm13, %ymm7
+        vfnmadd231ps __sPI2_FMA(%rax), %ymm1, %ymm7
+        vandps __sSignMask(%rax), %ymm7, %ymm15
+        vxorps __sOneHalf(%rax), %ymm15, %ymm6
+
+/* Add correction term 0.5 for cos() part */
+        vaddps    %ymm6, %ymm1, %ymm6
+        vmovdqa   %ymm4, %ymm3
+        vfnmadd231ps %ymm6, %ymm13, %ymm3
+        vmovups __sPI3_FMA(%rax), %ymm13
+        vcmpnle_uqps __sRangeReductionVal(%rax), %ymm4, %ymm4
+        vfnmadd231ps __sPI2_FMA(%rax), %ymm6, %ymm3
+        vfnmadd213ps %ymm7, %ymm13, %ymm1
+        vfnmadd213ps %ymm3, %ymm13, %ymm6
+
+/* Result sign calculations */
+        vxorps __sSignMask(%rax), %ymm15, %ymm3
+        vxorps    %ymm0, %ymm3, %ymm7
+        vxorps    %ymm7, %ymm6, %ymm3
+        vxorps    %ymm0, %ymm1, %ymm15
+        vandnps   %ymm5, %ymm2, %ymm6
+        vmovups __sA7_FMA(%rax), %ymm2
+        vmulps    %ymm15, %ymm15, %ymm13
+        vmovups __sA9_FMA(%rax), %ymm7
+        vmulps    %ymm3, %ymm3, %ymm1
+
+/* 2) Polynomial (minimax for sin within  [-Pi/4; +Pi/4] interval)
+      a) Calculate X^2 = X * X
+      b) Calculate 2 polynomials for sin and cos:
+         RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+         RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+        vmovdqa   %ymm2, %ymm0
+        vfmadd231ps __sA9_FMA(%rax), %ymm13, %ymm0
+        vfmadd213ps %ymm2, %ymm1, %ymm7
+        vfmadd213ps __sA5_FMA(%rax), %ymm13, %ymm0
+        vfmadd213ps __sA5_FMA(%rax), %ymm1, %ymm7
+        vfmadd213ps __sA3(%rax), %ymm13, %ymm0
+        vfmadd213ps __sA3(%rax), %ymm1, %ymm7
+        vmulps    %ymm13, %ymm0, %ymm13
+        vmulps    %ymm1, %ymm7, %ymm1
+        vfmadd213ps %ymm15, %ymm15, %ymm13
+        vfmadd213ps %ymm3, %ymm3, %ymm1
+        vmovmskps %ymm4, %ecx
+        vxorps    %ymm6, %ymm13, %ymm0
+        testl     %ecx, %ecx
+        jne       .LBL_1_3
+
+.LBL_1_2:
+        cfi_remember_state
+        vmovups   352(%rsp), %ymm13
+        vmovups   288(%rsp), %ymm15
+        vmovups   %ymm0, (%rdi)
+        vmovups   %ymm1, (%rsi)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+
+.LBL_1_3:
+        cfi_restore_state
+        vmovups   %ymm5, 256(%rsp)
+        vmovups   %ymm0, 320(%rsp)
+        vmovups   %ymm1, 384(%rsp)
+        je        .LBL_1_2
+
+        xorb      %dl, %dl
+        xorl      %eax, %eax
+        vmovups   %ymm8, 160(%rsp)
+        vmovups   %ymm9, 128(%rsp)
+        vmovups   %ymm10, 96(%rsp)
+        vmovups   %ymm11, 64(%rsp)
+        vmovups   %ymm12, 32(%rsp)
+        vmovups   %ymm14, (%rsp)
+        movq      %rsi, 192(%rsp)
+        movq      %r12, 232(%rsp)
+        cfi_offset_rel_rsp (12, 232)
+        movb      %dl, %r12b
+        movq      %r13, 224(%rsp)
+        cfi_offset_rel_rsp (13, 224)
+        movl      %eax, %r13d
+        movq      %r14, 216(%rsp)
+        cfi_offset_rel_rsp (14, 216)
+        movl      %ecx, %r14d
+        movq      %r15, 208(%rsp)
+        cfi_offset_rel_rsp (15, 208)
+        movq      %rbx, 200(%rsp)
+        movq      %rdi, %rbx
+        cfi_remember_state
+
+.LBL_1_6:
+        btl       %r13d, %r14d
+        jc        .LBL_1_13
+
+.LBL_1_7:
+        lea       1(%r13), %esi
+        btl       %esi, %r14d
+        jc        .LBL_1_10
+
+.LBL_1_8:
+        incb      %r12b
+        addl      $2, %r13d
+        cmpb      $16, %r12b
+        jb        .LBL_1_6
+
+        vmovups   160(%rsp), %ymm8
+        movq      %rbx, %rdi
+        vmovups   128(%rsp), %ymm9
+        vmovups   96(%rsp), %ymm10
+        vmovups   64(%rsp), %ymm11
+        vmovups   32(%rsp), %ymm12
+        vmovups   (%rsp), %ymm14
+        vmovups   320(%rsp), %ymm0
+        vmovups   384(%rsp), %ymm1
+        movq      192(%rsp), %rsi
+        movq      232(%rsp), %r12
+        cfi_restore (%r12)
+        movq      224(%rsp), %r13
+        cfi_restore (%r13)
+        movq      216(%rsp), %r14
+        cfi_restore (%r14)
+        movq      208(%rsp), %r15
+        cfi_restore (%r15)
+        movq      200(%rsp), %rbx
+        jmp       .LBL_1_2
+
+.LBL_1_10:
+        cfi_restore_state
+        movzbl    %r12b, %r15d
+        vmovss    260(%rsp,%r15,8), %xmm0
+        vzeroupper
+
+        call      sinf@PLT
+
+        vmovss    %xmm0, 324(%rsp,%r15,8)
+        vmovss    260(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        vmovss    %xmm0, 388(%rsp,%r15,8)
+        jmp       .LBL_1_8
+
+.LBL_1_13:
+        movzbl    %r12b, %r15d
+        vmovss    256(%rsp,%r15,8), %xmm0
+        vzeroupper
+
+        call      sinf@PLT
+
+        vmovss    %xmm0, 320(%rsp,%r15,8)
+        vmovss    256(%rsp,%r15,8), %xmm0
+
+        call      cosf@PLT
+
+        vmovss    %xmm0, 384(%rsp,%r15,8)
+        jmp       .LBL_1_7
+
+END(_ZGVdN8vvv_sincosf_avx2)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
similarity index 76%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
index 3e74118..992f9a9 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized with AVX-512. Wrapper to AVX2 version.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,10 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
-
-#include "libm-test.c"
+	.text
+ENTRY (_ZGVeN16vvv_sincosf)
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+END (_ZGVeN16vvv_sincosf)
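
When AVX-512 assembler support is unavailable, the 16-lane entry is just
this wrapper: it runs the 8-lane AVX2 variant twice, once per half of
the input vector.  A C sketch of what WRAPPER_IMPL_AVX512_fFF amounts to
(vector types via GCC's vector_size extension; the real macro does this
on the stack in assembly):

    typedef float v8sf  __attribute__ ((vector_size (32)));
    typedef float v16sf __attribute__ ((vector_size (64)));

    extern void _ZGVdN8vvv_sincosf (v8sf, v8sf *, v8sf *);

    void
    sincosf16_wrapper_sketch (v16sf x, v16sf *sinp, v16sf *cosp)
    {
      v8sf lo, hi, slo, shi, clo, chi;
      __builtin_memcpy (&lo, (const char *) &x, sizeof lo);
      __builtin_memcpy (&hi, (const char *) &x + sizeof lo, sizeof hi);
      _ZGVdN8vvv_sincosf (lo, &slo, &clo);   /* low 8 lanes */
      _ZGVdN8vvv_sincosf (hi, &shi, &chi);   /* high 8 lanes */
      __builtin_memcpy (sinp, &slo, sizeof slo);
      __builtin_memcpy ((char *) sinp + sizeof slo, &shi, sizeof shi);
      __builtin_memcpy (cosp, &clo, sizeof clo);
      __builtin_memcpy ((char *) cosp + sizeof clo, &chi, sizeof chi);
    }
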
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
similarity index 75%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
index 3e74118..d402ffb 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized with SSE2.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,15 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen4.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#include "libm-test.c"
+	.text
+ENTRY (_ZGVbN4vvv_sincosf)
+WRAPPER_IMPL_SSE2_fFF sincosf
+END (_ZGVbN4vvv_sincosf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4vvv_sincosf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
similarity index 73%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
index 3e74118..eec7de8 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized with AVX2, wrapper version.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,14 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
+	.text
+ENTRY (_ZGVdN8vvv_sincosf)
+WRAPPER_IMPL_AVX_fFF _ZGVbN4vvv_sincosf
+END (_ZGVdN8vvv_sincosf)
 
-#include "libm-test.c"
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8vvv_sincosf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
similarity index 76%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
index 3e74118..c247444 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized with AVX, wrapper to SSE4 version.
    Copyright (C) 2014-2015 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,12 +16,10 @@
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
 
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
-
-#include "libm-test.c"
+        .text
+ENTRY(_ZGVcN8vvv_sincosf)
+WRAPPER_IMPL_AVX_fFF _ZGVbN4vvv_sincosf
+END(_ZGVcN8vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf_data.S b/sysdeps/x86_64/fpu/svml_s_sincosf_data.S
new file mode 100644
index 0000000..040414d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf_data.S
@@ -0,0 +1,1140 @@
+/* Data for function sincosf.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "svml_s_sincosf_data.h"
+
+	.section .rodata, "a"
+	.align 64
+
+/* Data table for the vector implementations of sincosf.
+   It holds polynomial, range-reduction and lookup coefficients,
+   obtained analytically and through experimental tuning.  */
+
+	.globl __svml_ssincos_data
+__svml_ssincos_data:
+
+/* Lookup table for high accuracy version (CHL, SHi, SLo, Sigma).  */
+.if .-__svml_ssincos_data != __dT
+.err
+.endif
+	.long	0x00000000
+	.long	0x00000000
+	.long	0x00000000
+	.long	0x3f800000
+	.long	0xb99de7df
+	.long	0x3cc90ab0
+	.long	0xb005c998
+	.long	0x3f800000
+	.long	0xba9de1c8
+	.long	0x3d48fb30
+	.long	0xb0ef227f
+	.long	0x3f800000
+	.long	0xbb319298
+	.long	0x3d96a905
+	.long	0xb1531e61
+	.long	0x3f800000
+	.long	0xbb9dc971
+	.long	0x3dc8bd36
+	.long	0xb07592f5
+	.long	0x3f800000
+	.long	0xbbf66e3c
+	.long	0x3dfab273
+	.long	0xb11568cf
+	.long	0x3f800000
+	.long	0xbc315502
+	.long	0x3e164083
+	.long	0x31e8e614
+	.long	0x3f800000
+	.long	0xbc71360b
+	.long	0x3e2f10a2
+	.long	0x311167f9
+	.long	0x3f800000
+	.long	0xbc9d6830
+	.long	0x3e47c5c2
+	.long	0xb0e5967d
+	.long	0x3f800000
+	.long	0xbcc70c54
+	.long	0x3e605c13
+	.long	0x31a7e4f6
+	.long	0x3f800000
+	.long	0xbcf58104
+	.long	0x3e78cfcc
+	.long	0xb11bd41d
+	.long	0x3f800000
+	.long	0xbd145f8c
+	.long	0x3e888e93
+	.long	0x312c7d9e
+	.long	0x3f800000
+	.long	0xbd305f55
+	.long	0x3e94a031
+	.long	0x326d59f0
+	.long	0x3f800000
+	.long	0xbd4ebb8a
+	.long	0x3ea09ae5
+	.long	0xb23e89a0
+	.long	0x3f800000
+	.long	0xbd6f6f7e
+	.long	0x3eac7cd4
+	.long	0xb2254e02
+	.long	0x3f800000
+	.long	0xbd893b12
+	.long	0x3eb8442a
+	.long	0xb2705ba6
+	.long	0x3f800000
+	.long	0xbd9be50c
+	.long	0x3ec3ef15
+	.long	0x31d5d52c
+	.long	0x3f800000
+	.long	0xbdafb2cc
+	.long	0x3ecf7bca
+	.long	0x316a3b63
+	.long	0x3f800000
+	.long	0xbdc4a143
+	.long	0x3edae880
+	.long	0x321e15cc
+	.long	0x3f800000
+	.long	0xbddaad38
+	.long	0x3ee63375
+	.long	0xb1d9c774
+	.long	0x3f800000
+	.long	0xbdf1d344
+	.long	0x3ef15aea
+	.long	0xb1ff2139
+	.long	0x3f800000
+	.long	0xbe0507ea
+	.long	0x3efc5d27
+	.long	0xb180eca9
+	.long	0x3f800000
+	.long	0xbe11af97
+	.long	0x3f039c3d
+	.long	0xb25ba002
+	.long	0x3f800000
+	.long	0xbe1edeb5
+	.long	0x3f08f59b
+	.long	0xb2be4b4e
+	.long	0x3f800000
+	.long	0xbe2c933b
+	.long	0x3f0e39da
+	.long	0xb24a32e7
+	.long	0x3f800000
+	.long	0xbe3acb0c
+	.long	0x3f13682a
+	.long	0x32cdd12e
+	.long	0x3f800000
+	.long	0xbe4983f7
+	.long	0x3f187fc0
+	.long	0xb1c7a3f3
+	.long	0x3f800000
+	.long	0xbe58bbb7
+	.long	0x3f1d7fd1
+	.long	0x3292050c
+	.long	0x3f800000
+	.long	0xbe686ff3
+	.long	0x3f226799
+	.long	0x322123bb
+	.long	0x3f800000
+	.long	0xbe789e3f
+	.long	0x3f273656
+	.long	0xb2038343
+	.long	0x3f800000
+	.long	0xbe84a20e
+	.long	0x3f2beb4a
+	.long	0xb2b73136
+	.long	0x3f800000
+	.long	0xbe8d2f7d
+	.long	0x3f3085bb
+	.long	0xb2ae2d32
+	.long	0x3f800000
+	.long	0xbe95f61a
+	.long	0x3f3504f3
+	.long	0x324fe77a
+	.long	0x3f800000
+	.long	0x3e4216eb
+	.long	0x3f396842
+	.long	0xb2810007
+	.long	0x3f000000
+	.long	0x3e2fad27
+	.long	0x3f3daef9
+	.long	0x319aabec
+	.long	0x3f000000
+	.long	0x3e1cd957
+	.long	0x3f41d870
+	.long	0x32bff977
+	.long	0x3f000000
+	.long	0x3e099e65
+	.long	0x3f45e403
+	.long	0x32b15174
+	.long	0x3f000000
+	.long	0x3debfe8a
+	.long	0x3f49d112
+	.long	0x32992640
+	.long	0x3f000000
+	.long	0x3dc3fdff
+	.long	0x3f4d9f02
+	.long	0x327e70e8
+	.long	0x3f000000
+	.long	0x3d9b4153
+	.long	0x3f514d3d
+	.long	0x300c4f04
+	.long	0x3f000000
+	.long	0x3d639d9d
+	.long	0x3f54db31
+	.long	0x3290ea1a
+	.long	0x3f000000
+	.long	0x3d0f59aa
+	.long	0x3f584853
+	.long	0xb27d5fc0
+	.long	0x3f000000
+	.long	0x3c670f32
+	.long	0x3f5b941a
+	.long	0x32232dc8
+	.long	0x3f000000
+	.long	0xbbe8b648
+	.long	0x3f5ebe05
+	.long	0x32c6f953
+	.long	0x3f000000
+	.long	0xbcea5164
+	.long	0x3f61c598
+	.long	0xb2e7f425
+	.long	0x3f000000
+	.long	0xbd4e645a
+	.long	0x3f64aa59
+	.long	0x311a08fa
+	.long	0x3f000000
+	.long	0xbd945dff
+	.long	0x3f676bd8
+	.long	0xb2bc3389
+	.long	0x3f000000
+	.long	0xbdc210d8
+	.long	0x3f6a09a7
+	.long	0xb2eb236c
+	.long	0x3f000000
+	.long	0xbdf043ab
+	.long	0x3f6c835e
+	.long	0x32f328d4
+	.long	0x3f000000
+	.long	0xbe0f77ad
+	.long	0x3f6ed89e
+	.long	0xb29333dc
+	.long	0x3f000000
+	.long	0x3db1f34f
+	.long	0x3f710908
+	.long	0x321ed0dd
+	.long	0x3e800000
+	.long	0x3d826b93
+	.long	0x3f731447
+	.long	0x32c48e11
+	.long	0x3e800000
+	.long	0x3d25018c
+	.long	0x3f74fa0b
+	.long	0xb2939d22
+	.long	0x3e800000
+	.long	0x3c88e931
+	.long	0x3f76ba07
+	.long	0x326d092c
+	.long	0x3e800000
+	.long	0xbbe60685
+	.long	0x3f7853f8
+	.long	0xb20db9e5
+	.long	0x3e800000
+	.long	0xbcfd1f65
+	.long	0x3f79c79d
+	.long	0x32c64e59
+	.long	0x3e800000
+	.long	0xbd60e8f8
+	.long	0x3f7b14be
+	.long	0x32ff75cb
+	.long	0x3e800000
+	.long	0x3d3c4289
+	.long	0x3f7c3b28
+	.long	0xb231d68b
+	.long	0x3e000000
+	.long	0x3cb2041c
+	.long	0x3f7d3aac
+	.long	0xb0f75ae9
+	.long	0x3e000000
+	.long	0xbb29b1a9
+	.long	0x3f7e1324
+	.long	0xb2f1e603
+	.long	0x3e000000
+	.long	0xbcdd0b28
+	.long	0x3f7ec46d
+	.long	0x31f44949
+	.long	0x3e000000
+	.long	0x3c354825
+	.long	0x3f7f4e6d
+	.long	0x32d01884
+	.long	0x3d800000
+	.long	0xbc5c1342
+	.long	0x3f7fb10f
+	.long	0x31de5b5f
+	.long	0x3d800000
+	.long	0xbbdbd541
+	.long	0x3f7fec43
+	.long	0x3084cd0d
+	.long	0x3d000000
+	.long	0x00000000
+	.long	0x3f800000
+	.long	0x00000000
+	.long	0x00000000
+	.long	0x3bdbd541
+	.long	0x3f7fec43
+	.long	0x3084cd0d
+	.long	0xbd000000
+	.long	0x3c5c1342
+	.long	0x3f7fb10f
+	.long	0x31de5b5f
+	.long	0xbd800000
+	.long	0xbc354825
+	.long	0x3f7f4e6d
+	.long	0x32d01884
+	.long	0xbd800000
+	.long	0x3cdd0b28
+	.long	0x3f7ec46d
+	.long	0x31f44949
+	.long	0xbe000000
+	.long	0x3b29b1a9
+	.long	0x3f7e1324
+	.long	0xb2f1e603
+	.long	0xbe000000
+	.long	0xbcb2041c
+	.long	0x3f7d3aac
+	.long	0xb0f75ae9
+	.long	0xbe000000
+	.long	0xbd3c4289
+	.long	0x3f7c3b28
+	.long	0xb231d68b
+	.long	0xbe000000
+	.long	0x3d60e8f8
+	.long	0x3f7b14be
+	.long	0x32ff75cb
+	.long	0xbe800000
+	.long	0x3cfd1f65
+	.long	0x3f79c79d
+	.long	0x32c64e59
+	.long	0xbe800000
+	.long	0x3be60685
+	.long	0x3f7853f8
+	.long	0xb20db9e5
+	.long	0xbe800000
+	.long	0xbc88e931
+	.long	0x3f76ba07
+	.long	0x326d092c
+	.long	0xbe800000
+	.long	0xbd25018c
+	.long	0x3f74fa0b
+	.long	0xb2939d22
+	.long	0xbe800000
+	.long	0xbd826b93
+	.long	0x3f731447
+	.long	0x32c48e11
+	.long	0xbe800000
+	.long	0xbdb1f34f
+	.long	0x3f710908
+	.long	0x321ed0dd
+	.long	0xbe800000
+	.long	0x3e0f77ad
+	.long	0x3f6ed89e
+	.long	0xb29333dc
+	.long	0xbf000000
+	.long	0x3df043ab
+	.long	0x3f6c835e
+	.long	0x32f328d4
+	.long	0xbf000000
+	.long	0x3dc210d8
+	.long	0x3f6a09a7
+	.long	0xb2eb236c
+	.long	0xbf000000
+	.long	0x3d945dff
+	.long	0x3f676bd8
+	.long	0xb2bc3389
+	.long	0xbf000000
+	.long	0x3d4e645a
+	.long	0x3f64aa59
+	.long	0x311a08fa
+	.long	0xbf000000
+	.long	0x3cea5164
+	.long	0x3f61c598
+	.long	0xb2e7f425
+	.long	0xbf000000
+	.long	0x3be8b648
+	.long	0x3f5ebe05
+	.long	0x32c6f953
+	.long	0xbf000000
+	.long	0xbc670f32
+	.long	0x3f5b941a
+	.long	0x32232dc8
+	.long	0xbf000000
+	.long	0xbd0f59aa
+	.long	0x3f584853
+	.long	0xb27d5fc0
+	.long	0xbf000000
+	.long	0xbd639d9d
+	.long	0x3f54db31
+	.long	0x3290ea1a
+	.long	0xbf000000
+	.long	0xbd9b4153
+	.long	0x3f514d3d
+	.long	0x300c4f04
+	.long	0xbf000000
+	.long	0xbdc3fdff
+	.long	0x3f4d9f02
+	.long	0x327e70e8
+	.long	0xbf000000
+	.long	0xbdebfe8a
+	.long	0x3f49d112
+	.long	0x32992640
+	.long	0xbf000000
+	.long	0xbe099e65
+	.long	0x3f45e403
+	.long	0x32b15174
+	.long	0xbf000000
+	.long	0xbe1cd957
+	.long	0x3f41d870
+	.long	0x32bff977
+	.long	0xbf000000
+	.long	0xbe2fad27
+	.long	0x3f3daef9
+	.long	0x319aabec
+	.long	0xbf000000
+	.long	0xbe4216eb
+	.long	0x3f396842
+	.long	0xb2810007
+	.long	0xbf000000
+	.long	0x3e95f61a
+	.long	0x3f3504f3
+	.long	0x324fe77a
+	.long	0xbf800000
+	.long	0x3e8d2f7d
+	.long	0x3f3085bb
+	.long	0xb2ae2d32
+	.long	0xbf800000
+	.long	0x3e84a20e
+	.long	0x3f2beb4a
+	.long	0xb2b73136
+	.long	0xbf800000
+	.long	0x3e789e3f
+	.long	0x3f273656
+	.long	0xb2038343
+	.long	0xbf800000
+	.long	0x3e686ff3
+	.long	0x3f226799
+	.long	0x322123bb
+	.long	0xbf800000
+	.long	0x3e58bbb7
+	.long	0x3f1d7fd1
+	.long	0x3292050c
+	.long	0xbf800000
+	.long	0x3e4983f7
+	.long	0x3f187fc0
+	.long	0xb1c7a3f3
+	.long	0xbf800000
+	.long	0x3e3acb0c
+	.long	0x3f13682a
+	.long	0x32cdd12e
+	.long	0xbf800000
+	.long	0x3e2c933b
+	.long	0x3f0e39da
+	.long	0xb24a32e7
+	.long	0xbf800000
+	.long	0x3e1edeb5
+	.long	0x3f08f59b
+	.long	0xb2be4b4e
+	.long	0xbf800000
+	.long	0x3e11af97
+	.long	0x3f039c3d
+	.long	0xb25ba002
+	.long	0xbf800000
+	.long	0x3e0507ea
+	.long	0x3efc5d27
+	.long	0xb180eca9
+	.long	0xbf800000
+	.long	0x3df1d344
+	.long	0x3ef15aea
+	.long	0xb1ff2139
+	.long	0xbf800000
+	.long	0x3ddaad38
+	.long	0x3ee63375
+	.long	0xb1d9c774
+	.long	0xbf800000
+	.long	0x3dc4a143
+	.long	0x3edae880
+	.long	0x321e15cc
+	.long	0xbf800000
+	.long	0x3dafb2cc
+	.long	0x3ecf7bca
+	.long	0x316a3b63
+	.long	0xbf800000
+	.long	0x3d9be50c
+	.long	0x3ec3ef15
+	.long	0x31d5d52c
+	.long	0xbf800000
+	.long	0x3d893b12
+	.long	0x3eb8442a
+	.long	0xb2705ba6
+	.long	0xbf800000
+	.long	0x3d6f6f7e
+	.long	0x3eac7cd4
+	.long	0xb2254e02
+	.long	0xbf800000
+	.long	0x3d4ebb8a
+	.long	0x3ea09ae5
+	.long	0xb23e89a0
+	.long	0xbf800000
+	.long	0x3d305f55
+	.long	0x3e94a031
+	.long	0x326d59f0
+	.long	0xbf800000
+	.long	0x3d145f8c
+	.long	0x3e888e93
+	.long	0x312c7d9e
+	.long	0xbf800000
+	.long	0x3cf58104
+	.long	0x3e78cfcc
+	.long	0xb11bd41d
+	.long	0xbf800000
+	.long	0x3cc70c54
+	.long	0x3e605c13
+	.long	0x31a7e4f6
+	.long	0xbf800000
+	.long	0x3c9d6830
+	.long	0x3e47c5c2
+	.long	0xb0e5967d
+	.long	0xbf800000
+	.long	0x3c71360b
+	.long	0x3e2f10a2
+	.long	0x311167f9
+	.long	0xbf800000
+	.long	0x3c315502
+	.long	0x3e164083
+	.long	0x31e8e614
+	.long	0xbf800000
+	.long	0x3bf66e3c
+	.long	0x3dfab273
+	.long	0xb11568cf
+	.long	0xbf800000
+	.long	0x3b9dc971
+	.long	0x3dc8bd36
+	.long	0xb07592f5
+	.long	0xbf800000
+	.long	0x3b319298
+	.long	0x3d96a905
+	.long	0xb1531e61
+	.long	0xbf800000
+	.long	0x3a9de1c8
+	.long	0x3d48fb30
+	.long	0xb0ef227f
+	.long	0xbf800000
+	.long	0x399de7df
+	.long	0x3cc90ab0
+	.long	0xb005c998
+	.long	0xbf800000
+	.long	0x00000000
+	.long	0x00000000
+	.long	0x00000000
+	.long	0xbf800000
+	.long	0x399de7df
+	.long	0xbcc90ab0
+	.long	0x3005c998
+	.long	0xbf800000
+	.long	0x3a9de1c8
+	.long	0xbd48fb30
+	.long	0x30ef227f
+	.long	0xbf800000
+	.long	0x3b319298
+	.long	0xbd96a905
+	.long	0x31531e61
+	.long	0xbf800000
+	.long	0x3b9dc971
+	.long	0xbdc8bd36
+	.long	0x307592f5
+	.long	0xbf800000
+	.long	0x3bf66e3c
+	.long	0xbdfab273
+	.long	0x311568cf
+	.long	0xbf800000
+	.long	0x3c315502
+	.long	0xbe164083
+	.long	0xb1e8e614
+	.long	0xbf800000
+	.long	0x3c71360b
+	.long	0xbe2f10a2
+	.long	0xb11167f9
+	.long	0xbf800000
+	.long	0x3c9d6830
+	.long	0xbe47c5c2
+	.long	0x30e5967d
+	.long	0xbf800000
+	.long	0x3cc70c54
+	.long	0xbe605c13
+	.long	0xb1a7e4f6
+	.long	0xbf800000
+	.long	0x3cf58104
+	.long	0xbe78cfcc
+	.long	0x311bd41d
+	.long	0xbf800000
+	.long	0x3d145f8c
+	.long	0xbe888e93
+	.long	0xb12c7d9e
+	.long	0xbf800000
+	.long	0x3d305f55
+	.long	0xbe94a031
+	.long	0xb26d59f0
+	.long	0xbf800000
+	.long	0x3d4ebb8a
+	.long	0xbea09ae5
+	.long	0x323e89a0
+	.long	0xbf800000
+	.long	0x3d6f6f7e
+	.long	0xbeac7cd4
+	.long	0x32254e02
+	.long	0xbf800000
+	.long	0x3d893b12
+	.long	0xbeb8442a
+	.long	0x32705ba6
+	.long	0xbf800000
+	.long	0x3d9be50c
+	.long	0xbec3ef15
+	.long	0xb1d5d52c
+	.long	0xbf800000
+	.long	0x3dafb2cc
+	.long	0xbecf7bca
+	.long	0xb16a3b63
+	.long	0xbf800000
+	.long	0x3dc4a143
+	.long	0xbedae880
+	.long	0xb21e15cc
+	.long	0xbf800000
+	.long	0x3ddaad38
+	.long	0xbee63375
+	.long	0x31d9c774
+	.long	0xbf800000
+	.long	0x3df1d344
+	.long	0xbef15aea
+	.long	0x31ff2139
+	.long	0xbf800000
+	.long	0x3e0507ea
+	.long	0xbefc5d27
+	.long	0x3180eca9
+	.long	0xbf800000
+	.long	0x3e11af97
+	.long	0xbf039c3d
+	.long	0x325ba002
+	.long	0xbf800000
+	.long	0x3e1edeb5
+	.long	0xbf08f59b
+	.long	0x32be4b4e
+	.long	0xbf800000
+	.long	0x3e2c933b
+	.long	0xbf0e39da
+	.long	0x324a32e7
+	.long	0xbf800000
+	.long	0x3e3acb0c
+	.long	0xbf13682a
+	.long	0xb2cdd12e
+	.long	0xbf800000
+	.long	0x3e4983f7
+	.long	0xbf187fc0
+	.long	0x31c7a3f3
+	.long	0xbf800000
+	.long	0x3e58bbb7
+	.long	0xbf1d7fd1
+	.long	0xb292050c
+	.long	0xbf800000
+	.long	0x3e686ff3
+	.long	0xbf226799
+	.long	0xb22123bb
+	.long	0xbf800000
+	.long	0x3e789e3f
+	.long	0xbf273656
+	.long	0x32038343
+	.long	0xbf800000
+	.long	0x3e84a20e
+	.long	0xbf2beb4a
+	.long	0x32b73136
+	.long	0xbf800000
+	.long	0x3e8d2f7d
+	.long	0xbf3085bb
+	.long	0x32ae2d32
+	.long	0xbf800000
+	.long	0x3e95f61a
+	.long	0xbf3504f3
+	.long	0xb24fe77a
+	.long	0xbf800000
+	.long	0xbe4216eb
+	.long	0xbf396842
+	.long	0x32810007
+	.long	0xbf000000
+	.long	0xbe2fad27
+	.long	0xbf3daef9
+	.long	0xb19aabec
+	.long	0xbf000000
+	.long	0xbe1cd957
+	.long	0xbf41d870
+	.long	0xb2bff977
+	.long	0xbf000000
+	.long	0xbe099e65
+	.long	0xbf45e403
+	.long	0xb2b15174
+	.long	0xbf000000
+	.long	0xbdebfe8a
+	.long	0xbf49d112
+	.long	0xb2992640
+	.long	0xbf000000
+	.long	0xbdc3fdff
+	.long	0xbf4d9f02
+	.long	0xb27e70e8
+	.long	0xbf000000
+	.long	0xbd9b4153
+	.long	0xbf514d3d
+	.long	0xb00c4f04
+	.long	0xbf000000
+	.long	0xbd639d9d
+	.long	0xbf54db31
+	.long	0xb290ea1a
+	.long	0xbf000000
+	.long	0xbd0f59aa
+	.long	0xbf584853
+	.long	0x327d5fc0
+	.long	0xbf000000
+	.long	0xbc670f32
+	.long	0xbf5b941a
+	.long	0xb2232dc8
+	.long	0xbf000000
+	.long	0x3be8b648
+	.long	0xbf5ebe05
+	.long	0xb2c6f953
+	.long	0xbf000000
+	.long	0x3cea5164
+	.long	0xbf61c598
+	.long	0x32e7f425
+	.long	0xbf000000
+	.long	0x3d4e645a
+	.long	0xbf64aa59
+	.long	0xb11a08fa
+	.long	0xbf000000
+	.long	0x3d945dff
+	.long	0xbf676bd8
+	.long	0x32bc3389
+	.long	0xbf000000
+	.long	0x3dc210d8
+	.long	0xbf6a09a7
+	.long	0x32eb236c
+	.long	0xbf000000
+	.long	0x3df043ab
+	.long	0xbf6c835e
+	.long	0xb2f328d4
+	.long	0xbf000000
+	.long	0x3e0f77ad
+	.long	0xbf6ed89e
+	.long	0x329333dc
+	.long	0xbf000000
+	.long	0xbdb1f34f
+	.long	0xbf710908
+	.long	0xb21ed0dd
+	.long	0xbe800000
+	.long	0xbd826b93
+	.long	0xbf731447
+	.long	0xb2c48e11
+	.long	0xbe800000
+	.long	0xbd25018c
+	.long	0xbf74fa0b
+	.long	0x32939d22
+	.long	0xbe800000
+	.long	0xbc88e931
+	.long	0xbf76ba07
+	.long	0xb26d092c
+	.long	0xbe800000
+	.long	0x3be60685
+	.long	0xbf7853f8
+	.long	0x320db9e5
+	.long	0xbe800000
+	.long	0x3cfd1f65
+	.long	0xbf79c79d
+	.long	0xb2c64e59
+	.long	0xbe800000
+	.long	0x3d60e8f8
+	.long	0xbf7b14be
+	.long	0xb2ff75cb
+	.long	0xbe800000
+	.long	0xbd3c4289
+	.long	0xbf7c3b28
+	.long	0x3231d68b
+	.long	0xbe000000
+	.long	0xbcb2041c
+	.long	0xbf7d3aac
+	.long	0x30f75ae9
+	.long	0xbe000000
+	.long	0x3b29b1a9
+	.long	0xbf7e1324
+	.long	0x32f1e603
+	.long	0xbe000000
+	.long	0x3cdd0b28
+	.long	0xbf7ec46d
+	.long	0xb1f44949
+	.long	0xbe000000
+	.long	0xbc354825
+	.long	0xbf7f4e6d
+	.long	0xb2d01884
+	.long	0xbd800000
+	.long	0x3c5c1342
+	.long	0xbf7fb10f
+	.long	0xb1de5b5f
+	.long	0xbd800000
+	.long	0x3bdbd541
+	.long	0xbf7fec43
+	.long	0xb084cd0d
+	.long	0xbd000000
+	.long	0x00000000
+	.long	0xbf800000
+	.long	0x00000000
+	.long	0x00000000
+	.long	0xbbdbd541
+	.long	0xbf7fec43
+	.long	0xb084cd0d
+	.long	0x3d000000
+	.long	0xbc5c1342
+	.long	0xbf7fb10f
+	.long	0xb1de5b5f
+	.long	0x3d800000
+	.long	0x3c354825
+	.long	0xbf7f4e6d
+	.long	0xb2d01884
+	.long	0x3d800000
+	.long	0xbcdd0b28
+	.long	0xbf7ec46d
+	.long	0xb1f44949
+	.long	0x3e000000
+	.long	0xbb29b1a9
+	.long	0xbf7e1324
+	.long	0x32f1e603
+	.long	0x3e000000
+	.long	0x3cb2041c
+	.long	0xbf7d3aac
+	.long	0x30f75ae9
+	.long	0x3e000000
+	.long	0x3d3c4289
+	.long	0xbf7c3b28
+	.long	0x3231d68b
+	.long	0x3e000000
+	.long	0xbd60e8f8
+	.long	0xbf7b14be
+	.long	0xb2ff75cb
+	.long	0x3e800000
+	.long	0xbcfd1f65
+	.long	0xbf79c79d
+	.long	0xb2c64e59
+	.long	0x3e800000
+	.long	0xbbe60685
+	.long	0xbf7853f8
+	.long	0x320db9e5
+	.long	0x3e800000
+	.long	0x3c88e931
+	.long	0xbf76ba07
+	.long	0xb26d092c
+	.long	0x3e800000
+	.long	0x3d25018c
+	.long	0xbf74fa0b
+	.long	0x32939d22
+	.long	0x3e800000
+	.long	0x3d826b93
+	.long	0xbf731447
+	.long	0xb2c48e11
+	.long	0x3e800000
+	.long	0x3db1f34f
+	.long	0xbf710908
+	.long	0xb21ed0dd
+	.long	0x3e800000
+	.long	0xbe0f77ad
+	.long	0xbf6ed89e
+	.long	0x329333dc
+	.long	0x3f000000
+	.long	0xbdf043ab
+	.long	0xbf6c835e
+	.long	0xb2f328d4
+	.long	0x3f000000
+	.long	0xbdc210d8
+	.long	0xbf6a09a7
+	.long	0x32eb236c
+	.long	0x3f000000
+	.long	0xbd945dff
+	.long	0xbf676bd8
+	.long	0x32bc3389
+	.long	0x3f000000
+	.long	0xbd4e645a
+	.long	0xbf64aa59
+	.long	0xb11a08fa
+	.long	0x3f000000
+	.long	0xbcea5164
+	.long	0xbf61c598
+	.long	0x32e7f425
+	.long	0x3f000000
+	.long	0xbbe8b648
+	.long	0xbf5ebe05
+	.long	0xb2c6f953
+	.long	0x3f000000
+	.long	0x3c670f32
+	.long	0xbf5b941a
+	.long	0xb2232dc8
+	.long	0x3f000000
+	.long	0x3d0f59aa
+	.long	0xbf584853
+	.long	0x327d5fc0
+	.long	0x3f000000
+	.long	0x3d639d9d
+	.long	0xbf54db31
+	.long	0xb290ea1a
+	.long	0x3f000000
+	.long	0x3d9b4153
+	.long	0xbf514d3d
+	.long	0xb00c4f04
+	.long	0x3f000000
+	.long	0x3dc3fdff
+	.long	0xbf4d9f02
+	.long	0xb27e70e8
+	.long	0x3f000000
+	.long	0x3debfe8a
+	.long	0xbf49d112
+	.long	0xb2992640
+	.long	0x3f000000
+	.long	0x3e099e65
+	.long	0xbf45e403
+	.long	0xb2b15174
+	.long	0x3f000000
+	.long	0x3e1cd957
+	.long	0xbf41d870
+	.long	0xb2bff977
+	.long	0x3f000000
+	.long	0x3e2fad27
+	.long	0xbf3daef9
+	.long	0xb19aabec
+	.long	0x3f000000
+	.long	0x3e4216eb
+	.long	0xbf396842
+	.long	0x32810007
+	.long	0x3f000000
+	.long	0xbe95f61a
+	.long	0xbf3504f3
+	.long	0xb24fe77a
+	.long	0x3f800000
+	.long	0xbe8d2f7d
+	.long	0xbf3085bb
+	.long	0x32ae2d32
+	.long	0x3f800000
+	.long	0xbe84a20e
+	.long	0xbf2beb4a
+	.long	0x32b73136
+	.long	0x3f800000
+	.long	0xbe789e3f
+	.long	0xbf273656
+	.long	0x32038343
+	.long	0x3f800000
+	.long	0xbe686ff3
+	.long	0xbf226799
+	.long	0xb22123bb
+	.long	0x3f800000
+	.long	0xbe58bbb7
+	.long	0xbf1d7fd1
+	.long	0xb292050c
+	.long	0x3f800000
+	.long	0xbe4983f7
+	.long	0xbf187fc0
+	.long	0x31c7a3f3
+	.long	0x3f800000
+	.long	0xbe3acb0c
+	.long	0xbf13682a
+	.long	0xb2cdd12e
+	.long	0x3f800000
+	.long	0xbe2c933b
+	.long	0xbf0e39da
+	.long	0x324a32e7
+	.long	0x3f800000
+	.long	0xbe1edeb5
+	.long	0xbf08f59b
+	.long	0x32be4b4e
+	.long	0x3f800000
+	.long	0xbe11af97
+	.long	0xbf039c3d
+	.long	0x325ba002
+	.long	0x3f800000
+	.long	0xbe0507ea
+	.long	0xbefc5d27
+	.long	0x3180eca9
+	.long	0x3f800000
+	.long	0xbdf1d344
+	.long	0xbef15aea
+	.long	0x31ff2139
+	.long	0x3f800000
+	.long	0xbddaad38
+	.long	0xbee63375
+	.long	0x31d9c774
+	.long	0x3f800000
+	.long	0xbdc4a143
+	.long	0xbedae880
+	.long	0xb21e15cc
+	.long	0x3f800000
+	.long	0xbdafb2cc
+	.long	0xbecf7bca
+	.long	0xb16a3b63
+	.long	0x3f800000
+	.long	0xbd9be50c
+	.long	0xbec3ef15
+	.long	0xb1d5d52c
+	.long	0x3f800000
+	.long	0xbd893b12
+	.long	0xbeb8442a
+	.long	0x32705ba6
+	.long	0x3f800000
+	.long	0xbd6f6f7e
+	.long	0xbeac7cd4
+	.long	0x32254e02
+	.long	0x3f800000
+	.long	0xbd4ebb8a
+	.long	0xbea09ae5
+	.long	0x323e89a0
+	.long	0x3f800000
+	.long	0xbd305f55
+	.long	0xbe94a031
+	.long	0xb26d59f0
+	.long	0x3f800000
+	.long	0xbd145f8c
+	.long	0xbe888e93
+	.long	0xb12c7d9e
+	.long	0x3f800000
+	.long	0xbcf58104
+	.long	0xbe78cfcc
+	.long	0x311bd41d
+	.long	0x3f800000
+	.long	0xbcc70c54
+	.long	0xbe605c13
+	.long	0xb1a7e4f6
+	.long	0x3f800000
+	.long	0xbc9d6830
+	.long	0xbe47c5c2
+	.long	0x30e5967d
+	.long	0x3f800000
+	.long	0xbc71360b
+	.long	0xbe2f10a2
+	.long	0xb11167f9
+	.long	0x3f800000
+	.long	0xbc315502
+	.long	0xbe164083
+	.long	0xb1e8e614
+	.long	0x3f800000
+	.long	0xbbf66e3c
+	.long	0xbdfab273
+	.long	0x311568cf
+	.long	0x3f800000
+	.long	0xbb9dc971
+	.long	0xbdc8bd36
+	.long	0x307592f5
+	.long	0x3f800000
+	.long	0xbb319298
+	.long	0xbd96a905
+	.long	0x31531e61
+	.long	0x3f800000
+	.long	0xba9de1c8
+	.long	0xbd48fb30
+	.long	0x30ef227f
+	.long	0x3f800000
+	.long	0xb99de7df
+	.long	0xbcc90ab0
+	.long	0x3005c998
+	.long	0x3f800000
+
+/* General purpose constants:
+   absolute value mask */
+float_vector __sAbsMask 0x7fffffff
+
+/* threshold for out-of-range values */
+float_vector __sRangeReductionVal 0x461c4000
+
+/* +INF */
+float_vector __sRangeVal 0x7f800000
+
+/* High Accuracy version polynomial coefficients:
+   S1 = -1.66666666664728165763e-01 */
+float_vector __sS1 0xbe2aaaab
+
+/* S2 = 8.33329173045453069014e-03 */
+float_vector __sS2 0x3c08885c
+
+/* C1 = -5.00000000000000000000e-01 */
+float_vector __sC1 0xbf000000
+
+/* C2 = 4.16638942914469202550e-02 */
+float_vector __sC2 0x3d2aaa7c
+
+/* high accuracy table index mask */
+float_vector __iIndexMask 0x000000ff
+
+/* 2^(k-1) */
+float_vector __i2pK_1 0x00000040
+
+/* sign field mask */
+float_vector __sSignMask 0x80000000
+
+/* Range reduction PI-based constants:
+   PI high part */
+float_vector __sPI1 0x40490000
+
+/* PI mid part 1 */
+float_vector __sPI2 0x3a7da000
+
+/* PI mid part 2 */
+float_vector __sPI3 0x34222000
+
+/* PI low part */
+float_vector __sPI4 0x2cb4611a
+
+/* Range reduction PI-based constants if FMA available:
+   PI high part (when FMA available) */
+float_vector __sPI1_FMA 0x40490fdb
+
+/* PI mid part (when FMA available) */
+float_vector __sPI2_FMA 0xb3bbbd2e
+
+/* PI low part (when FMA available) */
+float_vector __sPI3_FMA 0xa7772ced
+
+/* Polynomial coefficients: */
+float_vector __sA3 0xbe2aaaa6
+float_vector __sA5 0x3c08876a
+float_vector __sA7 0xb94fb7ff
+float_vector __sA9 0x362edef8
+
+/* Polynomial coefficients (when hardware FMA available) */
+float_vector __sA5_FMA 0x3c088768
+float_vector __sA7_FMA 0xb94fb6cf
+float_vector __sA9_FMA 0x362ec335
+
+/* 1/PI */
+float_vector __sInvPI 0x3ea2f983
+
+/* right-shifter constant */
+float_vector __sRShifter 0x4b400000
+
+/* PI/2 */
+float_vector __sHalfPI 0x3fc90fdb
+
+/* 1/2 */
+float_vector __sOneHalf 0x3f000000
+	.type	__svml_ssincos_data,@object
+	.size __svml_ssincos_data,.-__svml_ssincos_data
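
Each entry of the table above is four 32-bit words, the (CHL, SHi, SLo,
Sigma) tuple named in its comment, and __iIndexMask (0xff) selects one
of 256 entries, so the table occupies 256 * 16 = 4096 bytes. That is
exactly the offset of the first float_vector constant, __sAbsMask, in
the header that follows. A C view of the entry layout (a sketch; the
object is raw .long data, and the field names are taken from the table
comment, not from any glibc header):

    #include <stdint.h>

    /* One lookup-table entry: four IEEE-754 single bit patterns.  */
    struct ssincos_entry
    {
      uint32_t chl;    /* CHL   */
      uint32_t shi;    /* SHi   */
      uint32_t slo;    /* SLo   */
      uint32_t sigma;  /* Sigma */
    };

    /* 256 entries * sizeof (struct ssincos_entry) == 4096 bytes,
       matching the __sAbsMask offset defined below.  */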
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf_data.h b/sysdeps/x86_64/fpu/svml_s_sincosf_data.h
new file mode 100644
index 0000000..4325117
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf_data.h
@@ -0,0 +1,61 @@
+/* Offsets for data table for function sincosf.
+   Copyright (C) 2014-2015 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef S_SINCOSF_DATA_H
+#define S_SINCOSF_DATA_H
+
+#define __dT                          	0
+#define __sAbsMask                    	4096
+#define __sRangeReductionVal          	4160
+#define __sRangeVal                   	4224
+#define __sS1                         	4288
+#define __sS2                         	4352
+#define __sC1                         	4416
+#define __sC2                         	4480
+#define __iIndexMask                  	4544
+#define __i2pK_1                      	4608
+#define __sSignMask                   	4672
+#define __sPI1                        	4736
+#define __sPI2                        	4800
+#define __sPI3                        	4864
+#define __sPI4                        	4928
+#define __sPI1_FMA                    	4992
+#define __sPI2_FMA                    	5056
+#define __sPI3_FMA                    	5120
+#define __sA3                         	5184
+#define __sA5                         	5248
+#define __sA7                         	5312
+#define __sA9                         	5376
+#define __sA5_FMA                     	5440
+#define __sA7_FMA                     	5504
+#define __sA9_FMA                     	5568
+#define __sInvPI                      	5632
+#define __sRShifter                   	5696
+#define __sHalfPI                     	5760
+#define __sOneHalf                    	5824
+
+.macro float_vector offset value
+.if .-__svml_ssincos_data != \offset
+.err
+.endif
+.rept 16
+.long \value
+.endr
+.endm
+
+#endif
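
The float_vector macro does two jobs: it replicates one 32-bit constant
sixteen times (16 * 4 = 64 bytes, a full ZMM register's worth, so the
same symbol serves every vector width), and its .if/.err pair makes the
assembler fail if the running offset in the data file drifts out of
sync with the #defines above. In C, the same layout check would look
roughly like this (struct and field names are hypothetical):

    #include <stddef.h>
    #include <stdint.h>

    struct ssincos_layout
    {
      uint32_t dT[4096 / 4];            /* lookup table              */
      uint32_t sAbsMask[16];            /* one 64-byte constant row  */
      uint32_t sRangeReductionVal[16];
      /* ... one 16-word row per remaining float_vector constant.  */
    };

    _Static_assert (offsetof (struct ssincos_layout, sAbsMask) == 4096,
                    "layout out of sync with svml_s_sincosf_data.h");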
diff --git a/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h b/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h
index f88e30f..66bb081 100644
--- a/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h
+++ b/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h
@@ -76,6 +76,67 @@
         ret
 .endm
 
+/* 3-argument SSE2 ISA version as wrapper to scalar.  */
+.macro WRAPPER_IMPL_SSE2_fFF callee
+        pushq   %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        pushq   %rbx
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbx, 0)
+        movq    %rdi, %rbp
+        movq    %rsi, %rbx
+        subq    $40, %rsp
+        cfi_adjust_cfa_offset(40)
+        leaq    24(%rsp), %rsi
+        leaq    28(%rsp), %rdi
+        movaps  %xmm0, (%rsp)
+        call    \callee@PLT
+        leaq    24(%rsp), %rsi
+        leaq    28(%rsp), %rdi
+        movss   28(%rsp), %xmm0
+        movss   %xmm0, 0(%rbp)
+        movaps  (%rsp), %xmm1
+        movss   24(%rsp), %xmm0
+        movss   %xmm0, (%rbx)
+        movaps  %xmm1, %xmm0
+        shufps  $85, %xmm1, %xmm0
+        call    \callee@PLT
+        movss   28(%rsp), %xmm0
+        leaq    24(%rsp), %rsi
+        movss   %xmm0, 4(%rbp)
+        leaq    28(%rsp), %rdi
+        movaps  (%rsp), %xmm1
+        movss   24(%rsp), %xmm0
+        movss   %xmm0, 4(%rbx)
+        movaps  %xmm1, %xmm0
+        unpckhps        %xmm1, %xmm0
+        call    \callee@PLT
+        movaps  (%rsp), %xmm1
+        leaq    24(%rsp), %rsi
+        leaq    28(%rsp), %rdi
+        movss   28(%rsp), %xmm0
+        shufps  $255, %xmm1, %xmm1
+        movss   %xmm0, 8(%rbp)
+        movss   24(%rsp), %xmm0
+        movss   %xmm0, 8(%rbx)
+        movaps  %xmm1, %xmm0
+        call    \callee@PLT
+        movss   28(%rsp), %xmm0
+        movss   %xmm0, 12(%rbp)
+        movss   24(%rsp), %xmm0
+        movss   %xmm0, 12(%rbx)
+        addq    $40, %rsp
+        cfi_adjust_cfa_offset(-40)
+        popq    %rbx
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbx)
+        popq    %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+.endm
+
 /* AVX/AVX2 ISA version as wrapper to SSE ISA version.  */
 .macro WRAPPER_IMPL_AVX callee
         pushq     	%rbp
@@ -130,6 +191,52 @@
         ret
 .endm
 
+/* 3-argument AVX/AVX2 ISA version as wrapper to SSE ISA version.  */
+.macro WRAPPER_IMPL_AVX_fFF callee
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq      %rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-32, %rsp
+        pushq     %r13
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%r13, 0)
+        pushq     %r14
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%r14, 0)
+        subq      $48, %rsp
+        movq      %rsi, %r14
+        vmovaps   %ymm0, (%rsp)
+        movq      %rdi, %r13
+        vmovaps   16(%rsp), %xmm1
+        vmovaps   %xmm1, 32(%rsp)
+        vzeroupper
+        vmovaps   (%rsp), %xmm0
+        call      HIDDEN_JUMPTARGET(\callee)
+        vmovaps   32(%rsp), %xmm0
+        lea       (%rsp), %rdi
+        lea       16(%rsp), %rsi
+        call      HIDDEN_JUMPTARGET(\callee)
+        vmovaps   (%rsp), %xmm0
+        vmovaps   16(%rsp), %xmm1
+        vmovaps   %xmm0, 16(%r13)
+        vmovaps   %xmm1, 16(%r14)
+        addq      $48, %rsp
+        popq      %r14
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%r14)
+        popq      %r13
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%r13)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+.endm
+
 /* AVX512 ISA version as wrapper to AVX2 ISA version.  */
 .macro WRAPPER_IMPL_AVX512 callee
         pushq	%rbp
@@ -147,20 +254,9 @@
         .byte	0x29
         .byte	0x04
         .byte	0x24
-/* Below is encoding for vmovaps (%rsp), %ymm0.  */
-        .byte	0xc5
-        .byte	0xfc
-        .byte	0x28
-        .byte	0x04
-        .byte	0x24
+        vmovaps (%rsp), %ymm0
         call	HIDDEN_JUMPTARGET(\callee)
-/* Below is encoding for vmovaps 32(%rsp), %ymm0.  */
-        .byte	0xc5
-        .byte	0xfc
-        .byte	0x28
-        .byte	0x44
-        .byte	0x24
-        .byte	0x20
+        vmovaps 32(%rsp), %ymm0
         call	HIDDEN_JUMPTARGET(\callee)
         movq	%rbp, %rsp
         cfi_def_cfa_register (%rsp)
@@ -195,38 +291,57 @@
         .byte	0x29
         .byte	0x4c
         .byte	0x24
-/* Below is encoding for vmovaps (%rsp), %ymm0.  */
-        .byte	0xc5
-        .byte	0xfc
-        .byte	0x28
+        vmovaps (%rsp), %ymm0
+        vmovaps 64(%rsp), %ymm1
+        call      HIDDEN_JUMPTARGET(\callee)
+        vmovaps 32(%rsp), %ymm0
+        vmovaps 96(%rsp), %ymm1
+        call      HIDDEN_JUMPTARGET(\callee)
+        movq      %rbp, %rsp
+        cfi_def_cfa_register (%rsp)
+        popq      %rbp
+        cfi_adjust_cfa_offset (-8)
+        cfi_restore (%rbp)
+        ret
+.endm
+
+/* 3-argument AVX512 ISA version as wrapper to AVX2 ISA version.  */
+.macro WRAPPER_IMPL_AVX512_fFF callee
+        pushq     %rbp
+        cfi_adjust_cfa_offset (8)
+        cfi_rel_offset (%rbp, 0)
+        movq	%rsp, %rbp
+        cfi_def_cfa_register (%rbp)
+        andq      $-64, %rsp
+        pushq     %r12
+        pushq     %r13
+        subq      $176, %rsp
+        movq      %rsi, %r13
+/* Below is encoding for vmovaps %zmm0, (%rsp).  */
+        .byte	0x62
+        .byte	0xf1
+        .byte	0x7c
+        .byte	0x48
+        .byte	0x29
         .byte	0x04
         .byte	0x24
-/* Below is encoding for vmovaps 64(%rsp), %ymm1.  */
-        .byte	0xc5
-        .byte	0xfc
-        .byte	0x28
-        .byte	0x4c
-        .byte	0x24
-        .byte	0x40
+        movq      %rdi, %r12
+        vmovaps   (%rsp), %ymm0
         call      HIDDEN_JUMPTARGET(\callee)
-/* Below is encoding for vmovaps 32(%rsp), %ymm0.  */
-        .byte	0xc5
-        .byte	0xfc
-        .byte	0x28
-        .byte	0x44
-        .byte	0x24
-        .byte	0x20
-/* Below is encoding for vmovaps 96(%rsp), %ymm1.  */
-        .byte	0xc5
-        .byte	0xfc
-        .byte	0x28
-        .byte	0x4c
-        .byte	0x24
-        .byte	0x60
+        vmovaps   32(%rsp), %ymm0
+        lea       64(%rsp), %rdi
+        lea       96(%rsp), %rsi
         call      HIDDEN_JUMPTARGET(\callee)
+        vmovaps   64(%rsp), %ymm0
+        vmovaps   96(%rsp), %ymm1
+        vmovaps   %ymm0, 32(%r12)
+        vmovaps   %ymm1, 32(%r13)
+        addq      $176, %rsp
+        popq      %r13
+        popq      %r12
         movq      %rbp, %rsp
         cfi_def_cfa_register (%rsp)
-        popq      %rbp
+        popq	%rbp
         cfi_adjust_cfa_offset (-8)
         cfi_restore (%rbp)
         ret
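
All of the new *_fFF wrappers follow the same pattern: the wide variant
splits its input vector, runs the narrower kernel once per half, and
copies the upper halves of the results into the caller's two output
buffers. A rough C model of WRAPPER_IMPL_AVX_fFF using AVX intrinsics
(compile with -mavx; the pointer arguments are treated the way this
patch's wrappers treat them, as arrays of lane results):

    #include <immintrin.h>

    /* The 4-lane SSE kernel that WRAPPER_IMPL_AVX_fFF wraps.  */
    extern void _ZGVbN4vvv_sincosf (__m128 x, __m128 *s, __m128 *c);

    static void
    sincosf8_via_sse (__m256 x, __m256 *s, __m256 *c)
    {
      __m128 slo, clo, shi, chi;
      _ZGVbN4vvv_sincosf (_mm256_castps256_ps128 (x), &slo, &clo);
      _ZGVbN4vvv_sincosf (_mm256_extractf128_ps (x, 1), &shi, &chi);
      *s = _mm256_insertf128_ps (_mm256_castps128_ps256 (slo), shi, 1);
      *c = _mm256_insertf128_ps (_mm256_castps128_ps256 (clo), chi, 1);
    }

Note also that the hand-written .byte sequences remain only for the
%zmm moves, for assemblers that predate AVX-512 support; the %ymm moves
are now emitted as plain vmovaps instructions.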
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 00a1074..6cc6008 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -24,6 +24,7 @@
 
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVeN16v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVeN16v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVeN16vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16.c b/sysdeps/x86_64/fpu/test-float-vlen16.c
index 86b8c33..d7f683f 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16.c
@@ -20,6 +20,7 @@
 
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
 #define TEST_VECTOR_logf 1
 #define TEST_VECTOR_expf 1
 #define TEST_VECTOR_powf 1
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 7d41e46..ae12a10 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -24,6 +24,7 @@
 
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/test-float-vlen4.c
index 3e74118..e56d642 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4.c
@@ -20,6 +20,7 @@
 
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
 #define TEST_VECTOR_logf 1
 #define TEST_VECTOR_expf 1
 #define TEST_VECTOR_powf 1
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index ed1c893..f0c7d4a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -27,6 +27,7 @@
 
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVdN8v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVdN8v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVdN8vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
index f0aaec1..0012082 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
@@ -23,6 +23,7 @@
 
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
 #define TEST_VECTOR_logf 1
 #define TEST_VECTOR_expf 1
 #define TEST_VECTOR_powf 1
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 37bf702..6b267de 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -24,6 +24,7 @@
 
 VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVcN8v_cosf)
 VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVcN8v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVcN8vvv_sincosf)
 VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf)
 VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
 VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
index ef2aedc..581cbde 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -20,6 +20,7 @@
 
 #define TEST_VECTOR_cosf 1
 #define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
 #define TEST_VECTOR_logf 1
 #define TEST_VECTOR_expf 1
 #define TEST_VECTOR_powf 1
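
The TEST_VECTOR_sincosf definitions above enable the new entry points
in every vector-length test run. Application code reaches the same
kernels when GCC vectorizes a sincosf loop under the SIMD declaration
added to bits/math-vector.h; a minimal sketch (requires -ffast-math
together with -fopenmp or -fopenmp-simd, and whether the compiler
really emits the _ZGV* call depends on the compiler version):

    #define _GNU_SOURCE
    #include <math.h>

    void
    fill_tables (const float *x, float *s, float *c, int n)
    {
    #pragma omp simd
      for (int i = 0; i < n; i++)
        sincosf (x[i], &s[i], &c[i]);  /* may become _ZGVbN4vvv_sincosf etc. */
    }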

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                          |   33 +
 NEWS                                               |    4 +-
 math/test-float-vlen16.h                           |   17 +
 math/test-float-vlen4.h                            |   17 +
 math/test-float-vlen8.h                            |   17 +
 sysdeps/unix/sysv/linux/x86_64/libmvec.abilist     |    4 +
 sysdeps/x86/fpu/bits/math-vector.h                 |    2 +
 sysdeps/x86_64/fpu/Makefile                        |    4 +-
 sysdeps/x86_64/fpu/Versions                        |    1 +
 sysdeps/x86_64/fpu/libm-test-ulps                  |    8 +
 sysdeps/x86_64/fpu/multiarch/Makefile              |    3 +-
 .../x86_64/fpu/multiarch/svml_s_sincosf16_core.S   |   39 +
 .../fpu/multiarch/svml_s_sincosf16_core_avx512.S   |  504 +++++++++
 .../x86_64/fpu/multiarch/svml_s_sincosf4_core.S    |   38 +
 .../fpu/multiarch/svml_s_sincosf4_core_sse4.S      |  268 +++++
 .../x86_64/fpu/multiarch/svml_s_sincosf8_core.S    |   38 +
 .../fpu/multiarch/svml_s_sincosf8_core_avx2.S      |  241 +++++
 sysdeps/x86_64/fpu/svml_s_sincosf16_core.S         |   25 +
 sysdeps/x86_64/fpu/svml_s_sincosf4_core.S          |   30 +
 sysdeps/x86_64/fpu/svml_s_sincosf8_core.S          |   29 +
 sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S      |   25 +
 sysdeps/x86_64/fpu/svml_s_sincosf_data.S           | 1140 ++++++++++++++++++++
 sysdeps/x86_64/fpu/svml_s_sincosf_data.h           |   61 ++
 sysdeps/x86_64/fpu/svml_s_wrapper_impl.h           |  193 +++-
 sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c    |    1 +
 sysdeps/x86_64/fpu/test-float-vlen16.c             |    1 +
 sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c     |    1 +
 sysdeps/x86_64/fpu/test-float-vlen4.c              |    1 +
 .../x86_64/fpu/test-float-vlen8-avx2-wrappers.c    |    1 +
 sysdeps/x86_64/fpu/test-float-vlen8-avx2.c         |    1 +
 sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c     |    1 +
 sysdeps/x86_64/fpu/test-float-vlen8.c              |    1 +
 32 files changed, 2706 insertions(+), 43 deletions(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf_data.S
 create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf_data.h


hooks/post-receive
-- 
GNU C Library master sources

