[PATCH v2 1/2] aarch64: Add half-width versions of AdvSIMD f32 libmvec routines
Szabolcs Nagy
szabolcs.nagy@arm.com
Tue Dec 19 16:17:42 GMT 2023
The 12/18/2023 15:51, Joe Ramsay wrote:
> --- a/sysdeps/aarch64/fpu/v_math.h
> +++ b/sysdeps/aarch64/fpu/v_math.h
> @@ -29,6 +29,21 @@
> #define V_NAME_F2(fun) _ZGVnN4vv_##fun##f
> #define V_NAME_D2(fun) _ZGVnN2vv_##fun
>
> +#include "advsimd_f32_protos.h"
> +
> +#define HALF_WIDTH_ALIAS_F1(fun) \
> + float32x2_t VPCS_ATTR _ZGVnN2v_##fun##f (float32x2_t x) \
> + { \
> + return vget_low_f32 (_ZGVnN4v_##fun##f (vcombine_f32 (x, x))); \
> + }
> +
> +#define HALF_WIDTH_ALIAS_F2(fun) \
> + float32x2_t VPCS_ATTR _ZGVnN2vv_##fun##f (float32x2_t x, float32x2_t y) \
> + { \
> + return vget_low_f32 ( \
> + _ZGVnN4vv_##fun##f (vcombine_f32 (x, x), vcombine_f32 (y, y))); \
> + }
> +
gcc sometimes inlines the _ZGVnN4v* call into these wrappers, so we
should add noinline to the _ZGVnN4v* routines to avoid code size explosion.

gcc also fails to emit a tail call here, which should be fixed in gcc.
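
To illustrate the noinline point, here is a minimal, portable sketch of the
wrapper pattern. It is not the glibc code: it assumes GCC/Clang vector
extensions in place of the arm_neon.h types, and the names f32x4, f32x2,
full_width_square, half_width_square, half_width_matches_scalar and
self_test are all hypothetical stand-ins for illustration only.

```c
#include <assert.h>

/* Stand-ins for float32x4_t / float32x2_t, using GCC/Clang vector
   extensions so the sketch builds on targets without arm_neon.h.  */
typedef float f32x4 __attribute__ ((vector_size (16)));
typedef float f32x2 __attribute__ ((vector_size (8)));

/* Hypothetical full-width routine standing in for a _ZGVnN4v_* symbol.
   noinline asks the compiler not to copy its body into each wrapper.  */
__attribute__ ((noinline)) f32x4
full_width_square (f32x4 x)
{
  return x * x;
}

/* Half-width wrapper: duplicate the two input lanes into a four-lane
   vector, call the full-width routine, and keep the low two lanes.  */
f32x2
half_width_square (f32x2 x)
{
  f32x4 wide = { x[0], x[1], x[0], x[1] };  /* like vcombine_f32 (x, x) */
  f32x4 res = full_width_square (wide);
  f32x2 low = { res[0], res[1] };           /* like vget_low_f32 (res) */
  return low;
}

/* Returns 1 if both output lanes equal the scalar square of the input.  */
int
half_width_matches_scalar (float a, float b)
{
  f32x2 in = { a, b };
  f32x2 out = half_width_square (in);
  return out[0] == a * a && out[1] == b * b;
}

/* Usage example with inputs whose squares are exactly representable.  */
void
self_test (void)
{
  assert (half_width_matches_scalar (3.0f, 0.5f));
}
```

With the attribute on the full-width routine, each wrapper should stay a
short call sequence rather than a second copy of the routine body. Whether
the call then becomes a tail call is still up to the compiler.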