[PATCH v2 1/2] aarch64: Add half-width versions of AdvSIMD f32 libmvec routines

Tue Dec 19 16:17:42 GMT 2023

The 12/18/2023 15:51, Joe Ramsay wrote:
> --- a/sysdeps/aarch64/fpu/v_math.h
> +++ b/sysdeps/aarch64/fpu/v_math.h
> @@ -29,6 +29,21 @@
>  #define V_NAME_F2(fun) _ZGVnN4vv_##fun##f
>  #define V_NAME_D2(fun) _ZGVnN2vv_##fun
>  
> +#include "advsimd_f32_protos.h"
> +
> +#define HALF_WIDTH_ALIAS_F1(fun)                                              \
> +  float32x2_t VPCS_ATTR _ZGVnN2v_##fun##f (float32x2_t x)                     \
> +  {                                                                           \
> +    return vget_low_f32 (_ZGVnN4v_##fun##f (vcombine_f32 (x, x)));            \
> +  }
> +
> +#define HALF_WIDTH_ALIAS_F2(fun)                                              \
> +  float32x2_t VPCS_ATTR _ZGVnN2vv_##fun##f (float32x2_t x, float32x2_t y)     \
> +  {                                                                           \
> +    return vget_low_f32 (                                                     \
> +	_ZGVnN4vv_##fun##f (vcombine_f32 (x, x), vcombine_f32 (y, y)));       \
> +  }
> +

GCC sometimes inlines the _ZGVnN4v* call into the half-width wrapper,
which duplicates the body of the full-width routine. We should mark the
_ZGVnN4v* routines noinline to avoid that code size explosion.
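
To illustrate the noinline suggestion, here is a minimal portable sketch.
It is not the patch itself: it uses GCC/Clang vector extensions as
stand-ins for float32x4_t and float32x2_t so it builds on any target, and
the names full_width_square and half_width_square are hypothetical.

```c
/* Stand-ins for the NEON types, built on GCC/Clang vector extensions
   (an assumption made so the sketch is not tied to AArch64).  */
typedef float f32x4 __attribute__ ((vector_size (16)));
typedef float f32x2 __attribute__ ((vector_size (8)));

/* Hypothetical full-width routine, standing in for a _ZGVnN4v_* function.
   The noinline attribute asks the compiler to keep this body out of its
   callers, so the wrapper below stays a small call stub.  */
__attribute__ ((noinline)) f32x4
full_width_square (f32x4 x)
{
  return x * x;
}

/* Half-width wrapper following the same shape as HALF_WIDTH_ALIAS_F1:
   duplicate the input into both halves, call the full-width routine,
   and keep only the low half of the result.  */
f32x2
half_width_square (f32x2 x)
{
  f32x4 wide = { x[0], x[1], x[0], x[1] };
  f32x4 r = full_width_square (wide);
  f32x2 lo = { r[0], r[1] };
  return lo;
}
```

With the attribute in place, each wrapper should cost only the
widen/call/narrow sequence rather than a second copy of the routine.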

GCC also fails to turn the _ZGVnN4v* call into a tail call, which
should be fixed in GCC itself.