This is the mail archive of the
binutils@sourceware.org
mailing list for the binutils project.
Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512
- From: Kirill Yukhin <kirill dot yukhin at gmail dot com>
- To: Richard Henderson <rth at twiddle dot net>
- Cc: Richard Biener <richard dot guenther at gmail dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>, GCC Development <gcc at gcc dot gnu dot org>, Binutils <binutils at sourceware dot org>, "Girkar, Milind" <milind dot girkar at intel dot com>, "Kreitzer, David L" <david dot l dot kreitzer at intel dot com>
- Date: Tue, 30 Jul 2013 17:55:08 +0400
- Subject: Re: [x86-64 psABI]: Extend x86-64 psABI to support AVX-512
- References: <CAMe9rOrvMxSLj3LcYBs71tVdw6C0vJFKD2HxvnoHc13UamftwA at mail dot gmail dot com> <ddab98c2-bb3b-4d02-b403-e7d5690cfe00 at email dot android dot com> <51F01C0A dot 5050101 at twiddle dot net>
On Wed, Jul 24, 2013 at 08:25:14AM -1000, Richard Henderson wrote:
> On 07/24/2013 05:23 AM, Richard Biener wrote:
> > "H.J. Lu" <hjl.tools@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Here is a patch to extend x86-64 psABI to support AVX-512:
> >
> > Afaik avx 512 doubles the amount of xmm registers. Can we get them callee saved please?
>
> Having them callee saved pre-supposes that one knows the width of the register.
The whole SSE/AVX architecture is based on zeroing the upper bits.
For reference, take a look at the definition of VLMAX in the Spec.
E.g. for AVX2 we had:
vaddps %ymm1, %ymm2, %ymm3
Intuition says (at least to me) that, once compiled, this code should have no idea about anything above the low 256 bits (the `upper' half).
But with AVX-512 we have (again, see Spec, operation section of vaddps, VEX.256 encoded):
DEST[31:0] = SRC1[31:0] + SRC2[31:0]
...
DEST[255:224] = SRC1[255:224] + SRC2[255:224].
DEST[MAX_VL-1:256] = 0
So legacy code *will* change the upper 256 bits of the vector register.
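
To make the effect concrete, here is a toy model of that write-back rule in Python. It is only a sketch, not an emulator: the register is a plain integer, MAX_VL is assumed to be 512, and the helper names (`mask`, `vex256_write`) are made up for illustration.

```python
# Toy model of the write-back rule quoted above; NOT an emulator.
MAX_VL = 512  # assumption: an AVX-512 machine, so registers are 512 bits wide


def mask(bits):
    """Return an integer with the low `bits` bits set."""
    return (1 << bits) - 1


def vex256_write(old_reg, result_256):
    """Model DEST[255:0] = result; DEST[MAX_VL-1:256] = 0.

    `old_reg` is ignored on purpose: none of the old bits survive the write.
    """
    return result_256 & mask(256)


zmm3 = mask(MAX_VL)                 # all 512 bits set before the legacy insn
zmm3 = vex256_write(zmm3, 0x1234)   # a VEX.256 insn writes its 256-bit result
upper_half = zmm3 >> 256            # bits [511:256] after the write
```

After the write `upper_half` is 0, whatever the register held before, which is exactly why a legacy callee cannot be trusted to leave the upper bits alone.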
The roots can be found in the 64-bit GPR insns: we get different behavior on 64-bit and 32-bit targets for the following sequence:
push %eax
;; play with eax
pop %eax
On a 64-bit machine the upper 32 bits of %rax will be zeroed, and if we try to use the old value of %rax, we fail!
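
The same rule can be sketched for GPRs. The model below assumes only the well-known 64-bit mode behavior that a write to a 32-bit register zero-extends into the full 64-bit register, while a 16-bit write leaves the upper bits alone; the function names are hypothetical.

```python
# Toy model of GPR partial writes in 64-bit mode; NOT an emulator.
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def write_eax(rax, value):
    """A 32-bit write zero-extends: the old upper 32 bits of rax are lost."""
    return value & MASK32


def write_ax(rax, value):
    """A 16-bit write merges: the old upper 48 bits of rax are preserved."""
    return (rax & ~MASK16 & MASK64) | (value & MASK16)


old_rax = 0xDEADBEEF00000001
after_eax = write_eax(old_rax, 0x2A)  # upper half of old_rax is gone
after_ax = write_ax(old_rax, 0x2A)    # upper 48 bits of old_rax survive
```

Here `after_eax` is 0x2A while `after_ax` is 0xDEADBEEF0000002A, so any code that touches only %eax silently destroys the upper half of %rax.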
So following this philosophy prohibits making vector registers callee-saved.
BUT.
What if we make a couple of the new registers callee-saved in the sense of a *scalar* type?
So, what we can do:
1. Make only bits [0..XXX] of the vector register callee-saved.
2. Make bits (XXX..VLMAX] of the same register call-clobbered.
XXX is the number of bits to be callee-saved: 64, 80, 128 or even 512.
The advantage is that when we are running FP scalar code, we don't have to bother saving/restoring the callee-saved part.
vaddss %xmm17, %xmm17, %xmm17
call foo
vaddss %xmm17, %xmm17, %xmm17
We don't care if `foo':
- is legacy in the AVX-512 sense: it just sees no xmm17
- is legacy in a future-ISA sense: if this code uses 1024-bit wide registers and `foo' is AVX-512, it will save XXX bits, allowing us to continue scalar calculations without save/restore
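
A rough Python model of the proposed convention, just to show the idea. Everything here is an assumption for illustration: XXX is picked as 128, the caller's register width as 512, and `call_with_partial_save` / `foo_body` are hypothetical names, not anything from the psABI draft.

```python
# Toy model of "low XXX bits callee-saved, the rest call-clobbered".
VLMAX = 512   # assumption: register width seen by the caller
XXX = 128     # assumption: number of callee-saved low bits


def low_mask(bits):
    """Return an integer with the low `bits` bits set."""
    return (1 << bits) - 1


def call_with_partial_save(reg, callee_body):
    """Model a call whose callee saves and restores only bits [0..XXX)."""
    saved_low = reg & low_mask(XXX)        # prologue: save the low XXX bits
    reg = callee_body(reg)                 # body: may clobber the whole register
    upper = reg & ~low_mask(XXX) & low_mask(VLMAX)
    return upper | saved_low               # epilogue: restore the low XXX bits


def foo_body(reg):
    """Hypothetical callee that wipes the register completely."""
    return 0


scalar = 0x3F800000                        # 1.0f sitting in bits [31:0]
xmm17 = (low_mask(VLMAX) ^ low_mask(32)) | scalar
after_call = call_with_partial_save(xmm17, foo_body)
```

The scalar in the low 32 bits comes back unchanged, while everything above bit XXX is whatever `foo' left there, so the caller of scalar code needs no save/restore of its own.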
--
Thanks, K