This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3 13/28] arm64/sve: Signal handling support


On Fri, Oct 13, 2017 at 12:17:19PM +0100, Catalin Marinas wrote:
> On Thu, Oct 12, 2017 at 05:11:57PM +0100, Dave P Martin wrote:
> > On Wed, Oct 11, 2017 at 05:40:52PM +0100, Catalin Marinas wrote:
> > > On Tue, Oct 10, 2017 at 07:38:30PM +0100, Dave P Martin wrote:
> > > > diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> > > > index aabeaee..fa4ed34 100644
> > > > --- a/arch/arm64/kernel/fpsimd.c
> > > > +++ b/arch/arm64/kernel/fpsimd.c
> > > > @@ -310,6 +310,32 @@ static void fpsimd_to_sve(struct task_struct *task)
> > > >  		       sizeof(fst->vregs[i]));
> > > >  }
> > > >  
> > > > +/*
> > > > + * Transfer the SVE state in task->thread.sve_state to
> > > > + * task->thread.fpsimd_state.
> > > > + *
> > > > + * Task can be a non-runnable task, or current.  In the latter case,
> > > > + * softirqs (and preemption) must be disabled.
> > > > + * task->thread.sve_state must point to at least sve_state_size(task)
> > > > + * bytes of allocated kernel memory.
> > > > + * task->thread.sve_state must be up to date before calling this function.
> > > > + */
> > > > +static void sve_to_fpsimd(struct task_struct *task)
> > > > +{
> > > > +	unsigned int vq;
> > > > +	void const *sst = task->thread.sve_state;
> > > > +	struct fpsimd_state *fst = &task->thread.fpsimd_state;
> > > > +	unsigned int i;
> > > > +
> > > > +	if (!system_supports_sve())
> > > > +		return;
> > > > +
> > > > +	vq = sve_vq_from_vl(task->thread.sve_vl);
> > > > +	for (i = 0; i < 32; ++i)
> > > > +		memcpy(&fst->vregs[i], ZREG(sst, vq, i),
> > > > +		       sizeof(fst->vregs[i]));
> > > > +}
> > > 
> > > Nit: could we actually just do an assignment with some pointer casting?
> > > It looks like we invoke memcpy for every 16 bytes (same for
> > > fpsimd_to_sve).
> > 
> > I was uneasy about what the type of ZREG(sst, vq, i) ought to be.
> > In any case, memest() is magic: my oldskool GCC (5.3.0) generates:
> > 
> > ffff000008084c70 <sve_to_fpsimd>:
> > ffff000008084c70:       14000004        b       ffff000008084c80 <sve_to_fpsimd+0x10>
> > ffff000008084c74:       d503201f        nop
> > ffff000008084c78:       d65f03c0        ret
> > ffff000008084c7c:       d503201f        nop
> > ffff000008084c80:       f0007d61        adrp    x1, ffff000009033000 <reset_devices>
> > ffff000008084c84:       f942a021        ldr     x1, [x1,#1344]
> > ffff000008084c88:       36b001c1        tbz     w1, #22, ffff000008084cc0 <sve_to_fpsimd+0x50>
> > ffff000008084c8c:       b94ca805        ldr     w5, [x0,#3240]
> > ffff000008084c90:       912a0001        add     x1, x0, #0xa80
> > ffff000008084c94:       91320004        add     x4, x0, #0xc80
> > ffff000008084c98:       f9465006        ldr     x6, [x0,#3232]
> > ffff000008084c9c:       121c6ca5        and     w5, w5, #0xfffffff0
> > ffff000008084ca0:       52800000        mov     w0, #0x0                        // #0
> > ffff000008084ca4:       8b2040c2        add     x2, x6, w0, uxtw
> > ffff000008084ca8:       0b050000        add     w0, w0, w5
> > ffff000008084cac:       a9400c42        ldp     x2, x3, [x2]
> > ffff000008084cb0:       a8810c22        stp     x2, x3, [x1],#16
> > ffff000008084cb4:       eb01009f        cmp     x4, x1
> > ffff000008084cb8:       54ffff61        b.ne    ffff000008084ca4 <sve_to_fpsimd+0x34>
> > ffff000008084cbc:       d65f03c0        ret
> > ffff000008084cc0:       d65f03c0        ret
> > ffff000008084cc4:       d503201f        nop
> > 
> > 
> > Without volatile, I think assigning a single object and doing a memcpy()
> > are equivalent to the compiler: which it actually uses depends solely on
> > optimisation considerations.
> > 
> > (But then I'm not a language lawyer ... not a professional one anyway).
> > 
> > Are you concerned compilers may mess this up?
> 
> That's fine, please ignore my comment then. I was worried that gcc would
> always generate a call to the memcpy implementation rather than inlining
> it.

OK.  I'll keep an eye on this, but in any case it won't impact the
FPSIMD-only case.

Cheers
---Dave


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]