
Re: [PATCH 24/26] arm: Add optimized addmul_1


Richard Henderson <rth@twiddle.net> writes:

> +ENTRY(__mpn_addmul_1)
> +	push	{ r4, r5, r6 }
> +	cfi_adjust_cfa_offset (12)
> +	cfi_rel_offset (r4, 0)
> +	cfi_rel_offset (r5, 4)
> +	cfi_rel_offset (r6, 8)
> +
> +	ldr	r6, [r1], #4
> +	ldr	r5, [r0]
> +	mov	r4, #0			/* init carry in */
> +	b	1f
> +0:
> +	ldr	r6, [r1], #4		/* load next ul */
> +	adds	r4, r4, r5		/* (out, c) = cl + lpl */
> +	ldr	r5, [r0, #4]		/* load next rl */
> +	str	r4, [r0], #4
> +	adc	r4, ip, #0		/* cl = hpl + c */

You might gain a cycle here on some cores by replacing r4 by something
else in the adds/str sequence and reversing the order of the last two
insns to better exploit dual-issue.  On most semi-modern cores you can
get another register for free by pushing one more to the stack
(load/store multiple instructions transfer registers pairwise).

I'd expect this to benefit the A8 and maybe A9.  On A15 it should make
no difference.
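
Something like the following, as an untested sketch (r7 is just an
arbitrary choice for the extra scratch register; the pop and the CFI
annotations would need the matching r7 update):

	push	{ r4, r5, r6, r7 }	/* 4th reg is free: ldm/stm move pairs */
	...
0:
	ldr	r6, [r1], #4		/* load next ul */
	adds	r7, r4, r5		/* (out, c) = cl + lpl, into scratch */
	ldr	r5, [r0, #4]		/* load next rl */
	adc	r4, ip, #0		/* cl = hpl + c */
	str	r7, [r0], #4		/* store no longer waits on the adc */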

> +1:
> +	mov	ip, #0			/* zero-extend rl */
> +	umlal	r5, ip, r6, r3		/* (hpl, lpl) = ul * vl + rl */
> +	subs	r2, r2, #1
> +	bne	0b
> +
> +	adds	r4, r4, r5		/* (out, c) = cl + llpl */
> +	str	r4, [r0]
> +	adc	r0, ip, #0		/* return hpl + c */
> +
> +	pop	{ r4, r5, r6 }
> +	DO_RET(lr)
> +END(__mpn_addmul_1)

-- 
Måns Rullgård
mans@mansr.com

