This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC 2/2 V3] Improve 64bit memset for Corei7 with avx2 instruction


On Mon, Jul 29, 2013 at 05:42:02AM -0400, ling.ma.program@gmail.com wrote:
> From: Ma Ling <ling.ml@alibaba-inc.com>
> +ENTRY (MEMSET)
> +	vpxor	%xmm0, %xmm0, %xmm0
> +	vmovd %esi, %xmm1
> +	lea	(%rdi, %rdx), %r8
> +	vpshufb	%xmm0, %xmm1, %xmm0
> +	mov	%rdi, %rax
> +	cmp	$256, %rdx
> +	jae	L(256bytesormore)
> +	xor	%ecx, %ecx
> +	mov %sil, %cl
> +	mov %cl, %ch
What is this for? You do not need that data, and it could slow memset
down for the 64-128 byte range.
...
> +	cmp	$128, %rdx
> +	jb	L(less_128bytes)
...
> +L(less_128bytes):
> +	xor	%esi, %esi
> +	mov	%ecx, %esi
And this? The C equivalent is

x = 0;
x = y;

which is clearly redundant.

Having elementary errors like this does not inspire a lot of confidence.
> +	shl	$16, %ecx
> +	cmp	$64, %edx
> +	jb	L(less_64bytes)
> +L(less_64bytes):
> +	orl	%esi, %ecx
> +	mov	%ecx, %esi
> +	cmp	$32, %edx
> +	jb	L(less_32bytes)
...
> +L(less_32bytes):
> +	shl	$32, %rcx
> +	cmp	$16, %edx
> +	jb	L(less_16bytes)
> +L(less_16bytes):
> +	or	%rsi, %rcx
> +	cmp	$8, %edx
> +	jb	L(less_8bytes)
> +	mov %rcx, (%rdi)
> +	mov %rcx, -0x08(%r8)
> +	ret
> +	ALIGN(4)
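For reference, what the shifts and ORs in the small-size path above compute is
the usual replicate-a-byte-across-a-word trick. A minimal C sketch (the helper
name is mine, not from the patch):

```c
#include <stdint.h>

/* Sketch: replicate the fill byte across a 64-bit word by doubling
   the populated width at each step, mirroring the quoted sequence of
   mov %cl,%ch / shl+or on ecx and rcx.  Hypothetical helper name. */
static uint64_t fill_pattern (unsigned char c)
{
  uint64_t p = c;
  p |= p << 8;   /* 1 byte  -> 2 bytes */
  p |= p << 16;  /* 2 bytes -> 4 bytes */
  p |= p << 32;  /* 4 bytes -> 8 bytes */
  return p;
}
```

The point of the review comments above is that this pattern is only ever
stored through GPR moves on the sub-16-byte tail, so building it eagerly on
every small call wastes work.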
...
> +L(gobble_data):
> +#ifdef SHARED_CACHE_SIZE_HALF
> +	mov	$SHARED_CACHE_SIZE_HALF, %r9
> +#else
> +	mov	__x86_shared_cache_size_half(%rip), %r9
> +#endif
> +	shl	$4, %r9
> +	cmp	%r9, %rdx
> +	ja	L(gobble_big_data)
> +	mov	%rax, %r9
> +	mov	%esi, %eax
> +	mov	%rdx, %rcx
> +	rep	stosb
> +	mov	%r9, %rax
> +	vzeroupper
> +	ret
> +
Redundant vzeroupper: on this path only rep stosb and VEX.128
instructions have run, and VEX.128 operations zero the upper ymm bits,
so there is no dirty upper state to clear.
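As an aside on the threshold itself: the shl $4 multiplies the half-cache
value by 16, i.e. rep stosb handles lengths up to 8x the full shared cache
size. A rough C sketch of the check (names are stand-ins, not glibc's):

```c
#include <stddef.h>

/* Sketch of the quoted size check.  `shared_cache_size_half` stands in
   for __x86_shared_cache_size_half; lengths at or below 16x that value
   take the rep stosb path, larger ones the non-temporal path. */
static int use_rep_stosb (size_t len, size_t shared_cache_size_half)
{
  size_t threshold = shared_cache_size_half << 4;  /* shl $4, %r9 */
  return len <= threshold;                         /* ja L(gobble_big_data) */
}
```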

