This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH RFC] Improve 64bit memset performance for Haswell CPU with AVX2 instruction


Hi all,

Here is the latest memset patch: http://www.yunos.org/tmp/memset-avx2.patch

When I sent the patch with git-send-email, libc-alpha@sourceware.org refused
to show it. Sorry for the inconvenience.

Thanks
Ling


2014-05-16 4:14 GMT+08:00, Ondřej Bílka <neleai@seznam.cz>:
> Correction for the following:
>
>> On Tue, May 13, 2014 at 07:36:16PM +0200, Ondřej Bílka wrote:
>> > +	ALIGN(4)
>> > +L(gobble_data):
>> > +#ifdef SHARED_CACHE_SIZE_HALF
>> > +	mov	$SHARED_CACHE_SIZE_HALF, %r9
>> > +#else
>> > +	mov	__x86_shared_cache_size_half(%rip), %r9
>> > +#endif
>> > +	shl	$4, %r9
>> > +	cmp	%r9, %rdx
>> > +	ja	L(gobble_big_data)
>> > +	mov	%rax, %r9
>> > +	mov	%esi, %eax
>> > +	mov	%rdx, %rcx
>> > +	rep	stosb
>> > +	mov	%r9, %rax
>> > +	vzeroupper
>> > +	ret
>> > +
>> > +	ALIGN(4)
>> > +L(gobble_big_data):
>> > +	sub	$0x80, %rdx
>> > +L(gobble_big_data_loop):
>> > +	vmovntdq	%ymm0, (%rdi)
>> > +	vmovntdq	%ymm0, 0x20(%rdi)
>> > +	vmovntdq	%ymm0, 0x40(%rdi)
>> > +	vmovntdq	%ymm0, 0x60(%rdi)
>> > +	lea	0x80(%rdi), %rdi
>> > +	sub	$0x80, %rdx
>> > +	jae	L(gobble_big_data_loop)
>> > +	vmovups	%ymm0, -0x80(%r8)
>> > +	vmovups	%ymm0, -0x60(%r8)
>> > +	vmovups	%ymm0, -0x40(%r8)
>> > +	vmovups	%ymm0, -0x20(%r8)
>> > +	vzeroupper
>> > +	sfence
>> > +	ret
>>
>> That loop does not seem to help on Haswell at all; it is indistinguishable
>> from the rep stosb loop above. I used the following benchmark to check it
>> with different sizes, but performance stayed the same.
>>
>> #include <stdlib.h>
>> #include <string.h>
>>
>> int main (void)
>> {
>>   int i;
>>   char *x = malloc (100000000);
>>   /* Fill 100 MB a hundred times; MEMSET is chosen with -D on the gcc
>>      command line below.  */
>>   for (i = 0; i < 100; i++)
>>     MEMSET (x, 0, 100000000);
>>   return 0;
>> }
>>
>>
>> for I in `seq 1 10`; do
>> echo avx
>> gcc -L. -DMEMSET=__memset_avx2 -lc_profile big.c
>> time LD_LIBRARY_PATH=. ./a.out
>> echo rep
>> gcc -L. -DMEMSET=__memset_rep -lc_profile big.c
>> time LD_LIBRARY_PATH=. ./a.out
>> done
>
> Sorry, I forgot that __memset_rep also has a branch for large inputs, so
> what I wrote was wrong.
>
> I retested it with a fixed rep stosq and your loop is around 20% slower on
> a similar test, so it is better to remove that loop.
>
> $ gcc big.c -o big
> $ time LD_PRELOAD=./memset-avx2.so ./big
>
> real    0m0.076s
> user    0m0.066s
> sys     0m0.010s
>
> $ time LD_PRELOAD=./memset_rep.so ./big
>
> real    0m0.063s
> user    0m0.042s
> sys     0m0.021s
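For reference, a "rep stosq" fill of the kind mentioned above looks roughly
like the sketch below. This is illustrative only, not the actual __memset_rep
code; it assumes the usual memset arguments (%rdi = dest, %esi = value,
%rdx = length) and does not bother aligning the destination.

	movzbl	%sil, %eax
	mov	$0x0101010101010101, %r9
	imul	%r9, %rax	/* replicate the byte into all eight lanes */
	mov	%rdi, %r8	/* memset returns the original dest */
	mov	%rdx, %rcx
	shr	$3, %rcx
	rep	stosq		/* store length/8 quadwords */
	mov	%rdx, %rcx
	and	$7, %rcx
	rep	stosb		/* store the remaining 0-7 bytes */
	mov	%r8, %rax
	ret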
>
> I used a different benchmark to be sure; it can be downloaded here, and the
> commands above can be run in that directory.
>
> http://kam.mff.cuni.cz/~ondra/memset_consistency_benchmark.tar.bz2
>
> For a different implementation you need to create a .so that provides a
> memset function; there is a compile script that compiles all .s files,
> provided that the first line has the form
>
> # arch_requirement function_name color
>
>
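As an example of that first-line header, a hypothetical memset-avx2.s entry
for the benchmark might begin with a line such as the following (the arch tag
and color value are made up here; the strings the compile script actually
accepts are defined in the tarball above):

	# avx2 memset_avx2 green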

