This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH RFC] Improve 64bit memcpy performance for Haswell CPU with AVX instruction
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: Ling Ma <ling dot ma dot program at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>, Liubov Dmitrieva <liubov dot dmitrieva at gmail dot com>, yumkam at gmail dot com, Ling Ma <ling dot ml at alibaba-inc dot com>
- Date: Wed, 25 Jun 2014 18:34:16 +0200
- Subject: Re: [PATCH RFC] Improve 64bit memcpy performance for Haswell CPU with AVX instruction
- Authentication-results: sourceware.org; auth=none
- References: <CAOGi=dOQEbbkkzQGz-ZtQ0-WEHj2=hjmbstZXvZyLqycVy18Kg at mail dot gmail dot com> <20140515202213 dot GA20667 at domone dot podge> <CAOGi=dNbyxj+7gjwcpAVBxYB-MH9E7s=xi2nKwYXkDViasOZrA at mail dot gmail dot com> <CAMe9rOpC5-p7DV=xBfhUknkruz2-Ek+Bpzm+ycZiKdXtSyXxiA at mail dot gmail dot com> <CAOGi=dNHHvNriOMWmj2K3Ym7n6G83mGOyUzMtNY91nFr8=7G9w at mail dot gmail dot com> <CAOGi=dOJX3saKoa5YiDdveOqAb_=Sev4cBKyh7_gkXBU8_4=+g at mail dot gmail dot com> <CAMe9rOpEhNffr5iZUZLFp4QyBAE-Xrxna8-BQFv=tZXEXdSLSg at mail dot gmail dot com> <CAOGi=dNk7H2+aWh=+3_qwVH9LvWN-eNKcLciW=0J7x1dVL9v+g at mail dot gmail dot com> <CAOGi=dMsSdQi8SuXi2pzCbMm6bCrwJru0rAjtg=cn24CLgOgRg at mail dot gmail dot com> <CAMe9rOqZpj4BE7kXABOAueaD-o1PgRjL_R48KeDcJBDSmHXPdg at mail dot gmail dot com>
On Wed, Jun 25, 2014 at 08:16:58AM -0700, H.J. Lu wrote:
> On Wed, Jun 25, 2014 at 7:45 AM, Ling Ma <ling.ma.program@gmail.com> wrote:
> > By modifying the test suite, we re-tested 403.gcc in two parts: one
> > below 256 bytes,
> > the other over 256 bytes. The results, attached as a gzipped file,
> > show (compared with the pending sse2 memcpy):
> > 1. When the copy size is below 256 bytes, avx memcpy gets almost the
> > same performance because its instructions also use 16-byte registers.
> >
> > 2. When the copy size is over 256 bytes, avx memcpy improves performance
> > by 4.9% to 33% because its instructions use 32-byte registers.
> >
> > So avx memcpy avoids regressions for small sizes and improves
> > performance for large sizes.
> >
> > Thanks
> > Ling
> >
>
> I'd like to get it in. Any more feedback?
>
Now only a generic comment remains: it needs the same formatting fixes as
memset. Also, what is the point of this code? The forward/backward
decision was already made.
+#ifdef USE_AS_MEMMOVE
+ mov %rsi, %r10
+ sub %rdi, %r10
+ cmp %rdx, %r10
+ jae L(memmove_use_memcpy_fwd)
+ cmp %rcx, %r10
+ jae L(memmove_use_memcpy_fwd)
+ jmp L(gobble_mem_fwd_llc_start)
+L(memmove_use_memcpy_fwd):
+#endif
+ cmp %rcx, %rdx
+ jae L(gobble_big_data_fwd)
+#ifdef USE_AS_MEMMOVE
+L(gobble_mem_fwd_llc_start):
+#endif
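For readers following the hunk, the quoted branch logic amounts to the following hedged C sketch. The function name, parameter names, and the meaning of `threshold` (the value in %rcx, loaded earlier in the function) are illustrative assumptions, not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the quoted dispatch.  The asm computes r10 = src - dst as an
   unsigned, wrap-around subtraction.  If that distance is at least the
   copy length, a forward copy cannot clobber source bytes that have not
   been read yet, so the plain forward memcpy path is safe; the second
   compare against %rcx (assumed here to be a size threshold set earlier)
   likewise routes the copy to the memcpy path. */
static int uses_memcpy_fwd_path(const void *dst, const void *src,
                                size_t len, size_t threshold)
{
    uintptr_t dist = (uintptr_t)src - (uintptr_t)dst; /* sub %rdi, %r10 */
    if (dist >= len)       /* cmp %rdx, %r10; jae L(memmove_use_memcpy_fwd) */
        return 1;
    if (dist >= threshold) /* cmp %rcx, %r10; jae L(memmove_use_memcpy_fwd) */
        return 1;
    return 0;              /* jmp L(gobble_mem_fwd_llc_start) */
}
```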
I will comment performance tests later.
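The 16-byte versus 32-byte register point in the quoted results can be illustrated with a minimal C sketch. This is not the code under review, just a blocked loop showing why wider registers help only for larger copies:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: copying in 32-byte blocks mirrors what a YMM-based
   loop does, versus 16-byte XMM blocks in the sse2 version.  A copy
   shorter than one block never enters the wide loop, which is why small
   sizes see no difference.  Real memcpy also handles alignment, overlap
   detection, and non-temporal stores, all omitted here. */
static void copy_in_32byte_blocks(unsigned char *dst,
                                  const unsigned char *src, size_t n)
{
    size_t i = 0;
    for (; i + 32 <= n; i += 32)
        memcpy(dst + i, src + i, 32); /* a compiler with -mavx can emit
                                         one 32-byte load/store pair */
    for (; i < n; i++)                /* tail smaller than one block */
        dst[i] = src[i];
}
```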