This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Intel's new rte_memcpy()
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Luke Gorrie <luke at snabb dot co>
- Cc: "H.J. Lu" <hjl dot tools at gmail dot com>, éå(åå) <ling dot ml at alibaba-inc dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Mon, 2 Feb 2015 14:27:58 +0100
- Subject: Re: Intel's new rte_memcpy()
- References: <CAA2XHbendDcfydewf2nrpPQkSsDWPdEH0SMsnqZAFsLF9q4Fzg at mail dot gmail dot com> <CAMe9rOpELuXQLvHQLLAeZitTTcz-xeg=ROoDm0dHe-fg4m-Jew at mail dot gmail dot com> <20150131184837 dot GA3539 at domone> <CAA2XHbcvDWjrshkdyo3++PnViOs5MdOC_+8qZoEb5UXJ21F1zA at mail dot gmail dot com>
On Mon, Feb 02, 2015 at 10:00:13AM +0100, Luke Gorrie wrote:
> On 31 January 2015 at 19:48, Ondřej Bílka <neleai@seznam.cz> wrote:
> >
> > On Fri, Jan 30, 2015 at 09:03:50AM -0800, H.J. Lu wrote:
> > > On Fri, Jan 30, 2015 at 5:52 AM, Luke Gorrie <luke@snabb.co> wrote:
> > > > Should networking application developers adopt Intel's custom
> > > > implementation if (like me) they are absolutely dependent on good and
> > > > consistent performance of memcpy on all recent hardware (>= Sandy
> > > > Bridge) and Linux distributions? (and then -- what to do about
> > > > memmove?)
> > >
> > Definitely not.
>
>
> Thank you for the detailed feedback!
>
> Questions... :-)
>
> Is there a simple way that I can reproduce these benchmarks? (I am
> curious in general and I would also like to run this on the two-socket
> Xeon E5 machines that I test with.)
>
Download the tarball I mentioned before:
http://kam.mff.cuni.cz/~ondra/benchmark_string/memcpy_profile310115.tar.bz2
Then compile it with

    make

That also prints an LD_PRELOAD=... line that is used for profiling.
For the profiling itself, run this sequence:

    make reset   # clean previous profiling results
    LD_PRELOAD=... bash
    # now run the command you want to profile
    make rep

That creates a results directory with the graphs I showed.
There is also a shortcut, ./benchmark, that runs the benchmarks I showed
earlier and moves them to result* directories.
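For a quick and much cruder check than the profiler, one can time memcpy directly. This is a minimal sketch assuming POSIX clock_gettime; ns_per_copy is a hypothetical helper, not part of the tarball. Because it reuses the same two buffers, it measures only the cache-hot case and says nothing about how a real workload distributes its copy sizes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* expose clock_gettime / CLOCK_MONOTONIC */
#endif
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Calling through a volatile function pointer stops the compiler from
   inlining the copy or hoisting it out of the timing loop. */
static void *(*volatile copy_fn)(void *, const void *, size_t) = memcpy;

/* Hypothetical helper: average nanoseconds per memcpy of `size` bytes over
   `iters` calls, or -1.0 on bad input or allocation failure. The same two
   buffers are reused, so this only measures the cache-hot case. */
double ns_per_copy(size_t size, size_t iters)
{
    if (iters == 0)
        return -1.0;
    unsigned char *src = malloc(size ? size : 1);
    unsigned char *dst = malloc(size ? size : 1);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, size);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        copy_fn(dst, src, size);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(src);
    free(dst);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling it for a few sizes (say 64, 4096 and 1 << 20 bytes) gives a rough per-size comparison on one machine; the LD_PRELOAD profiler is still the way to see what an actual application does.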
> I would like to create relatively portable binaries that don't depend
> on recent glibc releases. For this purpose I am tempted to reference
> an older memcpy in my symbol table with this trick:
>
> __asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
>
> Is that a reasonable idea? Is it likely to have a significant
> performance cost on some platforms? (in practice will this make memcpy
> act like memmove?)
>
It depends on the workload. If you do a lot of large copies, the AVX2
improvement is significant. I cannot predict the effect in general, so
test it.
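On the memmove question: whatever the old symbol version does with overlapping buffers, the C standard leaves memcpy on overlapping regions undefined, so code that may copy within one buffer should call memmove explicitly rather than rely on which memcpy it binds to. A minimal sketch, assuming x86-64 (LP64) glibc for the .symver line (GLIBC_2.2.5 is the base version there; the guard leaves other targets on the default binding). prepend_header is a hypothetical helper for a packet buffer, not part of any library.

```c
#include <stddef.h>
#include <string.h> /* also pulls in glibc's features.h, defining __GLIBC__ */

/* The quoted trick, applied only where GLIBC_2.2.5 is the base symbol
   version (x86-64 LP64 glibc). All memcpy references in this translation
   unit are then bound to memcpy@GLIBC_2.2.5 at link time. */
#if defined(__GLIBC__) && defined(__x86_64__) && !defined(__ILP32__)
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
#endif

/* Hypothetical helper: prepend hdr (hdr_len bytes) to a payload of
   payload_len bytes starting at buf[0]; buf has room for cap bytes.
   hdr must not point into buf. Returns the new length, or 0 if it
   does not fit. */
size_t prepend_header(unsigned char *buf, size_t cap, size_t payload_len,
                      const unsigned char *hdr, size_t hdr_len)
{
    if (hdr_len > cap || payload_len > cap - hdr_len)
        return 0;
    /* Source and destination overlap whenever payload_len > hdr_len, so
       memmove is required here; memcpy would be undefined behaviour
       regardless of which symbol version it resolves to. */
    memmove(buf + hdr_len, buf, payload_len);
    /* hdr lies outside buf, so a plain memcpy is safe. */
    memcpy(buf, hdr, hdr_len);
    return payload_len + hdr_len;
}
```

Keeping overlap handling explicit this way means the program stays correct whether the linker picks the old memcpy or the newer default one; only the performance differs, which is what needs testing.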