This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [Patches] [PATCH] ARM: NEON detected memcpy.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: "Shih-Yuan Lee (FourDollars)" <sylee at canonical dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>, libc-ports at sourceware dot org, Jesse Sung <jesse dot sung at canonical dot com>, patches at eglibc dot org, YC Cheng <yc dot cheng at canonical dot com>, rex dot tsai at canonical dot com
- Date: Mon, 8 Apr 2013 12:27:14 +0200
- Subject: Re: [Patches] [PATCH] ARM: NEON detected memcpy.
- References: <CAAT15mNnqeb6tuVdV6b4uJf-qFDH1acxevyW6f-gH+SkguENmg at mail dot gmail dot com> <Pine dot LNX dot 4 dot 64 dot 1304031505020 dot 580 at digraph dot polyomino dot org dot uk> <CAAT15mMJSHiO5rZ6EAbss79f_t4Qiaryi-qjmw3TwGg4vrg2=A at mail dot gmail dot com> <20130403161949 dot GA6759 at domone dot kolej dot mff dot cuni dot cz> <CAAT15mMZgtfcUr3rgz3BiY-v14-DW9u1LHP+5jp2rD3uxA+=sw at mail dot gmail dot com> <20130404063701 dot GA6324 at domone dot kolej dot mff dot cuni dot cz> <CANu=DmjvA9F_opxpwvNb-BMA63obkrr8RVW+yG3PDuhQm9zjHQ at mail dot gmail dot com>
On Mon, Apr 08, 2013 at 10:11:59AM +0100, Will Newton wrote:
> On 4 April 2013 07:37, Ondřej Bílka <neleai@seznam.cz> wrote:
> > On Thu, Apr 04, 2013 at 12:15:17PM +0800, Shih-Yuan Lee (FourDollars) wrote:
> >> Hi Ondrej,
> >>
> >> I do have some benchmark data.
> >>
> > Hi,
> >
> > Try also benchmark with real world data (20MB). I put it on
> > http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2
>
> Hi Ondrej,
>
> How was the workload chosen for this test run? Is it a known "memcpy
> hot" workload?
>
It was collected during a day of normal usage.
The majority of memcpy calls are hot; see how the delays between calls are distributed in:
http://kam.mff.cuni.cz/~ondra/benchmark_string/profile/result.html
There, more than 95% of calls come less than 2^15 = 32768 cycles after the
previous call.
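
To make the "less than 2^15 cycles" figure concrete, here is a minimal sketch of how such delays could be bucketed by powers of two and how the fraction below a threshold could be computed. The names log2_bucket and fraction_below are hypothetical and are not taken from the benchmark sources; the delays are assumed to be cycle counts between consecutive calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: index of the power-of-two bucket holding `delay`.
   Bucket k covers [2^k, 2^(k+1)); a delay of 0 is placed in bucket 0. */
static unsigned log2_bucket(uint64_t delay)
{
    unsigned k = 0;
    while (delay > 1) {
        delay >>= 1;
        k++;
    }
    return k;
}

/* Hypothetical helper: fraction of delays strictly below `threshold`
   cycles. Returns 0.0 for an empty array. */
static double fraction_below(const uint64_t *delays, size_t n,
                             uint64_t threshold)
{
    size_t hits = 0;

    assert(delays != NULL || n == 0);
    if (n == 0)
        return 0.0;
    for (size_t i = 0; i < n; i++)
        if (delays[i] < threshold)
            hits++;
    return (double)hits / (double)n;
}
```

With a threshold of (uint64_t)1 << 15, fraction_below gives the share of calls that arrive within 32768 cycles of the previous one, which is the quantity quoted above.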
> Also it looks like the data was captured on x86_64? I suspect we
Yes.
> should use a specific data set for each architecture - the alignment
> of data will change depending on the ABI alignment rules and different
> compilers inline e.g. constant sized memcpys in different ways. Last
> time I looked gcc seemed to be much more aggressive with inlining
> string functions on x86 than arm for example.
>
If you want to capture data for ARM, do the following:
rm record.rec # Otherwise you would append to the x86_64 data.
make
# I did not test this on ARM, so record, for example, make or anything else of interest.
LD_PRELOAD=./record.so make
# Then check that the data were really recorded
./show # displays alignments and lengths of the recorded data.
# Finally, you can enable recording globally with
echo $PWD/record.so >> /etc/ld.so.preload
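
For readers who want a feel for what is being recorded, here is a rough sketch of capturing the alignment and length of each copy. This is not the code of record.so: the names copy_record, records and recorded_memcpy, the 64-byte alignment mask and the fixed-size in-memory log are all assumptions made for illustration. A real LD_PRELOAD recorder would interpose memcpy itself and write to a file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_MASK 63u   /* assumption: track alignment modulo 64 bytes */
#define MAX_RECORDS 1024 /* assumption: small fixed-size in-memory log */

/* Hypothetical record layout: one entry per observed copy. */
struct copy_record {
    unsigned dst_align; /* destination address modulo 64 */
    unsigned src_align; /* source address modulo 64 */
    size_t len;         /* number of bytes copied */
};

static struct copy_record records[MAX_RECORDS];
static size_t record_count;

/* Log alignment and length, then forward to the real memcpy.
   Calls made after the log is full are copied but not recorded. */
static void *recorded_memcpy(void *dst, const void *src, size_t len)
{
    assert(len == 0 || (dst != NULL && src != NULL));
    if (record_count < MAX_RECORDS) {
        struct copy_record *r = &records[record_count++];
        r->dst_align = (unsigned)((uintptr_t)dst & ALIGN_MASK);
        r->src_align = (unsigned)((uintptr_t)src & ALIGN_MASK);
        r->len = len;
    }
    return memcpy(dst, src, len);
}
```

Data of this shape is what makes per-architecture capture matter: the alignments depend on the ABI and allocator, and the lengths depend on which constant-sized copies the compiler has already inlined.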
> Thanks,
>
> --
> Will Newton
> Toolchain Working Group, Linaro