This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [Patches] [PATCH] ARM: NEON detected memcpy.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "Shih-Yuan Lee (FourDollars)" <sylee at canonical dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, libc-ports at sourceware dot org, Jesse Sung <jesse dot sung at canonical dot com>, patches at eglibc dot org, YC Cheng <yc dot cheng at canonical dot com>, rex dot tsai at canonical dot com
- Date: Thu, 4 Apr 2013 08:37:01 +0200
- Subject: Re: [Patches] [PATCH] ARM: NEON detected memcpy.
- References: <CAAT15mNnqeb6tuVdV6b4uJf-qFDH1acxevyW6f-gH+SkguENmg at mail dot gmail dot com> <Pine dot LNX dot 4 dot 64 dot 1304031505020 dot 580 at digraph dot polyomino dot org dot uk> <CAAT15mMJSHiO5rZ6EAbss79f_t4Qiaryi-qjmw3TwGg4vrg2=A at mail dot gmail dot com> <20130403161949 dot GA6759 at domone dot kolej dot mff dot cuni dot cz> <CAAT15mMZgtfcUr3rgz3BiY-v14-DW9u1LHP+5jp2rD3uxA+=sw at mail dot gmail dot com>
On Thu, Apr 04, 2013 at 12:15:17PM +0800, Shih-Yuan Lee (FourDollars) wrote:
> Hi Ondrej,
>
> I do have some benchmark data.
>
Hi,
Please also try a benchmark with real-world data (20 MB). I put it at
http://kam.mff.cuni.cz/~ondra/dryrun_memcpy.tar.bz2
To add the NEON version, copy the test_generic.c file and add a step that
compiles the NEON implementation to the benchmark script.
For now it measures only the total time.
I would need something like a timestamp counter for more detailed results.
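
As a rough illustration of finer-grained timing, here is a minimal sketch
that times a batch of copies with clock_gettime(CLOCK_MONOTONIC) as a
stand-in for a timestamp counter. The names copy_fn, now_ns, time_copies
and throughput_mb_s are hypothetical and are not part of the benchmark
tarball; the empty asm barrier assumes GCC or Clang. Throughput uses the
same convention as the quoted results, 1 MB = 1000000 bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Signature shared by the memcpy variants under test (hypothetical name). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Monotonic time in nanoseconds; a portable substitute for a cycle counter. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Run `iters` copies of `n` bytes and return the elapsed nanoseconds. */
static uint64_t time_copies(copy_fn fn, void *dst, const void *src,
                            size_t n, unsigned iters)
{
    uint64_t start = now_ns();
    for (unsigned i = 0; i < iters; i++) {
        fn(dst, src, n);
        /* Compiler barrier (GCC/Clang syntax) so the copy is not elided. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    return now_ns() - start;
}

/* Throughput in MB/s, with 1 MB = 1000000 bytes. Returns 0 if no time
 * elapsed, which can happen when the clock resolution is too coarse. */
static double throughput_mb_s(size_t n, unsigned iters, uint64_t elapsed_ns)
{
    if (elapsed_ns == 0)
        return 0.0;
    return ((double)n * iters / 1e6) / ((double)elapsed_ns / 1e9);
}

/* Example call:
 *   uint64_t ns = time_copies(memcpy, dst, src, 4096, 100000);
 *   double mbs = throughput_mb_s(4096, 100000, ns);
 */
```

Timing a batch rather than a single call keeps the clock overhead small
relative to the copy itself, which matters most for the very small sizes.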
> --- Running benchmarks (average case/perfect alignment case) ---
>
> very small data test:
> memcpy_arm : (3 bytes copy) = 86.2 MB/s / 88.3 MB/s
> memcpy_neon : (3 bytes copy) = 53.4 MB/s / 54.5 MB/s
> memcpy_arm : (4 bytes copy) = 79.8 MB/s / 62.9 MB/s
> memcpy_neon : (4 bytes copy) = 72.5 MB/s / 73.9 MB/s
> memcpy_arm : (5 bytes copy) = 91.0 MB/s / 78.7 MB/s
> memcpy_neon : (5 bytes copy) = 90.2 MB/s / 91.0 MB/s
> memcpy_arm : (7 bytes copy) = 109.5 MB/s / 104.7 MB/s
> memcpy_neon : (7 bytes copy) = 122.1 MB/s / 126.6 MB/s
> memcpy_arm : (8 bytes copy) = 122.4 MB/s / 122.4 MB/s
> memcpy_neon : (8 bytes copy) = 142.0 MB/s / 148.2 MB/s
> memcpy_arm : (11 bytes copy) = 157.8 MB/s / 161.3 MB/s
> memcpy_neon : (11 bytes copy) = 193.8 MB/s / 196.2 MB/s
> memcpy_arm : (12 bytes copy) = 170.1 MB/s / 172.7 MB/s
> memcpy_neon : (12 bytes copy) = 206.8 MB/s / 212.5 MB/s
> memcpy_arm : (15 bytes copy) = 204.0 MB/s / 209.6 MB/s
> memcpy_neon : (15 bytes copy) = 247.5 MB/s / 270.3 MB/s
> memcpy_arm : (16 bytes copy) = 212.2 MB/s / 225.6 MB/s
> memcpy_neon : (16 bytes copy) = 175.3 MB/s / 252.2 MB/s
> memcpy_arm : (24 bytes copy) = 274.6 MB/s / 326.5 MB/s
> memcpy_neon : (24 bytes copy) = 244.7 MB/s / 367.8 MB/s
> memcpy_arm : (31 bytes copy) = 333.3 MB/s / 399.2 MB/s
> memcpy_neon : (31 bytes copy) = 304.3 MB/s / 463.5 MB/s
>
> L1 cached data:
> memcpy_arm : (4096 bytes copy) = 1295.5 MB/s / 2691.8 MB/s
> memcpy_neon : (4096 bytes copy) = 1826.3 MB/s / 2021.8 MB/s
> memcpy_arm : (6144 bytes copy) = 1306.5 MB/s / 2724.1 MB/s
> memcpy_neon : (6144 bytes copy) = 1857.8 MB/s / 2053.2 MB/s
>
> L2 cached data:
> memcpy_arm : (65536 bytes copy) = 1291.5 MB/s / 2304.8 MB/s
> memcpy_neon : (65536 bytes copy) = 1866.5 MB/s / 2441.7 MB/s
> memcpy_arm : (98304 bytes copy) = 1285.6 MB/s / 2283.8 MB/s
> memcpy_neon : (98304 bytes copy) = 1860.7 MB/s / 2454.7 MB/s
>
> SDRAM:
> memcpy_arm : (2097152 bytes copy) = 466.7 MB/s / 736.5 MB/s
> memcpy_neon : (2097152 bytes copy) = 727.5 MB/s / 868.8 MB/s
> memcpy_arm : (3145728 bytes copy) = 507.9 MB/s / 854.7 MB/s
> memcpy_neon : (3145728 bytes copy) = 852.9 MB/s / 1038.0 MB/s
>
> (*) 1 MB = 1000000 bytes
> (*) 'memcpy_arm' - an implementation for older ARM cores from glibc-ports
>
> The similar benchmark is at
> http://sourceware.org/ml/libc-ports/2009-07/msg00000.html .
>
> Regards,
> $4
>