This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "Ryan S. Arnold" <ryan dot arnold at gmail dot com>
- Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, Carlos O'Donell <carlos at redhat dot com>, Will Newton <will dot newton at linaro dot org>, "libc-ports at sourceware dot org" <libc-ports at sourceware dot org>, Patch Tracking <patches at linaro dot org>
- Date: Thu, 5 Sep 2013 10:04:21 +0200
- Subject: Re: [PATCH] sysdeps/arm/armv7/multiarch/memcpy_impl.S: Improve performance.
- Authentication-results: sourceware.org; auth=none
- References: <5220F1F0 dot 80501 at redhat dot com> <CANu=DmhA9QvSe6RS72Db2P=yyjC72fsE8d4QZKHEcNiwqxNMvw at mail dot gmail dot com> <52260BD0 dot 6090805 at redhat dot com> <20130903173710 dot GA2028 at domone dot kolej dot mff dot cuni dot cz> <522621E2 dot 6020903 at redhat dot com> <20130903185721 dot GA3876 at domone dot kolej dot mff dot cuni dot cz> <5226354D dot 8000006 at redhat dot com> <20130904073008 dot GA4306 at spoyarek dot pnq dot redhat dot com> <20130904110333 dot GA6216 at domone dot kolej dot mff dot cuni dot cz> <CAAKybw8L6A7RpMzbp3WheVciMwMTWko3uWgxV_9KPYtEJZ=WHQ at mail dot gmail dot com>
On Wed, Sep 04, 2013 at 12:37:33PM -0500, Ryan S. Arnold wrote:
> On Wed, Sep 4, 2013 at 6:03 AM, Ondřej Bílka <neleai@seznam.cz> wrote:
> > On Wed, Sep 04, 2013 at 01:00:09PM +0530, Siddhesh Poyarekar wrote:
> >> 4. Measure the effect of dcache pressure on function performance
> >> 5. Measure effect of icache pressure on function performance.
> >>
> > Here you really need to base weights on function usage patterns.
> > A bigger code size is acceptable for functions that are called more
> > often. You need to see the distribution of how calls are clustered to get
> > the full picture. strcmp is the least sensitive to icache concerns: when it
> > is called, it is mostly called 100 times over in a tight loop, so size is
> > not a big issue. If the same number of calls is spread uniformly through
> > the program, we need stricter criteria.
>
> Icache pressure is probably one of the more difficult things to
> measure with a benchmark. I suppose it'd be easier with a pipeline
> analyzer.
>
> Can you explain how usage pattern analysis might reveal icache pressure?
>
With a profiler it is simple. I profiled firefox for a while; the results are here:
http://kam.mff.cuni.cz/~ondra/benchmark_string/strcmp_profile_firefox/result.html
When you look at the 'Delays between calls' graph you will see a peak,
which is likely caused by strcmp being called in a loop.
From the graph, about 2/3 of calls happen less than 128 cycles after the
previous one. As only a limited number of cache lines can be accessed in
128 cycles, the function's code is unlikely to have been evicted in
between, so the per-call impact is smaller.
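
As a rough sketch of how a figure like "2/3 of calls within 128 cycles"
could be computed from raw data: the function below assumes you already
have an array of cycle-counter timestamps, one per call, in call order.
The function name and the input format are hypothetical; this is not
how the profiler linked above stores its data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fraction of calls that arrive less than `limit` cycles after the
   previous call.  `ts` holds n cycle-counter timestamps in call order
   (hypothetical input format, assumed for illustration).  */
double
fraction_of_close_calls (const uint64_t *ts, size_t n, uint64_t limit)
{
  /* With fewer than two calls there is no delay to measure.  */
  if (n < 2)
    return 0.0;

  size_t close_calls = 0;
  for (size_t i = 1; i < n; i++)
    {
      /* Timestamps must not go backwards.  */
      assert (ts[i] >= ts[i - 1]);
      if (ts[i] - ts[i - 1] < limit)
        close_calls++;
    }

  /* n calls give n - 1 delays between consecutive calls.  */
  return (double) close_calls / (double) (n - 1);
}
```

A high value for a small limit suggests the calls are clustered in
loops, which is the case where code size matters least.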
> I'm not sure how useful 'usage patterns' are when considering dcache
> pressure. On Power we have data-cache prefetch instructions and since
> we know that dcache pressure is a reality, we will prefetch if our
> data sizes are large enough to out-weigh the overhead of prefetching,
> e.g., when the data size exceeds the cacheline size.
>
Very useful, as the overhead of prefetching is determined by the usage pattern.
You can have two applications that often call memset with size 16000.
The first one uses memset to refresh one static array that is entirely in
the L1 cache, so prefetching is harmful.
The second one does random access over 1GB of memory, so prefetching would
help.
Switching to prefetching once the size exceeds the cache size has the
advantage of certainty that it will help.
The real threshold is lower, as it is unlikely that a large array passed as
an argument is the only thing that occupies the cache.
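
A minimal sketch of such a size-based switch, in C rather than assembly.
All constants here (a 32KB L1 data cache, 64-byte lines, a threshold at
half the cache size, a prefetch distance of four lines) are assumptions
picked for illustration, not measured values, and memset_sketch and
use_prefetch are hypothetical names. __builtin_prefetch is a GCC/Clang
builtin.

```c
#include <stddef.h>
#include <string.h>

/* Assumed values for illustration only; real ones depend on the CPU.  */
#define L1_DCACHE_SIZE      (32 * 1024)
#define CACHE_LINE          64
/* Set below the cache size, since the buffer rarely has the whole
   cache to itself.  The factor of 2 is a guess, not a tuned value.  */
#define PREFETCH_THRESHOLD  (L1_DCACHE_SIZE / 2)
#define PREFETCH_DISTANCE   (4 * CACHE_LINE)

/* Nonzero when a buffer of n bytes is large enough to prefetch.  */
int
use_prefetch (size_t n)
{
  return n >= PREFETCH_THRESHOLD;
}

void *
memset_sketch (void *dst, int c, size_t n)
{
  /* Small buffers: plain memset, no prefetch overhead.  */
  if (!use_prefetch (n))
    return memset (dst, c, n);

  unsigned char *p = dst;
  size_t done = 0;
  while (done < n)
    {
      size_t chunk = n - done < CACHE_LINE ? n - done : CACHE_LINE;
      /* Prefetch for writing a few lines ahead, staying inside the
         buffer so the address expression remains valid.  */
      if (n - done > PREFETCH_DISTANCE)
        __builtin_prefetch (p + done + PREFETCH_DISTANCE, 1);
      memset (p + done, c, chunk);
      done += chunk;
    }
  return dst;
}
```

With these assumed numbers a 16000-byte memset stays on the plain path.
A size check alone still cannot tell the two applications above apart,
which is why the threshold has to come from measured usage patterns.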