This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
Re: [Patches] [PATCH] ARM: NEON detected memcpy.
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: "Shih-Yuan Lee (FourDollars)" <sylee at canonical dot com>
- Cc: "Joseph S. Myers" <joseph at codesourcery dot com>, libc-ports at sourceware dot org, Jesse Sung <jesse dot sung at canonical dot com>, patches at eglibc dot org, YC Cheng <yc dot cheng at canonical dot com>, rex dot tsai at canonical dot com
- Date: Wed, 3 Apr 2013 18:19:50 +0200
- Subject: Re: [Patches] [PATCH] ARM: NEON detected memcpy.
- References: <CAAT15mNnqeb6tuVdV6b4uJf-qFDH1acxevyW6f-gH+SkguENmg at mail dot gmail dot com> <Pine dot LNX dot 4 dot 64 dot 1304031505020 dot 580 at digraph dot polyomino dot org dot uk> <CAAT15mMJSHiO5rZ6EAbss79f_t4Qiaryi-qjmw3TwGg4vrg2=A at mail dot gmail dot com>
On Wed, Apr 03, 2013 at 11:47:36PM +0800, Shih-Yuan Lee (FourDollars) wrote:
> Hi Joseph,
>
...
> > I was previously told by people at ARM that NEON memcpy wasn't a good idea
> > in practice because of raised power consumption, context switch costs etc.
> > from using NEON in processes that otherwise didn't use it, even if it
> > appeared superficially beneficial in benchmarks.
> >
> About raised power consumption and context switch costs, I may be able
> to add some option in configure for the users to decide if they want
> to use this feature or not.
> What do you think?
>
A configure option is a bit of overkill.
You need to compare the speed of the NEON implementation against the other one, then
determine the size above which NEON is faster once energy cost and context switch cost are included.
My first estimate is to use NEON for copies larger than 4096 bytes.
However, to determine the context switch cost of NEON you must account for a network effect.
If you use NEON in one function that is called sufficiently often (so that the
registers are always saved), then adding NEON implementations for additional functions
does not increase the cost.
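
A rough sketch of what such a size cutoff could look like in C. This is only an
illustration, not the patch under review: the names memcpy_generic, memcpy_neon,
uses_neon_path and NEON_MEMCPY_THRESHOLD are hypothetical, both "implementations"
are stand-ins that call the standard memcpy, and 4096 is just the first estimate
from above, to be replaced by a measured value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff: the first estimate from the discussion, not a
   measured value.  A real cutoff would come from benchmarks that also
   account for energy and context switch cost.  */
#define NEON_MEMCPY_THRESHOLD 4096

/* Stand-in for the existing non-NEON copy routine (assumption).  */
static void *
memcpy_generic (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Stand-in for a NEON copy routine (assumption); a real one would use
   NEON loads and stores.  */
static void *
memcpy_neon (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Nonzero when a copy of N bytes should take the NEON path, that is,
   when it is strictly larger than the cutoff.  */
static int
uses_neon_path (size_t n)
{
  return n > NEON_MEMCPY_THRESHOLD;
}

/* Pick the implementation by size only.  Small copies never touch the
   NEON registers, so a process that only does small copies does not pay
   for saving them.  */
void *
memcpy_dispatch (void *dst, const void *src, size_t n)
{
  if (uses_neon_path (n))
    return memcpy_neon (dst, src, n);
  return memcpy_generic (dst, src, n);
}
```

With this shape the only tunable is the cutoff, so the benchmark result feeds
into one constant instead of a build-time switch.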