This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 2/2] Ignore prefetcher tagging for smaller copies

From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
To: Siddhesh Poyarekar <siddhesh at sourceware dot org>, libc-alpha at sourceware dot org
Cc: nd at arm dot com
Date: Thu, 10 May 2018 11:29:10 +0100
Subject: Re: [PATCH 2/2] Ignore prefetcher tagging for smaller copies
References: <20180503175209.2943-1-siddhesh@sourceware.org> <20180503175209.2943-3-siddhesh@sourceware.org>

On 03/05/18 18:52, Siddhesh Poyarekar wrote:

For small and medium-sized copies, the effects of hardware
prefetching are not as dominant as instruction-level parallelism.
Hence it makes more sense to load data into multiple registers than to
try to route all loads to the same prefetch unit.  This is also the
case for the loop exit, where we are unable to latch on to the same
prefetch unit anyway, so it makes more sense to load data in parallel.
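
As a rough illustration of the dependency structure described above,
here is a minimal C sketch.  It is not the Falkor assembly from the
patch; the names pair16, copy64_parallel and copy64_serial are
hypothetical and exist only for this example.  The first routine
issues four independent 16-byte loads into separate temporaries before
any store, while the second funnels every chunk through one temporary.
A C compiler is free to allocate registers differently, so this only
shows the idea, not the generated code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte register pair, such as the
   pairs named by the A_l/A_h style macros in the ChangeLog.  */
typedef struct { uint64_t lo, hi; } pair16;

/* Copy 64 bytes using four independent temporaries.  No load depends
   on an earlier store, so the loads can be issued in parallel.
   Because all loads happen before the first store, this is also safe
   for overlapping buffers.  */
static void copy64_parallel(unsigned char *dst, const unsigned char *src)
{
    pair16 a, b, c, d;

    /* Fixed-size memcpy calls are typically lowered to plain loads
       and stores; that is an assumption about the compiler.  */
    memcpy(&a, src, 16);
    memcpy(&b, src + 16, 16);
    memcpy(&c, src + 32, 16);
    memcpy(&d, src + 48, 16);

    memcpy(dst, &a, 16);
    memcpy(dst + 16, &b, 16);
    memcpy(dst + 32, &c, 16);
    memcpy(dst + 48, &d, 16);
}

/* Copy 64 bytes through a single temporary: each chunk is loaded and
   stored before the next load, mimicking loads that all target the
   same destination register.  Assumes non-overlapping buffers.  */
static void copy64_serial(unsigned char *dst, const unsigned char *src)
{
    pair16 t;

    for (size_t i = 0; i < 64; i += 16) {
        memcpy(&t, src + i, 16);
        memcpy(dst + i, &t, 16);
    }
}
```

Both routines produce the same bytes; they differ only in how many
independent loads are in flight at once, which is the property the
patch relies on for short copies and for the loop tail.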

The performance results are a bit mixed with memcpy-random, with
numbers jumping between -1% and +3%, i.e. the numbers don't seem
repeatable.  memcpy-walk sees a 70% improvement (i.e. > 2x) for 128
bytes, and that improvement shrinks for larger copies as the impact of
the tail copy decreases in comparison to the loop.

	* sysdeps/aarch64/multiarch/memcpy_falkor.S (B_l, B_lw, C_l,
	D_l, E_l, F_l, G_l, A_h, B_h, C_h, D_h, E_h, F_h, G_h): New
	macros.
	(__memcpy_falkor): Use multiple registers to copy data in loop
	tail.


OK to commit.

References:
- [PATCH 0/2] aarch64,falkor: memcpy/memmove performance improvements
  - From: Siddhesh Poyarekar
- [PATCH 2/2] Ignore prefetcher tagging for smaller copies
  - From: Siddhesh Poyarekar
