This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH][AArch64] Optimized memset


ping

-----Original Message-----
From: Wilco Dijkstra [mailto:wdijkstr@arm.com] 
Sent: 31 July 2015 16:02
To: 'GNU C Library'
Subject: [PATCH][AArch64] Optimized memset

This is an optimized memset for AArch64. Memset is split into 4 main cases: small sets of up to 16
bytes; medium sets of 16..96 bytes, which are fully unrolled; large memsets of more than 96 bytes,
which align the destination and use an unrolled loop processing 64 bytes per iteration; and memsets
of zero of more than 256 bytes, which use the dc zva instruction, with faster versions for the
common ZVA sizes of 64 and 128 bytes. STP of Q registers is used to reduce code size without loss
of performance.
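
For readers without the attachment, the size dispatch described above can be sketched in portable C.
This is only an illustration, not the patch: the names classify_memset and sketch_memset are
hypothetical, the treatment of exactly 16 bytes as "small" is an assumption (the description lists
16 in both ranges), and byte loops stand in for the STP and dc zva stores of the real assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the four cases named in the description. */
enum memset_path { PATH_SMALL, PATH_MEDIUM, PATH_LARGE, PATH_ZVA };

/* Pick a case from the fill value and length.  Thresholds (16, 96, 256)
   come from the description; n == 16 -> small is an assumption. */
static enum memset_path classify_memset(int c, size_t n)
{
    if (n > 256 && (unsigned char)c == 0)
        return PATH_ZVA;        /* zeroing more than 256 bytes */
    if (n > 96)
        return PATH_LARGE;      /* aligned loop, 64 bytes per iteration */
    if (n > 16)
        return PATH_MEDIUM;
    return PATH_SMALL;
}

/* Portable stand-in showing the control flow only.  The zeroing case
   falls through to the large loop here because dc zva has no portable
   C equivalent. */
static void *sketch_memset(void *dst, int c, size_t n)
{
    unsigned char *p = dst;
    unsigned char v = (unsigned char)c;

    assert(dst != NULL || n == 0);

    if (n <= 96) {
        /* Small and medium: short enough to be straight-line code in
           assembly; a simple loop is used here. */
        for (size_t i = 0; i < n; i++)
            p[i] = v;
        return dst;
    }

    unsigned char *end = p + n;

    /* Large: fill the head up to the next 16-byte boundary. */
    size_t head = (size_t)(-(uintptr_t)p & 15);
    for (size_t i = 0; i < head; i++)
        p[i] = v;
    p += head;

    /* Main loop: 64 bytes per iteration on an aligned destination. */
    while ((size_t)(end - p) >= 64) {
        for (size_t j = 0; j < 64; j++)
            p[j] = v;
        p += 64;
    }

    /* Tail: the remaining 0..63 bytes. */
    while (p < end)
        *p++ = v;

    return dst;
}
```

With these assumptions, a 300-byte fill with a non-zero value takes the large path, while the same
length with a zero value would be routed to the dc zva path in the real implementation.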

Speedup on test-memset is 1% on Cortex-A57 and 8% on Cortex-A53. On a random test with varying sizes
and alignment the new version is 50% faster.

OK for commit?

ChangeLog:
2015-07-31  Wilco Dijkstra  <wdijkstr@arm.com>

	* sysdeps/aarch64/memset.S (__memset): Rewrite of optimized memset.

Attachment: 0001-Optimized-memset.txt
Description: Text document

