This is the mail archive of the
libc-ports@sources.redhat.com
mailing list for the libc-ports project.
[PATCH]: Performance improve on ARM memset
- From: Min Zhang <mzhang at mvista dot com>
- To: libc-ports at sources dot redhat dot com
- Date: Tue, 28 Oct 2008 17:00:10 -0700
- Subject: [PATCH]: Performance improve on ARM memset
This patch improves the execution time of memset on ARM. I measured it
with the "time" shell utility on the following test program, and the patch
reduced execution time by 50%. I also sanity-tested memset with lengths
from 0 to 1000 bytes to make sure it sets neither more nor fewer bytes
than requested.
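#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* Time 100000 memsets of a 4096-byte buffer. */
    char *p = malloc(4096);
    if (p == NULL) return 1;
    for (int i = 0; i < 100000; i++) {
        memset(p, 0, 4096);
    }
    free(p);
    return 0;
}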
Note: This patch partly undoes the rev 1.5 change
http://sources.redhat.com/cgi-bin/cvsweb.cgi/ports/sysdeps/arm/memset.S.diff?r1=1.4&r2=1.5&cvsroot=glibc
by reverting "str" back to the more efficient store-multiple "stm"
instruction. I am not sure of the reason behind the rev 1.5 change.
2008-10-28 Min Zhang <mzhang@mvista.com>
Index: ports/sysdeps/arm/memset.S
===================================================================
RCS file: /cvs/glibc/ports/sysdeps/arm/memset.S,v
retrieving revision 1.6
diff -u -r1.6 memset.S
--- ports/sysdeps/arm/memset.S 10 Oct 2005 15:00:47 -0000 1.6
+++ ports/sysdeps/arm/memset.S 28 Oct 2008 23:06:01 -0000
@@ -35,20 +35,17 @@
and r1, r1, #255 @ clear any sign bits
orr r1, r1, r1, lsl $8
orr r1, r1, r1, lsl $16
+ mov ip, r1
1:
subs r2, r2, #8
- strcs r1, [r3], #4 @ store up to 32 bytes per loop iteration
- strcs r1, [r3], #4
+ stmcsia r3!, {r1, ip} @ store up to 32 bytes per loop iteration
subcss r2, r2, #8
- strcs r1, [r3], #4
- strcs r1, [r3], #4
+ stmcsia r3!, {r1, ip}
subcss r2, r2, #8
- strcs r1, [r3], #4
- strcs r1, [r3], #4
+ stmcsia r3!, {r1, ip}
subcss r2, r2, #8
- strcs r1, [r3], #4
- strcs r1, [r3], #4
+ stmcsia r3!, {r1, ip}
bcs 1b