This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
Example of optimized strlen
- To: libc-alpha at sources dot redhat dot com
- Subject: Example of optimized strlen
- From: Bonz <bonzini at gnu dot org>
- Date: Wed, 28 Feb 2001 11:42:58 +0100
I attach a fast strlen that I wrote and a commented version from glibc's
CVS repository. The comments include cycle counts and highlight
three partial register stalls.
Here are the results for a Pentium (counting clocks for the P6 is
difficult, but take into account that up to 12 clocks are lost for the
partial register stalls on the P6 in the finalization, and that *each*
iteration of the inner loop loses 6 clocks because of the other stall).
I'm not considering cache misses or branch mispredictions.
                                        my strlen    glibc strlen
---------------------------------------------------------------------
startup if aligned                          2             2
startup if misaligned (worst case)          7            12
---------------------------------------------------------------------
inner loop                                  n           1.25*n
---------------------------------------------------------------------
finalization (worst case)                   9             9
---------------------------------------------------------------------
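Startup is cheaper in my version when the string is misaligned (7 clocks
vs. 12 in the worst case), and the inner loop is faster as well (n vs.
1.25*n). Aligned startup and worst-case finalization cost the same in
both versions.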
(My strlen has no support for bounded pointers yet).
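For readers who want to see the three phases from the table (startup,
inner loop, finalization) in one place, here is a minimal C sketch of a
word-at-a-time strlen. It is not the attached strlen.S and not glibc's
code. The name sketch_strlen, the 4-byte word size and the zero-byte
test are assumptions chosen for illustration. Like the usual assembly
versions, it may read up to 3 bytes past the terminator inside the same
aligned word, which strict C does not permit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of the three phases; not the attached code. */
size_t sketch_strlen(const char *s)
{
    const char *p = s;

    assert(s != NULL);

    /* Startup: advance byte by byte until p is 4-byte aligned. */
    while (((uintptr_t)p & 3) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Inner loop: examine four bytes per iteration. */
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);   /* aligned 4-byte load */
        /* Nonzero exactly when at least one byte of w is zero. */
        if ((w - 0x01010101u) & ~w & 0x80808080u)
            break;
        p += 4;
    }

    /* Finalization: locate the zero byte inside the last word. */
    while (*p != '\0')
        p++;

    return (size_t)(p - s);
}
```

Calling sketch_strlen on a pointer at offset 0, 1, 2 or 3 within a
padded buffer exercises the aligned and misaligned startup paths.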
Paolo
strlen.S
glibc-strlen.S