This is the mail archive of the
libc-alpha@sources.redhat.com
mailing list for the glibc project.
glibc-2.3 alpha stxncpy.S
- From: Glen Nakamura <glen at imodulo dot com>
- To: libc-alpha at sources dot redhat dot com
- Cc: rth at redhat dot com
- Date: Thu, 3 Oct 2002 22:43:57 -1000
- Subject: glibc-2.3 alpha stxncpy.S
FYI, the version of stxncpy.S that was accepted into glibc-2.3 to fix
the stratcliff failure stalls in $u_loop. According to the 21164 hardware
reference manual, load instructions have a 2-cycle latency. The code
attempts to use t2 one cycle after the load, which causes a 1-cycle stall:
$u_loop:
        or      t0, t1, t0      # e0    : current dst word now complete
        subq    a2, 1, a2       # .. e1 : decrement word count
        stq_u   t0, 0(a0)       # e0    : save the current word
        addq    a0, 8, a0       # .. e1 :
        extql   t2, a1, t1      # e0    : extract high bits for next time
        beq     a2, $u_eoc      # .. e1 :
        ldq_u   t2, 8(a1)       # e0    : load high word for next time
        addq    a1, 8, a1       # .. e1 :
        nop                     # e0    :
        >>> STALLS for 1 cycle to load t2 <<<
        cmpbge  zero, t2, t7    # .. e1 : test new word for eos
        extqh   t2, a1, t0      # e0    : extract low bits for current word
        beq     t7, $u_loop     # .. e1 :
The version of the fix I sent earlier avoided the stall by scheduling
the address increment instructions in the otherwise unused cycle:
http://sources.redhat.com/ml/libc-alpha/2002-09/msg00436.html
Anyway, 1 cycle isn't a big deal, but perhaps a comment should be added to
indicate the stall, as is done in $a_loop?
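Something along these lines might do. This is only a sketch of the tail
of $u_loop: the instructions and their order are unchanged from the
listing above, and the "(stall)" wording is an assumption about how the
annotation could look, not a quote from $a_loop or from the committed file.

        ldq_u   t2, 8(a1)       # e0    : load high word for next time
        addq    a1, 8, a1       # .. e1 :
        nop                     # e0    :
        # t2 is used one cycle after the ldq_u, but 21164 loads have a
        # 2-cycle latency, so the cmpbge below waits one extra cycle.
        cmpbge  zero, t2, t7    # .. e1 (stall) : test new word for eos
        extqh   t2, a1, t0      # e0    : extract low bits for current word
        beq     t7, $u_loop     # .. e1 :

Only comments change here, so the generated code is identical; the note
just records that the lost cycle is known about.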
- glen