This is the mail archive of the libc-alpha@sources.redhat.com mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [libc-alpha] linuxthreads bug in 2.2.4 under ppc linux


Hi,

I gave it one last look and I seemed to have stumbled upon what I think 
might be the problem.

Under -O3 ppc seems to be using oldstatus in place of pnode

Here is the code:

1. First it comes in and sets up the frame and then branches to the 
compare and swap routine:

0000c1b8 <__pthread_alt_unlock>:
    c1b8:       94 21 ff f0     stwu    r1,-16(r1)
    c1bc:       7c 08 02 a6     mflr    r0
    c1c0:       93 c1 00 08     stw     r30,8(r1)
    c1c4:       93 e1 00 0c     stw     r31,12(r1)
    c1c8:       90 01 00 14     stw     r0,20(r1)
    c1cc:       48 01 9c 19     bl      25de4 <__DTOR_END__+0x4>
    c1d0:       7f c8 02 a6     mflr    r30
    c1d4:       7c 64 1b 78     mr      r4,r3
    c1d8:       7c 00 04 ac     sync
    c1dc:       39 80 00 00     li      r12,0
    c1e0:       3b e0 00 01     li      r31,1
    c1e4:       48 00 01 a0     b       c384 <__pthread_alt_unlock+0x1cc>


Then here is the compare and swap routine:

    c384:       80 03 00 00     lwz     r0,0(r3)
    c388:       28 00 00 01     cmplwi  r0,1
    c38c:       41 81 fe 5c     bgt+    c1e8 <__pthread_alt_unlock+0x30>
    c390:       7c 00 04 ac     sync
    c394:       7d 20 18 28     lwarx   r9,r0,r3
    c398:       7c 09 4a 79     xor.    r9,r0,r9
    c39c:       40 82 00 0c     bne-    c3a8 <__pthread_alt_unlock+0x1f0>
    c3a0:       7d 80 19 2d     stwcx.  r12,r0,r3
    c3a4:       40 a2 ff f0     bne-    c394 <__pthread_alt_unlock+0x1dc>
    c3a8:       7d 20 4b 78     mr      r0,r9
    c3ac:       2c 00 00 00     cmpwi   r0,0
    c3b0:       40 82 ff d4     bne+    c384 <__pthread_alt_unlock+0x1cc>
    c3b4:       80 01 00 14     lwz     r0,20(r1)
    c3b8:       7c 08 03 a6     mtlr    r0
    c3bc:       83 c1 00 08     lwz     r30,8(r1)
    c3c0:       83 e1 00 0c     lwz     r31,12(r1)
    c3c4:       38 21 00 10     addi    r1,r1,16
    c3c8:       4e 80 00 20     blr

It line: c384 it loads the lock status and compares it to 1 and if greater 
it jumps back up otherwise it enters the lwarx stwcx. pairing and after 
that subloop returns, r0 has the oldvalue from the lock status and if it 
is greater than 1 it branches back to the top otherwise the whole thing 
just returns (it has freed the lock).

So if greater than 1 (there were waiters) it goes to:

   c1e8:       7c 0b 03 78     mr      r11,r0
    c1ec:       7c 68 1b 78     mr      r8,r3
    c1f0:       7c 65 1b 78     mr      r5,r3
    c1f4:       7d 67 5b 78     mr      r7,r11
    c1f8:       3c c0 80 00     lis     r6,-32768
    c1fc:       7c 00 04 ac     sync
    c200:       80 0b 00 08     lwz     r0,8(r11)

Here it stores the oldstatus in r11 (it never reloads pnode from memory it 
instead just uses oldstatus in its place and r11 is used to immediately 
access the abandoned field since r11 is assumed to be a pointer to a 
wait_node.    

But in the *working* version (registers are used differently here but I 
will point out the differences.  If there are waiters (the cas has 
returned a value of greater than 1 for the lock status)  r27 is the 
pointer to the lock and the lock status is *reread* from memory instead of 
using the results of the oldvalue from the compare and swap routine.

    c214:       83 bb 00 00     lwz     r29,0(r27)
    c218:       7f 7c db 78     mr      r28,r27
    c21c:       7f 79 db 78     mr      r25,r27
    c220:       7f bf eb 78     mr      r31,r29
    c224:       3f 40 80 00     lis     r26,-32768
    c228:       7c 00 04 ac     sync

So effectively, under -O3 the compiler has merged oldstatus and the pnode 
value loaded from the *pp_head into the same variable.

#if defined HAS_COMPARE_AND_SWAP
    {
      long oldstatus = lock->__status;
      if (oldstatus == 0 || oldstatus == 1) {
        if (__compare_and_swap_with_release_semantics (&lock->__status, 
oldstatus, 0))
          break;
        else
          continue;
      }
    }
#endif

    /* Process the entire queue of wait nodes. Remove all abandoned
       wait nodes and put them into the global free queue, and
       remember the one unabandoned node which refers to the thread
       having the highest priority. */

    pp_max_prio = pp_node = pp_head;
    p_max_prio = p_node = *pp_head;
    maxprio = INT_MIN;

    READ_MEMORY_BARRIER(); /* Prevent access to stale data through p_node 
*/

    while (p_node != (struct wait_node *) 1) {


Effectively the line:

    p_max_prio = p_node = *pp_head;

above have been optimized away and the value of oldstatus from the 
previous compare and swap is being used for pnode directly.

If I print or otherwise access pnode in any way, inside the loop (I added 
a simple MSG to print pnode if it is null for example), then this 
optimization does not happen and all functions well.

So this change seems to be part of the problem.

I am not sure if this difference is important but it is a real difference 
in how the code works.

Why are we not using the oldstatus from the compare and swap in place of 
pnode?  Was there a reason?  If so, then we should somehow tell the 
compiler not to optimize away pnode.


That's about the only differnce I can find.

I hope this helps,

Kevin


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]