This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.
This problem happens on both Tru64 4.0f and 5.1/5.1A. To reproduce the problem, you can use any Ada program using tasks, and try to perform a task switch:

<<
(gdb) info tasks
  ID       TID P-ID Pri   Stack  % State                  Name
   1 140025400    0  30 Unknown    Child Termination Wait main_task
*  2 140049000    1  30     48K 10 Accepting RV with 3    maitre_d
   3 140049c00    1  30     48K  7 Waiting on entry call  dijkstra
   4 140050000    1  30     48K  7 Waiting on entry call  stroustrup
   5 140054800    1  30     48K  7 Waiting on entry call  anderson
   6 140055400    1  30     48K  7 Runnable               ichbiah
   7 140061000    1  30     48K  7 Waiting on entry call  taft
(gdb) task 4
warning: Hit heuristic-fence-post without finding     <<<--- OOPS!
warning: enclosing function for address 0x1400433e0
This warning occurs if you are debugging a function without any symbols
(for example, in a stripped executable). In that case, you may wish to
increase the size of the search with the `set heuristic-fence-post' command.

Otherwise, you told GDB there was a function where there isn't one, or
(more likely) you have encountered a bug in GDB.
[Switching to task 4]
#0 0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
>>

The real problem is not the task switching operation per se, but rather the fact that GDB fails to unwind the stack correctly. This problem also appears if we try to use "bt" after the task switch:

<<
(gdb) bt
#0 0x3ff8057d43c in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1 0x3ff8056e8e4 in __osTransferContext () from /usr/shlib/libpthread.so
#2 0x3ff80560c30 in __dspDispatch () from /usr/shlib/libpthread.so
warning: Hit heuristic-fence-post without finding     <<<--- Same oops!
warning: enclosing function for address 0x1400433e0
>>

I'm sorry the test case involves Ada, but that is the only way I found to reproduce it at the time when I looked at this problem. The problem happens on all non-active tasks, but otherwise GDB works fine with the active one.
So why does it always fail on the non-active tasks, and not on the active one? The answer lies in the fact that all non-active threads call the same procedure to perform a context switch, and end up being stopped at the same code location, somewhere inside __hstTransferRegistersPC. This means that the stack trace for all non-current tasks always starts like this:

#0 0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1 0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
#2 0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
#3 0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so

GDB is able to unwind up to __dspTransferContext, and then always fails.

I found that on alpha, GDB uses a heuristic algorithm to compute the stack (not surprising given the warning message): for each frame, it computes the "proc_desc" by reading the instructions in the function prologue. I later found that GDB was computing an incorrect return address for frame #2. Using this incorrect return address, GDB naturally could not locate the function which called __dspTransferContext, and therefore reported an error.
The answer to this failure lay in the computation of the proc_desc of frame #1, but we need to look at its code to understand why:

0x3ff805af250 <__osTransferContext>:      ldah gp,16321(t12)
0x3ff805af254 <__osTransferContext+4>:    unop
0x3ff805af258 <__osTransferContext+8>:    lda gp,-3312(gp)
0x3ff805af25c <__osTransferContext+12>:   unop
0x3ff805af260 <__osTransferContext+16>:   lda sp,-64(sp)
0x3ff805af264 <__osTransferContext+20>:   stq ra,0(sp)
0x3ff805af268 <__osTransferContext+24>:   stq s0,8(sp)
0x3ff805af26c <__osTransferContext+28>:   stq s1,16(sp)
0x3ff805af270 <__osTransferContext+32>:   stq s2,24(sp)
0x3ff805af274 <__osTransferContext+36>:   stq s3,32(sp)
0x3ff805af278 <__osTransferContext+40>:   stq s4,40(sp)
0x3ff805af27c <__osTransferContext+44>:   unop
0x3ff805af280 <__osTransferContext+48>:   stq fp,48(sp)
0x3ff805af284 <__osTransferContext+52>:   mov sp,fp
[...]
0x3ff805af304 <__osTransferContext+180>:  lda sp,-336(sp)
[...]

As we can see with the lda instruction at +16, the frame size is 64 bytes. The instruction at +20 shows that the return address is saved at 0(sp). At first, GDB gets the correct frame size, finds that ra is at 0(sp), and therefore computes the address where ra is saved by adding 0 to the current value of sp.

This is where things start to go wrong. I have also pasted one very important instruction at +180, which expands the size of the current frame (typical of a frame which does some alloca)! So 0(sp) no longer points to the saved $ra. Ouch!

Fortunately, the instruction at +52 shows that $sp has been saved into $fp, so $ra (and all other saved registers) can be retrieved using $fp as the base address instead of $sp. This is the main point of my change: when the $fp register is used in the frame, use it as the base rather than $sp.

Another error in the current code (somewhat related really) is that the heuristic algorithm was accumulating all stack allocations detected in the function being inspected.
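The two problems described above can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the attached patch: the struct and function names are hypothetical, and the instruction masks are derived from the Alpha encodings of "lda sp,N(sp)", "stq ra,N(sp)" and "mov sp,fp" (that is, "bis zero,sp,fp" or "bis sp,sp,fp"). The scanner keeps only the first stack adjustment as the frame size, notes whether $fp was set up, and picks the base register for saved registers accordingly. A second helper mimics the accumulating behaviour for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What this sketch records about one frame (hypothetical type).  */
struct frame_sketch {
    long frame_size;   /* bytes from the FIRST "lda sp,-N(sp)" only */
    long ra_offset;    /* N from "stq ra,N(sp)" */
    int  has_ra;       /* nonzero if a save of ra was found */
    int  uses_fp;      /* nonzero if "mov sp,fp" was found */
};

/* Sign-extend the 16-bit displacement of a memory-format instruction.  */
static long disp16(uint32_t w)
{
    long d = (long)(w & 0xffff);
    return (d & 0x8000) ? d - 0x10000 : d;
}

static int is_lda_sp(uint32_t w)    { return (w & 0xffff0000u) == 0x23de0000u; }
static int is_stq_ra_sp(uint32_t w) { return (w & 0xffff0000u) == 0xb75e0000u; }
static int is_mov_sp_fp(uint32_t w) { return w == 0x47fe040fu || w == 0x47de040fu; }

/* Scan N instruction words, keeping only the first stack adjustment.  */
static struct frame_sketch scan_prologue(const uint32_t *insns, size_t n)
{
    struct frame_sketch f = { 0, 0, 0, 0 };
    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t w = insns[i];
        if (is_lda_sp(w) && disp16(w) < 0) {
            if (f.frame_size == 0)       /* later adjustments are ignored */
                f.frame_size = -disp16(w);
        } else if (is_stq_ra_sp(w) && !f.has_ra) {
            f.ra_offset = disp16(w);
            f.has_ra = 1;
        } else if (is_mov_sp_fp(w)) {
            f.uses_fp = 1;
        }
    }
    return f;
}

/* For comparison: sum every stack adjustment, as the old heuristic did.  */
static long accumulated_frame_size(const uint32_t *insns, size_t n)
{
    long total = 0;
    assert(insns != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (is_lda_sp(insns[i]) && disp16(insns[i]) < 0)
            total -= disp16(insns[i]);
    return total;
}

/* Address of the saved ra: relative to fp when the frame uses it.  */
static uint64_t saved_ra_address(const struct frame_sketch *f,
                                 uint64_t sp, uint64_t fp)
{
    assert(f != NULL && f->has_ra);
    return (f->uses_fp ? fp : sp) + (uint64_t)f->ra_offset;
}
```

Fed the five relevant words of __osTransferContext (the instructions at +16, +20, +48, +52 and +180), scan_prologue() yields a 64-byte frame with $fp in use, and saved_ra_address() returns the $fp-relative slot even after $sp has moved down by another 336 bytes.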
In our case, it found two "lda sp, nnn(sp)" instructions, one at +16, and one at +180, and therefore considered the size of the frame to be 64+336=400 bytes. Oops. I also fixed the code to only take into account the first stack allocation to get the frame size.

With both fixes, the backtrace started working in this case as well:

#0 0x3ff805ca8cc in __hstTransferRegistersPC () from /usr/shlib/libpthread.so
#1 0x3ff805af458 in __osTransferContext () from /usr/shlib/libpthread.so
#2 0x3ff805a39c4 in __dspTransferContext () from /usr/shlib/libpthread.so
#3 0x3ff805a1068 in __dspDispatch () from /usr/shlib/libpthread.so
#4 0x3ff805a01f4 in __cvWaitPrim () from /usr/shlib/libpthread.so
#5 0x3ff8059da1c in __pthread_cond_wait () from /usr/shlib/libpthread.so
#6 0x12003dedc in system.tasking.entry_calls.wait_for_completion () at s-taenca.adb:6
#7 0x120047b10 in system.tasking.rendezvous.call_synchronous () at s-tasren.adb:6
#8 0x120043acc in system.tasking.rendezvous.call_simple () at s-tasren.adb:6
#9 0x12002f824 in phil.philosopher (<_task>=0x140042ee0) at phil.adb:49
#10 0x120049b40 in system.tasking.stages.task_wrapper () at s-tassta.adb:6
#11 0x3ff805bca7c in __thdBase () from /usr/shlib/libpthread.so

Joy!

Here is the ChangeLog:

2002-06-14  Joel Brobecker  <brobecker@gnat.com>

        * alpha-tdep.c (heuristic_proc_desc): Compute the size of the
        current frame using only the first stack size adjustment. All
        subsequent size adjustments are not considered to be part of
        the "static" part of the current frame.
        Compute the address of the saved registers relative to the
        Frame Pointer ($fp) instead of the Stack Pointer if $fp is
        in use in this frame.

Ok to commit?

Thanks,
-- 
Joel
Attachment:
alpha-tdep.c.diff
Description: Text document