Not apparent on FC5, but it is on FC6. Attached is a test program.

The two functions foo() and jump() perform the mundane tasks of incrementing and calculating integer values in a while(1) loop controlled by foo(). There is a dummy function at the top of the file, shazaam(), which does nothing. At the end of this loop, a call to jump() is made. However, shazaam() ends up getting highlighted, and the stack is missing foo() from its middle. According to the jump() frame, its address is 0x400528, but when we examine this in gdb:

(gdb) x/i 0x400528
0x400528 <shazaam>: push %rbp

It's actually pointing to shazaam(). Subsequent stepping resumes correctly after this initial incorrect address.

I placed print statements in the constructors of each of our StackFrame objects, printing out the name of the frame, its CFA, and address. Here are the outputs for the last three steps:

foo 140737031278864 59 0x4005fa
main 140737031278912 71 0x400641
__libc_start_main 140737031278960 0 0x2aaaaaefda44
_start 140737031279152 0 0x400499

foo 140737031278864 59 0x4005ff
main 140737031278912 71 0x400641
__libc_start_main 140737031278960 0 0x2aaaaaefda44
_start 140737031279152 0 0x400499

jump 140737031278856 13 0x400528
main 140737031278912 71 0x400641
__libc_start_main 140737031278960 0 0x2aaaaaefda44
_start 140737031279152 0 0x400499

In the last step the frame for foo() is missing, and jump()'s address points to shazaam(), which has never been called. Apparently libunwind is returning garbage for the trace at this point?
Created attachment 1422 [details] test program
Created attachment 1430 [details] Fix requiring to disable internal caching (included)

000000000804848c jump (sp=00000000bff7bfbc) proc=000000000804848c-00000000080484ba handler=0 lsda=0
000000000804856e foo+0xa2 (sp=00000000bff7bfc0) proc=00000000080484cc-0000000008048570 handler=0 lsda=0
00000000080485b3 main+0x43 (sp=00000000bff7bfe0) proc=0000000008048570-00000000080485c1 handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc (sp=00000000bff7c010) proc=000000000082fd10-000000000082fded handler=0 lsda=0
0000000008048401 _start+0x21 (sp=00000000bff7c080) proc=00000000080483e0-0000000008048402 handler=0 lsda=0
================
000000000804848d jump+0x1 (sp=00000000bff7bfb8) proc=000000000804848c-00000000080484ba handler=0 lsda=0
000000000804856e foo+0xa2 (sp=00000000bff7bfc0) proc=00000000080484cc-0000000008048570 handler=0 lsda=0
00000000080485b3 main+0x43 (sp=00000000bff7bfe0) proc=0000000008048570-00000000080485c1 handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc (sp=00000000bff7c010) proc=000000000082fd10-000000000082fded handler=0 lsda=0
0000000008048401 _start+0x21 (sp=00000000bff7c080) proc=00000000080483e0-0000000008048402 handler=0 lsda=0
================
000000000804848f jump+0x3 (sp=00000000bff7bfb8) proc=000000000804848c-00000000080484ba handler=0 lsda=0
000000000804856e foo+0xa2 (sp=00000000bff7bfc0) proc=00000000080484cc-0000000008048570 handler=0 lsda=0
00000000080485b3 main+0x43 (sp=00000000bff7bfe0) proc=0000000008048570-00000000080485c1 handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc (sp=00000000bff7c010) proc=000000000082fd10-000000000082fded handler=0 lsda=0
0000000008048401 _start+0x21 (sp=00000000bff7c080) proc=00000000080483e0-0000000008048402 handler=0 lsda=0
================
Created attachment 1432 [details] libunwind-local testcase for debugging
The internal caching still needs to be made compatible with the patch. The functionality itself should be final, I hope.
Jan FYI, nice catch, but there's more ...

-  --ip;
+  /* In the current (lowest) frame we must not touch `ip' as the current
+     address is where we stand.  On the other hand any upper frames will stand
+     on the next instruction behind our call which may have a different stack
+     DWARF information (for `stdcall' called functions) or the next instruction
+     even may belong already to a different continuing function.  */
+  if (!c->first_step)
+    --ip;

This can also occur when a function was interrupted by a signal at its first instruction, giving the layout:

inner-most
<signal-trampoline>
foo-at-first-instruction

More generally, there are two cases: frames where the function was interrupted (the inner-most frame and the caller of the signal-trampoline), and all others, which made a normal call. Only frames of the second kind should get the --ip adjustment.
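To make the two cases concrete, here is a minimal C sketch of the lookup rule. It is not the libunwind code: struct proc_range, enum frame_kind, lookup_ip() and proc_name_for() are hypothetical names used only for illustration. The jump/foo/main ranges are the proc= values from the i386 trace in this report; the shazaam range is an assumption, placed directly in front of jump() to mirror the symptom.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one function's address range [start, end). */
struct proc_range {
    const char *name;
    uintptr_t start;
    uintptr_t end;
};

/* How control left the frame: interrupted in place (inner-most frame, or
   the frame a signal arrived in), or by a normal call instruction. */
enum frame_kind { FRAME_INTERRUPTED, FRAME_CALLED };

const struct proc_range sample_procs[] = {
    { "shazaam", 0x08048484u, 0x0804848cu }, /* ASSUMED range, not from the report */
    { "jump",    0x0804848cu, 0x080484bau }, /* proc= values from the i386 trace */
    { "foo",     0x080484ccu, 0x08048570u },
    { "main",    0x08048570u, 0x080485c1u },
};
const size_t sample_proc_count = sizeof sample_procs / sizeof sample_procs[0];

/* Address to use when searching for unwind info.  A called frame's ip is a
   return address, i.e. the instruction after the call, so step back one
   byte to land inside the call.  An interrupted frame's ip is exact. */
uintptr_t lookup_ip(uintptr_t ip, enum frame_kind kind)
{
    return kind == FRAME_CALLED ? ip - 1 : ip;
}

/* Name of the function covering the adjusted address, or "?" if none. */
const char *proc_name_for(uintptr_t ip, enum frame_kind kind)
{
    uintptr_t addr = lookup_ip(ip, kind);
    size_t i;

    for (i = 0; i < sample_proc_count; i++) {
        if (addr >= sample_procs[i].start && addr < sample_procs[i].end)
            return sample_procs[i].name;
    }
    return "?";
}
```

With this table, proc_name_for(0x0804848c, FRAME_INTERRUPTED) resolves to jump, while treating the same address as FRAME_CALLED steps back into the assumed shazaam range, which is the kind of misattribution reported above. Conversely, a return address equal to the end of foo's range only resolves to foo when it is treated as FRAME_CALLED.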
Created attachment 1435 [details] Fixed patch, handles signal frames, caching is fixed/unaffected

Final patch.
Created attachment 1436 [details] Testcase including signal frame

Testcase still needs to get properly integrated into libunwind testsuite.

Expected output:

00000000080484ec jump (sp=00000000bf90062c) proc=00000000080484ec-000000000804851a handler=0 lsda=0
000000000804855d foo+0x31 (sp=00000000bf900630) proc=000000000804852c-00000000080485d5 handler=0 lsda=0
0000000000eba420 __kernel_sigreturn (sp=00000000bf900650) proc=0000000000eba41f-0000000000eba428 handler=0 lsda=0
00000000080485d5 lockup (sp=00000000bf90092c) proc=00000000080485d5-00000000080485da handler=0 lsda=0
0000000008048632 prefoo+0x58 (sp=00000000bf900930) proc=00000000080485da-0000000008048634 handler=0 lsda=0
0000000008048697 main+0x63 (sp=00000000bf900960) proc=0000000008048634-00000000080486a5 handler=0 lsda=0
000000000082fdec __libc_start_main+0xdc (sp=00000000bf900990) proc=000000000082fd10-000000000082fded handler=0 lsda=0
0000000008048461 _start+0x21 (sp=00000000bf900a00) proc=0000000008048440-0000000008048462 handler=0 lsda=0
================
Still not committed - x86_64 signal frames are affected by a glibc bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217087

Is it enough to get the unwinding fixed in glibc RawHide, or should libunwind provide a workaround for legacy glibc releases? I believe RawHide is enough.
x86_64 signal frame functionality still dependent on resolving glibc's: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=217087