This is the mail archive of the gdb-prs@sources.redhat.com mailing list for the GDB project.



gdb/321: stack trace reports incorrect caller to abort()



>Number:         321
>Category:       gdb
>Synopsis:       stack trace reports incorrect caller to abort()
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    unassigned
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 30 20:08:02 PST 2002
>Closed-Date:
>Last-Modified:
>Originator:     liblit@acm.org
>Release:        5.1.1
>Organization:
>Environment:
Red Hat Linux 7.2
Intel Pentium 4 CPU
gcc-2.96 or gcc-3.0.1
>Description:
Under certain conditions, GDB misidentifies the function that called abort().  When presenting stack frames, it may instead report that abort() was called by whichever function is laid out immediately *after* the function that actually called abort().  This problem appears even when all optimization is turned off.

Attached below is a small example program which demonstrates the problem.  Compile it using "gcc -g" and no other special options.  Run it in GDB and use "bt" to examine the stack once it crashes.  Observe that function b() is incorrectly reported as the caller of abort().

The bug appears to be sensitive to code alignment.  With gcc-2.96, the bug appears when the number of calls to a() preceding the call to abort() is congruent to 1 mod 4.  With gcc-3.0.1, it appears when that number is congruent to 2 mod 4.

The bug is also sensitive to how the abort() call appears in the calling function.  The bug only appears if the abort() call lies on an obviously unconditional path from the function entry point.  If the abort() is under some nontrivial control flow (e.g., an "if"), then the bug does not appear.
>How-To-Repeat:
Compile the attached source file using "gcc -g bug.c -o bug".  Run the resulting program in GDB.  When it stops with a SIGABRT, type "bt" to get a stack trace.  Observe that b() appears as the caller of abort(), even though that call was actually made by main().

The number of calls to a() is significant, and may need to be changed for different gcc versions.   Under gcc-2.96, there should be 1, 5, 9, 13, ... calls: any number that is (1 mod 4).  Under gcc-3.0.1, there should be 2, 6, 10, 14, ... calls: any number that is (2 mod 4).
>Fix:
Here's my hypothesis, based on general knowledge of compilation but without much understanding of GDB internals.  gcc knows that abort() never returns.  When gcc determines that the call to abort() must always happen, it doesn't bother to generate code for the remainder of the caller.  I have verified this by examining the generated assembly code.  There's no epilogue, nothing at all.  The "call abort" instruction is genuinely the last instruction in the function.

Now, when GDB examines the stack, it's looking at the saved program counter in each frame.  But the saved program counter is not the address of the "call" instruction: it's the address of the next instruction *after* the "call" instruction.

This is why alignment matters.  If there is any space or padding after the function that calls abort(), then the saved program counter points into this dead zone, which GDB evidently still attributes to the calling function.  But if there is no space or padding after the function calling abort(), then the saved program counter is exactly the address of the first instruction of whatever function immediately follows the function that calls abort()!  Thus, GDB incorrectly believes that it is this later function that made the call.
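
To make the alignment argument concrete, here is a toy model in C.  It is not GDB's code: the struct, the lookup() function, and every address in it are made up for illustration, and it assumes that a program counter is attributed to the symbol with the greatest start address not exceeding it.  It also assumes main() ends with a 5-byte "call" instruction.

```c
#include <stddef.h>

/* Hypothetical symbol-table entry: a name and the address where it starts. */
struct sym {
    const char *name;
    unsigned long start;
};

/* Toy lookup (not GDB's code): return the symbol with the greatest start
   address that is <= pc, or NULL if pc precedes every symbol.
   Assumes tab is sorted by ascending start address. */
static const char *lookup(const struct sym *tab, size_t n, unsigned long pc)
{
    const char *best = NULL;
    for (size_t i = 0; i < n && tab[i].start <= pc; i++)
        best = tab[i].name;
    return best;
}

/* Made-up layout: main() ends with a 5-byte call at 0x101b, so its code
   ends at 0x1020 and the saved program counter is 0x101b + 5 = 0x1020. */
static const unsigned long saved_pc = 0x101b + 5;

/* b() starts exactly where main() ends. */
static const struct sym packed[] = {
    { "a", 0x1000 }, { "main", 0x1010 }, { "b", 0x1020 },
};

/* b() starts after four bytes of padding. */
static const struct sym padded[] = {
    { "a", 0x1000 }, { "main", 0x1010 }, { "b", 0x1024 },
};
```

In this model, lookup(packed, 3, saved_pc) yields "b", while lookup(padded, 3, saved_pc) yields "main".  The only difference between the two tables is the four bytes of padding before b().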

I don't think we can really say that gcc is doing the wrong thing here.  The saved program counter is entirely reasonable, especially considering gcc's knowledge that it will never be used.  GDB is erroneously assuming that the saved return address must still be within the same function that contained the "call" instruction.  It's sort of a fencepost error, falling off the end of one function and into the next.

If that's what's going on, then it should be easy enough to fix.  GDB should decrement the saved program counter by one before looking it up.  That way, you're looking up the function containing the actual "call" instruction rather than the next instruction after.  Note that the size of a "call" instruction doesn't really matter: decrementing by one will still get you pointing back into some part of a multi-byte "call" instruction, which will still give the correct result when looked up in some function address range table.  This will also do the right thing even if the "call" is the very first instruction in a function, for similar reasons.
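
A minimal sketch of the proposed adjustment, again as a toy model rather than anything taken from GDB.  The range table, func_containing(), caller_of(), and all addresses are hypothetical.  Functions are packed with no padding between them, which is the case that triggers the bug.

```c
#include <stddef.h>

/* Hypothetical function-range entry: code occupies [start, end). */
struct func_range {
    const char *name;
    unsigned long start;
    unsigned long end;    /* one past the last byte */
};

/* Made-up layout with no padding between functions. */
static const struct func_range funcs[] = {
    { "a",    0x1000, 0x1010 },
    { "main", 0x1010, 0x1020 },
    { "b",    0x1020, 0x1030 },
};

/* Return the function whose range contains addr, or NULL if none does. */
static const char *func_containing(unsigned long addr)
{
    for (size_t i = 0; i < sizeof funcs / sizeof funcs[0]; i++)
        if (addr >= funcs[i].start && addr < funcs[i].end)
            return funcs[i].name;
    return NULL;
}

/* Proposed adjustment: a saved return address points just past the call,
   so look up the byte before it, which is still part of the call. */
static const char *caller_of(unsigned long saved_pc)
{
    return func_containing(saved_pc - 1);
}
```

With a 5-byte call at 0x101b as the last instruction of main(), the saved program counter is 0x1020: func_containing(0x1020) gives "b", but caller_of(0x1020) gives "main".  A 2-byte call at 0x101e behaves the same way, and a call that is the very first instruction of b() (at 0x1020, saved program counter 0x1025) still resolves to "b".  The adjustment is meant only for saved return addresses; the program counter of the innermost frame is not a return address and would presumably be looked up unchanged.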
>Release-Note:
>Audit-Trail:
>Unformatted:
----gnatsweb-attachment----
Content-Type: application/octet-stream; name="bug.c"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="bug.c"

I2luY2x1ZGUgPHN0ZGxpYi5oPgoKCnN0YXRpYyB2b2lkIGEoKQp7Cn0KCgppbnQgbWFpbihpbnQg
YXJnYywgY2hhciAqYXJndltdKQp7CiAgLyogZ2NjLTIuOTY6ICB0byBzZWUgYnVnLCBjYWxsIGEo
KSA0aysxIHRpbWVzIGZvciBhbnkgayA+PSAwICovCiAgLyogZ2NjLTMuMC4xOiB0byBzZWUgYnVn
LCBjYWxsIGEoKSA0aysyIHRpbWVzIGZvciBhbnkgayA+PSAwICovCiAgYSgpOwogIGEoKTsKICBh
Ym9ydCgpOwogIAogIHJldHVybiAwOwp9CgoKc3RhdGljIHZvaWQgYigpCnsKfQo=

