This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



RE: double fault


Roland McGrath wrote:
> The second crash had an esp of 0xf5bd4f98.  If that's a proper stack
> pointer, it's only 104 bytes from the beginning of the stack. 
> Considering that the trap frame itself is 60 bytes, that's fairly
> small for a realistic stack.  It might well be that in fact it's an
> overflowed stack that grew down from below 0xf5bd6000 and overflowed
> by getting below 0xf5bd5034 (which is the end of the struct
> thread_info at the base of the stack). 

I added a check to monitor the stack at probe entry, like this:

	/* offset of the saved regs within the 4K stack page */
	unsigned left = (unsigned long)CONTEXT->regs & 0xfff;
	printk("stap_debug: %u bytes on the stack\n", left);

Once I added that, I started getting only a single output and then a
crash every time.  The value reported is consistently 3976 bytes - only
120 bytes from the top.  And the eip is now consistently at that stack
read within do_page_fault as well.
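
For anyone who wants to check the arithmetic, here is a minimal user-space
sketch of the same masking.  It assumes 4K kernel stacks aligned on a 4K
boundary, which is what the 0xfff mask above implies; STACK_SIZE,
stack_offset and bytes_from_top are names made up for this illustration,
not kernel symbols.

```c
#include <stdint.h>

/* Assumption: 4K stacks on a 4K boundary, as implied by the 0xfff mask. */
#define STACK_SIZE 4096u

/* Offset of an address within its stack page.  The stack grows down,
 * so this is roughly the room left below that address (struct
 * thread_info at the base of the page takes a little of it). */
static unsigned stack_offset(uint32_t addr)
{
	return addr & (STACK_SIZE - 1);
}

/* Distance from an address up to the top of its stack page. */
static unsigned bytes_from_top(uint32_t addr)
{
	return STACK_SIZE - stack_offset(addr);
}
```

With these, the esp of 0xf5bd4f98 from the second crash has offset 3992,
which is the 104 bytes from the top that Roland mentioned, and an offset
of 3976 corresponds to the 120 bytes I am seeing.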

>> Is there a way I can get the double-fault to print a full oops, with
>> a stack trace?
> 
> No, it's a special trap handler that uses its own stack and just has
> the simple printks you've seen.  You'd have to do something like put
> a probe on the line in doublefault_fn where it printk's the esp et
> al, and have that call show_trace on t->esp or something.

A probe here doesn't work.  I tried it, and the system hung up
completely (a triple-fault?).  I think things must be hosed up pretty
bad by the time it gets to doublefault_fn.

And thanks to the infinite wisdom of Linus, it's a pain to get a
debugger in there.  I tried kdb first, but kdb doesn't automatically
catch double-faults.  I put a breakpoint on doublefault_fn, and it
triggered, but kdb just panicked about invalid memory references as it
was trying to take over.  Again, to me this seems to indicate trouble
with the stack.  I couldn't get kgdb to work at all on the RHEL4 kernel
- likely patching issues.


Martin Hunt wrote:
> But I'm not sure its worth pursuing further because it appears to not
> happen in the newer version of kprobes.

Perhaps, or perhaps there's still a landmine in there that is just
better obscured in the newer kprobes.  I would feel much better if there
was a known fix that occurred, instead of the problem magically
disappearing.  I don't think I will spend much more time on this though,
at least until someone runs into the same issue on the new kprobes.

At the very least, judging by the side conversations, we now appear to
have quite a few people looking closely at the fault handling code...

Thanks,

Josh

