This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: double fault
- From: Richard J Moore <richardj_moore at uk dot ibm dot com>
- To: "Stone, Joshua I" <joshua dot i dot stone at intel dot com>
- Cc: systemtap at sources dot redhat dot com
- Date: Tue, 22 Nov 2005 09:26:04 +0000
- Subject: Re: double fault
We need to distinguish between recursive behaviour that's causing stack
depletion and insufficient stack space. If you browse the stack, do you see:
1) a great chunk of unused space, or
2) a regular pattern of return addresses
If you follow the stack frames, are there any huge jumps, indicating
excessive amounts of local data allocation?
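
The three checks above could be sketched roughly as follows, assuming the stack has already been read out as a list of 32-bit words and the saved frame pointers and return addresses have been extracted by hand or with a debugger. The function names, the 512-byte threshold, and the sample values are hypothetical and are not taken from the crash in this thread.

```python
def longest_zero_run(words):
    """Length of the longest run of zero words (a hint of unused stack space)."""
    best = run = 0
    for w in words:
        run = run + 1 if w == 0 else 0
        best = max(best, run)
    return best


def repeating_period(return_addrs, min_repeats=3):
    """Smallest period with which the whole return-address list repeats.

    Returns None if the list does not repeat at least min_repeats times.
    A small period suggests recursion rather than one oversized frame.
    """
    n = len(return_addrs)
    for p in range(1, n // min_repeats + 1):
        if all(return_addrs[i] == return_addrs[i + p] for i in range(n - p)):
            return p
    return None


def huge_frames(frame_pointers, threshold=512):
    """Frames whose distance to the next saved frame pointer exceeds threshold.

    frame_pointers is ordered from the innermost frame outwards, so the
    addresses increase on a downward-growing stack.
    """
    sizes = [b - a for a, b in zip(frame_pointers, frame_pointers[1:])]
    return [(fp, size) for fp, size in zip(frame_pointers, sizes) if size > threshold]


# Made-up sample data, only to show how the helpers are called.
print(longest_zero_run([0, 0, 0, 5, 0, 0]))
print(repeating_period([0xA, 0xB, 0xA, 0xB, 0xA, 0xB]))
print(huge_frames([0x1000, 0x1018, 0x1418, 0x1430]))
```

A long zero run points at case 1, a short repeating period at case 2, and any entry from huge_frames at a function with a large amount of local data.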
--
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072
"Stone, Joshua
I"
<joshua.i.stone@intel.com>
Sent by: systemtap-owner@sourceware.org
22/11/2005 01:12
To: <systemtap@sources.redhat.com>
cc:
bcc:
Subject: double fault
I am seeing sporadic double-faults when running tests on systemtap. I
am trying to run systemtap.base/lt.exp, though others fail as well. It
doesn't always fail, but if I run it four or five times in succession
that's usually enough to trigger the fault. Below are manual copies of
a couple of the faults dumped to the console:
double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = ffffffff, esp = f4b6500c
eax = ffffffff, ebx = ffffffff, ecx = 0000007b, edx = f4b65018
esi = ffffffff, edi = ffffffff, ebp = 00000000
double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = c011a799, esp = f5bd4f98
eax = f959a380, ebx = f5bd5170, ecx = 0000007b, edx = f4bd505c
esi = 00000000, edi = c011a785, ebp = 00000000
The first dump doesn't tell much, but the edi and eip values in the
second dump are interesting. 'c011a785' is the beginning of
do_page_fault, and the instruction at 'c011a799' is a read from the
stack. Methinks the stack runneth over?
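
The arithmetic behind that reading of the second dump can be checked with a short sketch. The register values are copied from the dump; the 4 KiB page size is an assumption, and the message does not state the kernel stack size or where this task's stack begins, so the last figure is only a distance to a page boundary, not proof of an overflow.

```python
eip = 0xC011A799  # faulting instruction, from the second dump
edi = 0xC011A785  # start of do_page_fault, per the message
esp = 0xF5BD4F98  # stack pointer, from the second dump

PAGE_SIZE = 4096  # assumption: 4 KiB pages

# How far into do_page_fault the faulting instruction sits.
offset_into_function = eip - edi

# Where esp sits within its page, and how far below the next page boundary.
esp_page_offset = esp & (PAGE_SIZE - 1)
bytes_below_next_page = PAGE_SIZE - esp_page_offset

print(hex(offset_into_function))   # 0x14
print(hex(esp_page_offset))        # 0xf98
print(hex(bytes_below_next_page))  # 0x68
```

The fault is only 0x14 bytes into do_page_fault, which fits a stack access early in the function, and esp is 0x68 bytes below a page boundary.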
This is on RHEL4 U2, i686, kernel 2.6.9-22.EL. I verified this crash on
two different machines with this kernel: an IBM T42 laptop (1.7GHz
Pentium M, 1GB RAM), and a desktop (3.6GHz Pentium 4 HT/EM64T, 2GB RAM).
I couldn't reproduce the problem with the 2.6.9-22.ELsmp kernel. I also
tried the desktop in x86_64 mode, and could not reproduce the problem
with either the UP kernel or the SMP kernel.
Please let me know if there's any other information I can provide to
help track this down...
Thanks,
Josh Stone