This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: double fault





We need to distinguish between recursive behaviour that causes stack
depletion and insufficient stack space. If you browse the stack, do you see:

1) a great chunk of unused space, or
2) a regular pattern of return addresses

If you follow the stack frames, are there any huge jumps that would
indicate excessive amounts of local data allocation?
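
As a rough illustration of those three checks, here is a minimal Python sketch that summarizes a stack dump. Everything in it is an assumption for illustration: the helper name summarize_stack, the idea that untouched stack slots read as zero (which may not hold on a given kernel), the kernel text range passed in by the caller, and the 512-byte threshold for a "huge" jump. Treating any word that falls inside kernel text as a return address is only a heuristic.

```python
from collections import Counter

WORD = 4  # bytes per stack slot on i686


def summarize_stack(words, text_lo, text_hi, big_frame=512):
    """Summarize a stack dump given as a list of ints, lowest address first.

    text_lo/text_hi bound the kernel text range (caller-supplied assumption).
    big_frame is a hypothetical threshold, in bytes, for a suspicious gap.
    """
    # 1) Leading run of zero words: space that looks never used.
    unused = 0
    for w in words:
        if w != 0:
            break
        unused += 1

    # 2) Slots whose value falls inside kernel text: candidate return addresses.
    hits = [(i, w) for i, w in enumerate(words) if text_lo <= w < text_hi]
    repeats = Counter(w for _, w in hits)

    # 3) Byte distance between consecutive candidates: a large gap suggests
    #    a frame with a lot of local data.
    gaps = [(b[0] - a[0]) * WORD for a, b in zip(hits, hits[1:])]

    return {
        "unused_bytes": unused * WORD,
        "most_common": repeats.most_common(1)[0] if repeats else None,
        "big_gaps": [g for g in gaps if g >= big_frame],
    }
```

A large unused_bytes value would point towards case 1, a most_common address with a high count towards case 2 (recursion), and a non-empty big_gaps list towards excessive local data in a single frame.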



- -
Richard J Moore
IBM Advanced Linux Response Team - Linux Technology Centre
MOBEX: 264807; Mobile (+44) (0)7739-875237
Office: (+44) (0)1962-817072


                                                                           
             "Stone, Joshua                                                
             I"                                                            
             <joshua.i.stone                                            To 
             @intel.com>              <systemtap@sources.redhat.com>       
             Sent by:                                                   cc 
             systemtap-owner                                               
             @sourceware.org                                           bcc 
                                                                           
                                                                   Subject 
             22/11/2005               double fault                         
             01:12                                                         
                                                                           




I am seeing sporadic double-faults when running tests on systemtap.  I
am trying to run systemtap.base/lt.exp, though others fail as well.  It
doesn't always fail, but if I run it four or five times in succession
that's usually enough to trigger the fault.  Below are manual copies of
a couple of the faults dumped to the console:

double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = ffffffff, esp = f4b6500c
eax = ffffffff, ebx = ffffffff, ecx = 0000007b, edx = f4b65018
esi = ffffffff, edi = ffffffff, ebp = 00000000

double fault, gdt at c0358000 [255 bytes]
double fault, tss at c03dc000
eip = c011a799, esp = f5bd4f98
eax = f959a380, ebx = f5bd5170, ecx = 0000007b, edx = f4bd505c
esi = 00000000, edi = c011a785, ebp = 00000000

The first dump doesn't tell much, but the edi and eip values in the
second dump are interesting.  'c011a785' is the beginning of
do_page_fault, and the instruction at 'c011a799' is a read from the
stack.  Methinks the stack runneth over?
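
One quick way to sanity-check the overflow theory is to see how far each esp sits above the base of its stack area. The sketch below is only that, a sketch: the helper name stack_headroom is made up, and the stack size of this kernel build is not stated in this message, so both 4 KiB and 8 KiB are tried as assumptions. It also assumes the stack area is aligned to its own size, with thread_info at the low end.

```python
def stack_headroom(esp, thread_size):
    """Bytes between esp and the base of its thread_size-aligned stack area.

    thread_size must be a power of two; 4096 and 8192 are assumed
    candidates here, not values confirmed for this kernel.
    """
    return esp & (thread_size - 1)


# esp values taken from the two double-fault dumps above.
for esp in (0xF4B6500C, 0xF5BD4F98):
    for size in (4096, 8192):
        room = stack_headroom(esp, size)
        print(f"esp={esp:08x} size={size}: {room} bytes above base")
```

Under the 4 KiB assumption the first dump's esp is only 12 bytes above the base of its page, which would be consistent with an exhausted stack; under the 8 KiB assumption it has 4108 bytes left. The second dump gives 3992 bytes either way, so esp alone is less conclusive there.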

This is on RHEL4 U2, i686, kernel 2.6.9-22.EL.  I verified this crash on
two different machines with this kernel: an IBM T42 laptop (1.7GHz
Pentium M, 1GB RAM), and a desktop (3.6GHz Pentium 4 HT/EM64T, 2GB RAM).
I couldn't reproduce the problem with the 2.6.9-22.ELsmp kernel.  I also
tried the desktop in x86_64 mode, and could not reproduce the problem
with the UP kernel nor the SMP kernel.

Please let me know if there's any other information I can provide to
help track this down...

Thanks,

Josh Stone


