Bug 1594

Summary: simple script oops box recursively
Product: systemtap
Component: kprobes
Status: RESOLVED DUPLICATE
Severity: normal
Priority: P2
Version: unspecified
Target Milestone: ---
Reporter: James Dickens <jamesd.wi>
Assignee: Prasanna S Panchamukhi <prasanna>
CC: amavin, jkenisto
Host:
Target:
Build:
Last reconfirmed:

Description James Dickens 2005-10-28 21:31:09 UTC
The following script causes the system to oops recursively, or perhaps the
probe is never disabled once the oops is generated.

versions:
[jamesd@localhost ~]$ stap -V
SystemTap translator/driver (version 0.4.1 built 2005-09-22)
Copyright (C) 2005 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
[jamesd@localhost ~]$
Linux localhost.localdomain 2.6.12-1.1447_FC4 #1 Fri Aug 26 20:29:51 EDT 2005
i686 athlon i386 GNU/Linux

A screenshot of part of the oops is at
http://www.blastwave.org/~jamesd/systemtap/oops2.PNG

script:
#! stap

probe kernel.function("sched_clock")
{
       called++;
}

probe end
{
       print("sched_clock called " . string(called) . " times.\n");
}

probe timer.jiffies(100)
{
       exit();
}
Comment 1 Vara Prasad 2005-10-31 20:49:43 UTC
Prasanna, can you look into this bug?
Comment 2 Jim Keniston 2005-11-01 18:15:59 UTC
Note that the variable "called" is not declared global.
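
For illustration, a sketch of the reporter's script with only the declaration
added, so that both probes refer to one shared counter. It is untested here and
should not be expected to avoid the oops: comment 5 reports the same double
fault with an empty handler on sched_clock.

```
#! stap

# Declared global so that the sched_clock probe and the end probe
# share a single counter.
global called

probe kernel.function("sched_clock")
{
       called++;
}

probe end
{
       print("sched_clock called " . string(called) . " times.\n");
}

probe timer.jiffies(100)
{
       exit();
}
```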
Comment 3 Prasanna S Panchamukhi 2005-11-11 14:38:43 UTC
I am seeing different behaviours when a kernel module is loaded to insert probes
on sched_clock() and the handlers call printk. On an i386 uniprocessor box, I
see lots of oops messages. On an i386 SMP box, the probe handlers are executed
and I can see messages on the console; the system is not hung, but it does not
allow me to remove the loaded kernel module either.

If the handlers do not call printk, it runs fine on both the uniprocessor and
SMP boxes. One solution would be to prevent probe handlers on sched_clock()
from calling printk. Another would be to avoid probes on sched_clock().

Also, there are situations where inserted probes cannot be removed from the
command line. In such situations we need to provide SysRq key support to
remove all the probes from the kernel.

-Prasanna
Comment 4 James Dickens 2005-11-11 19:03:54 UTC
Subject: Re:  simple script oops box recursively

On 11 Nov 2005 14:38:43 -0000, prasanna at in dot ibm dot com
<sourceware-bugzilla@sourceware.org> wrote:
>
> ------- Additional Comments From prasanna at in dot ibm dot com  2005-11-11 14:38 -------
> I am seeing different behaviours when a kernel module is loaded to insert probes
> on sched_clock() with calling printks from the handlers.  On i386 uniprocessor
> box, I see lots of oops messages. But on i386 smp box, probes handlers are
> executed and I can see messages on the console, the system is not hung, but it
> does not allow me to remove the loaded kernel module as well.
>
> If no printks are used in the handlers, it runs fine on both uni and smp box.
> One solution would be to prevent calling of printks from probe handlers of
> sched_clock(). Another solution would be to avoid probes on sched_clock().

Please see my comments in
http://sourceware.org/bugzilla/show_bug.cgi?id=1776 about printk
requirements.

>
> Also there are situations where inserted probes cannot be removed from the
> command line, in such situations we need to provide a SysRq key support to
> remove all the probes from the kernel.
>

This isn't a solution if you ever want systemtap to be useful on any
system other than a single developer's box. Most production boxes don't
have a keyboard attached. Are you going to add a quick hack to ssh
that sends the SysRq key to a task?

You need to solve the real problem rather than just adding another quick hack.

Solve the problem, rather than hiding/removing the symptoms.

James Dickens
uadmin.blogspot.com
Comment 5 Prasanna S Panchamukhi 2005-11-15 13:04:47 UTC
Below is the stack trace when I run the given script. I also get a double fault,
as seen in bug #1776.

wks126319wss.in.ibm.com login: double fault, gdt at c04ea000 [255 bytes]
Kernel panic - not syncing: kernel/sched.c:357:
spin_lock(kernel/sched.c:c04eac40) already locked by kernel/sched.c/357. (Not tainted)

 [<c0127ba8>] panic+0x45/0x1b4
 [<c01207fa>] wake_up_process+0x0/0x10
 [<c01519d7>] autoremove_wake_function+0x15/0x37
 [<c012186b>] __wake_up_common+0x39/0x59
 [<c012191d>] __wake_up+0x92/0x252
 [<c0128c93>] call_console_drivers+0x7e/0x149
 [<c01298c3>] release_console_sem+0x283/0x41c
 [<c01292b0>] vprintk+0x401/0x70e
 [<c0128eab>] printk+0x1b/0x1f
 [<c010dc16>] doublefault_fn+0x36/0xf0
 <0>Kernel panic - not syncing: kernel/sched.c:3063:
spin_lock(kernel/printk.c:c0458d00) already locked by kernel/sched.c/3063. (Not tain)

 [<c0127ba8>] panic+0x45/0x1b4
 [<c0271f16>] vgacon_dummy+0x0/0xa
 [<c0121add>] __wake_up_locked+0x0/0x21
 [<c01298c3>] release_console_sem+0x283/0x41c
 [<c01292b0>] vprintk+0x401/0x70e
 [<c01292b0>] vprintk+0x401/0x70e
 [<c025ce8a>] vsnprintf+0x32c/0x624

I get the following trace when I run the simple script below.
I am not sure what code added by the translator is causing the problem.

probe kernel.function("sched_clock")
{
}

login: double fault, gdt at c04ea000 [255 bytes]
Kernel panic - not syncing: kernel/sched.c:357:
spin_lock(kernel/sched.c:c04eac40) already locked by kernel/sched.c/357. (Not tainted)

-Prasanna
Comment 6 Martin Hunt 2005-11-30 11:15:29 UTC
Why is this marked as a "kprobes" bug? It's just another dup of 1564.

*** This bug has been marked as a duplicate of 1564 ***