The following script causes the system to oops recursively, or perhaps the probe is never disabled once the oops is generated.

versions:

[jamesd@localhost ~]$ stap -V
SystemTap translator/driver (version 0.4.1 built 2005-09-22)
Copyright (C) 2005 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
[jamesd@localhost ~]$
Linux localhost.localdomain 2.6.12-1.1447_FC4 #1 Fri Aug 26 20:29:51 EDT 2005 i686 athlon i386 GNU/Linux

A screenshot of part of the oops is at http://www.blastwave.org/~jamesd/systemtap/oops2.PNG

script:

#! stap
probe kernel.function("sched_clock") {
    called++;
}
probe end {
    print("sched_clock called " . string(called) . " times.\n");
}
probe timer.jiffies(100) {
    exit();
}
Prasanna, can you look into this bug?
Note that the variable "called" is not declared global.
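For reference, here is a minimal sketch of the reporter's script with that declaration added. The only change from the original is the `global called` line; without it, `called` is a separate local variable in each probe, so the `end` probe would not see the count. This is not offered as a fix for the oops: the script still probes sched_clock, so it should only be run on a disposable test machine.

```systemtap
#! stap
# Assumption: only this declaration is new; the rest is the reporter's script.
global called

probe kernel.function("sched_clock") {
    called++;
}
probe end {
    print("sched_clock called " . string(called) . " times.\n");
}
probe timer.jiffies(100) {
    exit();
}
```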
I am seeing different behaviour when a kernel module is loaded that inserts probes on sched_clock() and calls printk from the handlers. On an i386 uniprocessor box, I see lots of oops messages. On an i386 SMP box, the probe handlers are executed and I can see messages on the console; the system is not hung, but it does not allow me to remove the loaded kernel module either.

If no printks are used in the handlers, it runs fine on both the uniprocessor and the SMP box. One solution would be to prevent printk from being called in probe handlers for sched_clock(). Another solution would be to avoid probes on sched_clock().

There are also situations where inserted probes cannot be removed from the command line. In such situations we need to provide SysRq key support to remove all the probes from the kernel.

-Prasanna
Subject: Re: simple script oops box recursively

On 11 Nov 2005 14:38:43 -0000, prasanna at in dot ibm dot com <sourceware-bugzilla@sourceware.org> wrote:
>
> ------- Additional Comments From prasanna at in dot ibm dot com 2005-11-11 14:38 -------
> I am seeing different behaviours when a kernel module is loaded to insert probes
> on sched_clock() with calling printks from the handlers. On i386 uniprocessor
> box, I see lots of oops messages. But on i386 smp box, probes handlers are
> executed and I can see messages on the console, the system is not hung, but it
> does not allow me to remove the loaded kernel module as well.
>
> If no printks are used in the handlers, it runs fine on both uni and smp box.
> One solution would be to prevent calling of printks from probe handlers of
> sched_clock(). Another solution would be to avoid probes on sched_clock().

Please see my comments in http://sourceware.org/bugzilla/show_bug.cgi?id=1776 about printk requirements.

>
> Also there are situations where inserted probes cannot be removed from the
> command line, in such situations we need to provide a SysRq key support to
> remove all the probes from the kernel.
>

This isn't a solution if you ever wish SystemTap to be useful on any system other than a single developer's box. Most production boxes don't have a keyboard attached. Are you going to add a quick hack to ssh that sends the SysRq key to a task? You need to solve the real problem rather than hide or remove the symptoms with another quick hack.

James Dickens
uadmin.blogspot.com

> -Prasanna
>
> --
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=1594
Below is the stack trace when I run the given script. I also get a double fault, as seen in bug #1776.

wks126319wss.in.ibm.com login: double fault, gdt at c04ea000 [255 bytes]
Kernel panic - not syncing: kernel/sched.c:357: spin_lock(kernel/sched.c:c04eac40) already locked by kernel/sched.c/357. (Not tainted)
 [<c0127ba8>] panic+0x45/0x1b4
 [<c01207fa>] wake_up_process+0x0/0x10
 [<c01519d7>] autoremove_wake_function+0x15/0x37
 [<c012186b>] __wake_up_common+0x39/0x59
 [<c012191d>] __wake_up+0x92/0x252
 [<c0128c93>] call_console_drivers+0x7e/0x149
 [<c01298c3>] release_console_sem+0x283/0x41c
 [<c01292b0>] vprintk+0x401/0x70e
 [<c0128eab>] printk+0x1b/0x1f
 [<c010dc16>] doublefault_fn+0x36/0xf0
<0>Kernel panic - not syncing: kernel/sched.c:3063: spin_lock(kernel/printk.c:c0458d00) already locked by kernel/sched.c/3063. (Not tain)
 [<c0127ba8>] panic+0x45/0x1b4
 [<c0271f16>] vgacon_dummy+0x0/0xa
 [<c0121add>] __wake_up_locked+0x0/0x21
 [<c01298c3>] release_console_sem+0x283/0x41c
 [<c01292b0>] vprintk+0x401/0x70e
 [<c01292b0>] vprintk+0x401/0x70e
 [<c025ce8a>] vsnprintf+0x32c/0x624

I get the following trace when I run the simple script below. I am not sure what code added by the translator is causing the problem.

probe kernel.function("sched_clock") { }

login: double fault, gdt at c04ea000 [255 bytes]
Kernel panic - not syncing: kernel/sched.c:357: spin_lock(kernel/sched.c:c04eac40) already locked by kernel/sched.c/357. (Not tainted)

-Prasanna
Why is this marked as a "kprobes" bug? It's just another dup of 1564.

*** This bug has been marked as a duplicate of 1564 ***