I have taken a look at the Linux Kernel State Tracer (LKST) and Linux Trace Toolkit (LTT) to see how what information is recorded and where the probes are placed. When doing the context switch the probes probes record prev pid, next pid, and state of prev. The probe is placed right before the call to context_switch() called in kernel/sched.c:__sched() (LTT) or in context_switch() itself (LKST). Would like to avoid using line numbers and just places the probe on the entry to context_switch(). However, context_switch is an inline function and it appears that the current stap has difficulty finding the inline function. -Will
For instrumentation of context switch to work for inline functions like context_switch bz 1155 needs to resolved.
Created attachment 668 [details] proposal code to track context switches. The attached systemtap script does not work as-is. It just gives an outline of what the expected probe would look like.
Created attachment 725 [details] Revised schedule instrumentation Problems obtaining the inline function arguments. Not sure this is do to optimizations in the compiler, problems with the debug information, or the translator reading the debug information.
What other areas than context_switch provide useful information about the scheduler? I would expect that any statistics that the kernel tracks about CPU consumption, SMP balancing, run-queues would be appropriate for exposure in a scheduler tapset.
Created attachment 727 [details] proposed code to instrument context switches The commented out "log" operation in "probe context" can cause the machine to crash when commented out.
It appears that having a log operation in the core of the sched function can cause the kernel to crash. Below is the output from running the 2005-10-26 20:33 version of the sched.stp with the log operation in "probe switch_context" executing: Kernel panic - not syncing: kernel/sched.c:357: spin_lock(kernel/sched.c:c04eac40) already locked by kernel/sched.c/2802. (Not tainted) [<c0127ba8>] panic+0x45/0x1b4 [<c01207fa>] wake_up_process+0x0/0x10 [<c01519d7>] autoremove_wake_function+0x15/0x37 [<c012186b>] __wake_up_common+0x39/0x59 [<c012191d>] __wake_up+0x92/0x252 [<d0adc431>] _stp_write+0x382/0xa72 [stap_2228] [<d0adf848>] probe_0+0x4d7/0x5f5 [stap_2228] [<c010189a>] __switch_to+0x8/0x1af [<d0addd31>] _stp_print_flush+0x17/0x35 [stap_2228] [<d0adfa57>] dwarf_kprobe_0_enter+0x92/0xd0 [stap_2228] [<c010189a>] __switch_to+0x8/0x1af [<c011c52e>] kprobe_handler+0xc4/0x24a [<c011c839>] kprobe_exceptions_notify+0x33/0x122 [<c0143009>] notifier_call_chain+0x17/0x27 [<c0106495>] do_int3+0x3c/0x7d [<c0104826>] int3+0x1e/0x24 [<c010189b>] __switch_to+0x9/0x1af [<c03ff4a6>] schedule+0x356/0x938 [<c010ca84>] old_mmap+0xce/0x114 [<c0106948>] do_IRQ+0x51/0x82 [<c01044d2>] work_resched+0x5/0x16 <0>Kernel panic - not syncing: kernel/sched.c:357: spin_lock(kernel/sched.c:c04eac40) already locked by kernel/sched.c/2802. (Not tainted) [<c0127ba8>] panic+0x45/0x1b4 [<c01207fa>] wake_up_process+0x0/0x10 [<c0170778>] __alloc_pages+0xe7/0x3ff [<c01519d7>] autoremove_wake_function+0x15/0x37 [<c012186b>] __wake_up_common+0x39/0x59 [<c012191d>] __wake_up+0x92/0x252 [<c02e5ac9>] serial8250_console_write+0x0/0x1ec [<c01298c3>] release_console_sem+0x283/0x41c [<c01292b0>] vprintk+0x401/0x70e [<c01044d2>] work_resched+0x5/0x16 [<c02c1172>] do_unblank_screen+0x1b/0x142 [<c0128eab>] printk+0x1b/0x1f [<c011df92>] bust_spinlocks+0x42/0x4d [<c0127baf>] panic+0x4c/0x1b4 [<c01207fa>] wake_up_process+0x0/0x10 [<c01519d7>] autoremove_wake_function+0x15/0x37 [<c012186b>] __wake_up_common+0x39/0x59 [<c012191d>] __wake_up+0x92/0x252 [<d0adc431>] _stp_write+0x382/0xa72 [stap_2228] [<d0adf848>] probe_0+0x4d7/0x5f5 [stap_2228] [<c010189a>] __switch_to+0x8/0x1af [<d0addd31>] _stp_print_flush+0x17/0x35 [stap_2228] [<d0adfa57>] dwarf_kprobe_0_enter+0x92/0xd0 [stap_2228] [<c010189a>] __switch_to+0x8/0x1af [<c011c52e>] kprobe_handler+0xc4/0x24a [<c011c839>] kprobe_exceptions_notify+0x33/0x122 [<c0143009>] notifier_call_chain+0x17/0x27 [<c0106495>] do_int3+0x3c/0x7d [<c0104826>] int3+0x1e/0x24 [<c010189b>] __switch_to+0x9/0x1af [<c03ff4a6>] schedule+0x356/0x938 [<c010ca84>] old_mmap+0xce/0x114 [<c0106948>] do_IRQ+0x51/0x82 [<c01044d2>] work_resched+0x5/0x16 <0>Kernel panic - not syncing: kernel/sched.c:3063: spin_lock(kernel/printk.c:c0458d00) already locked by kernel/sched.c/3063. (Not tainted) [<c0127ba8>] panic+0x45/0x1b4 [<c02e5ac9>] serial8250_console_write+0x0/0x1ec [<c0121add>] __wake_up_locked+0x0/0x21 [<c01298c3>] release_console_sem+0x283/0x41c [<c01292b0>] vprintk+0x401/0x70e [<c01044d2>] work_resched+0x5/0x16 [<c02c1172>] do_unblank_screen+0x1b/0x142 [<c0128eab>] printk+0x1
There is a scheduler tapset in the tree now that does context switches and more -- does that satisfy the requirement for this bug?
ancient bug cleanup; made redundant by subsequent work by IBM and Intel.