This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.



Re: runtime libraries application on schedule()


Martin Hunt wrote:
Use a jprobe to instrument the entry to schedule().
Record the pid and the return address.


pid is easy.  I don't know of a way to get the return address.
I assume some inline assembly to get ebp or rbp would do it.  Something
else to add to the runtime library.

In the case of normal calls, gcc has a mechanism to get the return address: __builtin_return_address(0). With a kprobe on the first instruction of a function, on x86 the return address is the word on top of the stack (*(regs.esp)), and on ppc it is stored in the link register (regs.link).
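For a jprobe there may be an even easier way: the jprobe handler runs on the probed function's own stack frame, so gcc's __builtin_return_address(0) inside the handler should yield the address schedule() was called from. A minimal sketch (untested):

asmlinkage void __sched inst_schedule(void)
{
  /* the word on top of the stack is still schedule()'s return address */
  long caller = (long) __builtin_return_address(0);

  /* ... record current->pid and caller in a map here ... */

  jprobe_return();
}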


The information could be analyzed in several different ways:

1) sum based on return address to find out which schedule points are getting hit all the time
2) sum the counts by pid to figure out which pids are getting rescheduled a lot


That's easy enough.

asmlinkage void __sched inst_schedule(void)
{
  /* bump the per-pid counter (the original posted snippet read back
     from the wrong map; corrected here) */
  _stp_map_key_long (schedpid, current->pid);
  _stp_map_set_int64 (schedpid, _stp_map_get_int64(schedpid) + 1);

  jprobe_return();
}

I'll attach full source to a working probe.
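Given a way to capture the caller's address (e.g. __builtin_return_address(0) as above), (1) is the same pattern keyed on the call site instead of the pid. A sketch; schedaddr here is a hypothetical third map, not one that appears in the attached source:

asmlinkage void __sched inst_schedule_by_addr(void)
{
  /* count hits per schedule() call site */
  _stp_map_key_long (schedaddr, (long) __builtin_return_address(0));
  _stp_map_set_int64 (schedaddr, _stp_map_get_int64(schedaddr) + 1);

  jprobe_return();
}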


3) find out which schedule() points a given pid is hitting
4) find out which processes are encountering a given schedule() point


I don't understand what you mean by these. What is a schedule point?

Assuming that the kernel isn't compiled with preemption, a thread in the kernel runs until it yields the processor to another thread. This happens when schedule() is called, usually when there is nothing more for the thread to do at the moment, e.g. it is waiting for some resource to become available or for an operation to complete. Search for "schedule()" in the kernel sources and you will find many examples.
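A typical schedule point looks something like this (an illustrative wait-queue pattern, not taken from any particular driver; dev->wq and dev->data_ready are made-up names):

DECLARE_WAITQUEUE(wait, current);

add_wait_queue(&dev->wq, &wait);
set_current_state(TASK_INTERRUPTIBLE);
while (!dev->data_ready) {
  schedule();            /* <-- a schedule point: yield until woken */
  set_current_state(TASK_INTERRUPTIBLE);
}
set_current_state(TASK_RUNNING);
remove_wait_queue(&dev->wq, &wait);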


Having the pid and the address schedule() was called from will give some indication of which processes and which areas of code are waiting for resources to become available.

3 and 4 were just different ways of reducing/analyzing the data. Imagine a 3D plot: x axis pid, y axis return address, and z axis the number of times that combination was recorded. (3) would be looking at a particular y value (a schedule point location) and reading off the counts for each pid; (4) would be looking at a particular x value (a pid) and seeing all the schedule points it triggers. One could also sum the counts to find which schedule call sites are used most frequently (a hint as to whether there is a problem with a device driver).
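In map terms, that is a single map keyed on the (pid, return address) pair. A sketch, assuming a hypothetical two-key variant of the map API (the _stp_map_key_long_long name and the sched2d map are made up):

asmlinkage void __sched inst_schedule_2d(void)
{
  /* count per (pid, call site) combination -- the z axis above */
  _stp_map_key_long_long (sched2d, current->pid,
                          (long) __builtin_return_address(0));
  _stp_map_set_int64 (sched2d, _stp_map_get_int64(sched2d) + 1);

  jprobe_return();
}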

For the kernel itself it should be fairly easy to map the addresses back to the appropriate source locations. However, I would like to be able to do the same for modules. Many of the device drivers use schedule(), and they may be loaded at various addresses.
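If the kernel is built with CONFIG_KALLSYMS, the lookup could even be done from the module itself, since kallsyms_lookup() also knows about module load addresses. A rough sketch (untested):

#include <linux/kallsyms.h>

static void print_caller (unsigned long addr)
{
  unsigned long size, offset;
  char *modname;
  char namebuf[KSYM_NAME_LEN + 1];
  const char *name;

  name = kallsyms_lookup (addr, &size, &offset, &modname, namebuf);
  if (name)
    dlog ("0x%lx = %s+0x%lx [%s]\n", addr, name, offset,
          modname ? modname : "kernel");
  else
    dlog ("0x%lx = ?\n", addr);
}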


Martin




------------------------------------------------------------------------

#define HASH_TABLE_BITS 8
#define HASH_TABLE_SIZE (1<<HASH_TABLE_BITS)
#define BUCKETS 16 /* largest histogram width */
#include "../../runtime.h"

#include "../../io.c"
#include "../../map.c"


MODULE_PARM_DESC(stp, "\n");


MAP schedpid, schedstr;	/* per-pid and per-process-name call counters */

/* jprobe handler: entered with schedule()'s stack frame and registers */
asmlinkage void __sched inst_schedule(void)
{
  /* bump the per-process-name counter */
  _stp_map_key_str (schedstr, current->comm);
  _stp_map_set_int64 (schedstr, _stp_map_get_int64(schedstr) + 1);

  /* bump the per-pid counter (the original read each map's count
     from the other map, a copy-paste bug; corrected here) */
  _stp_map_key_long (schedpid, current->pid);
  _stp_map_set_int64 (schedpid, _stp_map_get_int64(schedpid) + 1);

  jprobe_return();
}

static struct jprobe stp_probes[] = {
  {
    /* schedule is an exported symbol, so take its address directly
       rather than hardcoding one kernel's System.map value (0xc0309408) */
    .kp.addr = (kprobe_opcode_t *) schedule,
    .entry = (kprobe_opcode_t *) inst_schedule
  },
};

#define MAX_STP_ROUTINE (sizeof(stp_probes)/sizeof(struct jprobe))

static int init_stp(void)
{
  int i;

  /* maps sized to hold up to 10000 int64 entries each */
  schedpid = _stp_map_new (10000, INT64);
  schedstr = _stp_map_new (10000, INT64);

  for (i = 0; i < MAX_STP_ROUTINE; i++) {
    dlog("plant jprobe at %p, handler addr %p\n",
	   stp_probes[i].kp.addr, stp_probes[i].entry);
    register_jprobe(&stp_probes[i]);
  }
  dlog("instrumentation is enabled...\n");
  return 0;
}

static void cleanup_stp(void)
{
  int i;
  struct map_node_int64 *ptr;

  for (i = 0; i < MAX_STP_ROUTINE; i++)
    unregister_jprobe(&stp_probes[i]);

  foreach (schedpid, ptr)
    dlog ("pid %ld = %lld\n", key1int(ptr), ptr->val);
  dlog ("\n");

  foreach (schedstr, ptr)
    dlog ("process %s = %lld\n", key1str(ptr), ptr->val);
  dlog ("\n");

  _stp_map_del (schedpid);
  _stp_map_del (schedstr);

  dlog("EXIT\n");
}

module_init(init_stp);
module_exit(cleanup_stp);
MODULE_LICENSE("GPL");


