This is the mail archive of the systemtap@sources.redhat.com mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[KPROBE][RFC] Tweak to the function return probe design


>From my experiences with adding return probes to x86_64 and ia64, and the
feedback on LKML to those patches, I think we can simplify the design
for return probes.

The following patch tweaks the original design such that:

* Instead of storing the stack address in the return probe instance, the
  task pointer is stored.  This gives us all we need in order to:
    - find the correct return probe instance when we enter the trampoline
      (even if we are recursing)
    - find all left-over return probe instances when the task is going away

  This has the side effect of simplifying the implementation since more
  work can be done in kernel/kprobes.c since architecture specific knowlege
  of the stack layout is no longer required.  Specifically, we no longer have:
	- arch_get_kprobe_task()
	- arch_kprobe_flush_task()
	- get_rp_inst_tsk()
	- get_rp_inst()
	- trampoline_post_handler() <see next bullet>

* Instead of splitting the return probe handling and cleanup logic across
  the pre and post trampoline handlers, all the work is pushed into the 
  pre function (trampoline_probe_handler), and then we skip single stepping
  the original function.  In this case the original instruction to be single
  stepped was just a NOP, and we can do without the extra interruption.

The new flow of events to having a return probe handler execute when a target
function exits is:

* At system initialization time, a kprobe is inserted at the beginning of
  kretprobe_trampoline (this has not changed)

* register_kretprobe() will insert a kprobe at the beginning of the targeted
  function with the kprobe pre_handler set to arch_prepare_kretprobe
  (still no change)

* When the target function is entered, the kprobe is fired, calling
  arch_prepare_kretprobe (still no change)

* In arch_prepare_kretprobe() we try to get a free instance and if one is
  available then we fill out the instance with a pointer to the return probe,
  the original return address, and a pointer to the task structure (instead
  of the stack address.)  Just like before we change the return address
  to the trampoline function and mark the instance as used.

  If multiple return probes are registered for a given target function,
  then arch_prepare_kretprobe() will get called multiple times for the same 
  task (since our kprobe implementation is able to handle multiple kprobes 
  at the same address.)  Past the first call to arch_prepare_kretprobe, 
  we end up with the original address stored in the return probe instance
  pointing to our trampoline function. (This is a significant difference
  from the original arch_prepare_kretprobe design.) 

* Target function executes like normal and then returns to kretprobe_trampoline.

* kprobe inserted on the first instruction of kretprobe_trampoline is fired
  and calls trampoline_probe_handler() (no change here)

* trampoline_probe_handler() looks up the newest return probe instance 
  associated with the current task and then:
  	- calls the registered handler function
	- sets the pt_regs instruction pointer back to the original 
          return address
	- marks the instance as free
	- returns in a way that the single stepping of the original
          instruction is skipped

   If there were multiple kretprobes registered for the target function, 
   then the original return address would be trampoline_probe_handler, causing
   the next instance to be handled until finally the real return address
   is restored.

   (Huge change)
       
* If the task is killed with some left-over return probe instances (meaning
  that a target function was entered, but never returned), then we just
  free any instances associated with the task.  (Not much different other
  then we can handle this without calling architecture specific functions.)

  BUT... I just mark each of the instances as unused without bothering to 
  restore their original return address.  The original implementation would 
  restore the return address for each instance (I assume because it was 
  thought that the target function could still be running and could still 
  return.)

  On i386 we do this cleanup from process.c:exit_thread() and 
  process.c:flush_thread().  Let me know if I am wrong, but I do not think
  at this point in the task lifecycle that it is possible for the the
  target functions to continue after this point and get into trouble
  because of a wrong return address. 

  (Significant change)

This patch applies to the 2.6.12-rc6-mm1 kernel, but only for the i386
architecture.  I know this approach will work on x86_64 and ia64, and I
haven't the slightest clue if this is ok for ppc64.

    --rusty

 arch/i386/kernel/kprobes.c |  107 +++++++++++++++++----------------------------
 include/linux/kprobes.h    |   13 -----
 kernel/kprobes.c           |   59 ++++++------------------
 3 files changed, 59 insertions(+), 120 deletions(-)

Index: linux-2.6.12-rc6-mm1/arch/i386/kernel/kprobes.c
===================================================================
--- linux-2.6.12-rc6-mm1.orig/arch/i386/kernel/kprobes.c
+++ linux-2.6.12-rc6-mm1/arch/i386/kernel/kprobes.c
@@ -127,48 +127,23 @@ static inline void prepare_singlestep(st
 		regs->eip = (unsigned long)&p->ainsn.insn;
 }
 
-struct task_struct  *arch_get_kprobe_task(void *ptr)
-{
-	return ((struct thread_info *) (((unsigned long) ptr) &
-					(~(THREAD_SIZE -1))))->task;
-}
-
 void arch_prepare_kretprobe(struct kretprobe *rp, struct pt_regs *regs)
 {
 	unsigned long *sara = (unsigned long *)&regs->esp;
-	struct kretprobe_instance *ri;
-	static void *orig_ret_addr;
+        struct kretprobe_instance *ri;
 
-	/*
-	 * Save the return address when the return probe hits
-	 * the first time, and use it to populate the (krprobe
-	 * instance)->ret_addr for subsequent return probes at
-	 * the same addrress since stack address would have
-	 * the kretprobe_trampoline by then.
-	 */
-	if (((void*) *sara) != kretprobe_trampoline)
-		orig_ret_addr = (void*) *sara;
+        if ((ri = get_free_rp_inst(rp)) != NULL) {
+                ri->rp = rp;
+                ri->task = current;
+		ri->ret_addr = (unsigned long *) *sara;
 
-	if ((ri = get_free_rp_inst(rp)) != NULL) {
-		ri->rp = rp;
-		ri->stack_addr = sara;
-		ri->ret_addr = orig_ret_addr;
-		add_rp_inst(ri);
 		/* Replace the return addr with trampoline addr */
 		*sara = (unsigned long) &kretprobe_trampoline;
-	} else {
-		rp->nmissed++;
-	}
-}
 
-void arch_kprobe_flush_task(struct task_struct *tk)
-{
-	struct kretprobe_instance *ri;
-	while ((ri = get_rp_inst_tsk(tk)) != NULL) {
-		*((unsigned long *)(ri->stack_addr)) =
-					(unsigned long) ri->ret_addr;
-		recycle_rp_inst(ri);
-	}
+                add_rp_inst(ri);
+        } else {
+                rp->nmissed++;
+        }
 }
 
 /*
@@ -286,36 +261,37 @@ no_kprobe:
  */
 int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs)
 {
-	struct task_struct *tsk;
-	struct kretprobe_instance *ri;
-	struct hlist_head *head;
-	struct hlist_node *node;
-	unsigned long *sara = ((unsigned long *) &regs->esp) - 1;
-
-	tsk = arch_get_kprobe_task(sara);
-	head = kretprobe_inst_table_head(tsk);
-
-	hlist_for_each_entry(ri, node, head, hlist) {
-		if (ri->stack_addr == sara && ri->rp) {
-			if (ri->rp->handler)
-				ri->rp->handler(ri, regs);
-		}
-	}
-	return 0;
-}
-
-void trampoline_post_handler(struct kprobe *p, struct pt_regs *regs,
-						unsigned long flags)
-{
-	struct kretprobe_instance *ri;
-	/* RA already popped */
-	unsigned long *sara = ((unsigned long *)&regs->esp) - 1;
-
-	while ((ri = get_rp_inst(sara))) {
-		regs->eip = (unsigned long)ri->ret_addr;
-		recycle_rp_inst(ri);
-	}
-	regs->eflags &= ~TF_MASK;
+        struct kretprobe_instance *ri = NULL;
+        struct hlist_head *head;
+        struct hlist_node *node;
+
+        head = kretprobe_inst_table_head(current);
+
+        /*
+         * The first instance associated with the task is the instance
+         * we need for this function return
+         */
+        hlist_for_each_entry(ri, node, head, hlist)
+                if (ri->task == current)
+                        break;
+
+        BUG_ON(!ri);
+
+        if (ri->rp && ri->rp->handler)
+                ri->rp->handler(ri, regs);
+
+	regs->eip = (unsigned long)ri->ret_addr;
+        recycle_rp_inst(ri);
+
+        unlock_kprobes();
+        preempt_enable_no_resched();
+
+        /*
+         * By returning a non-zero value, we are telling
+         * kprobe_handler() that we have handled unlocking
+         * and re-enabling preemption.
+         */
+        return 1;
 }
 
 /*
@@ -403,8 +379,7 @@ static inline int post_kprobe_handler(st
 		current_kprobe->post_handler(current_kprobe, regs, 0);
 	}
 
-	if (current_kprobe->post_handler != trampoline_post_handler)
-		resume_execution(current_kprobe, regs);
+	resume_execution(current_kprobe, regs);
 	regs->eflags |= kprobe_saved_eflags;
 
 	/*Restore back the original saved kprobes variables and continue. */
Index: linux-2.6.12-rc6-mm1/include/linux/kprobes.h
===================================================================
--- linux-2.6.12-rc6-mm1.orig/include/linux/kprobes.h
+++ linux-2.6.12-rc6-mm1/include/linux/kprobes.h
@@ -105,9 +105,6 @@ struct jprobe {
 
 #ifdef ARCH_SUPPORTS_KRETPROBES
 extern int trampoline_probe_handler(struct kprobe *p, struct pt_regs *regs);
-extern void trampoline_post_handler(struct kprobe *p, struct pt_regs *regs,
-							unsigned long flags);
-extern struct task_struct *arch_get_kprobe_task(void *ptr);
 extern void arch_prepare_kretprobe(struct kretprobe *rp, struct pt_regs *regs);
 extern void arch_kprobe_flush_task(struct task_struct *tk);
 #else /* ARCH_SUPPORTS_KRETPROBES */
@@ -119,10 +116,6 @@ static inline int trampoline_probe_handl
 {
 	return 0;
 }
-static inline void trampoline_post_handler(struct kprobe *p,
-				struct pt_regs *regs, unsigned long flags)
-{
-}
 static inline void arch_prepare_kretprobe(struct kretprobe *rp,
 					struct pt_regs *regs)
 {
@@ -155,8 +148,8 @@ struct kretprobe_instance {
 	struct hlist_node uflist; /* either on free list or used list */
 	struct hlist_node hlist;
 	struct kretprobe *rp;
-	void *ret_addr;
-	void *stack_addr;
+	unsigned long *ret_addr;
+	struct task_struct *task;
 };
 
 #ifdef CONFIG_KPROBES
@@ -194,8 +187,6 @@ int register_kretprobe(struct kretprobe 
 void unregister_kretprobe(struct kretprobe *rp);
 
 struct kretprobe_instance *get_free_rp_inst(struct kretprobe *rp);
-struct kretprobe_instance *get_rp_inst(void *sara);
-struct kretprobe_instance *get_rp_inst_tsk(struct task_struct *tk);
 void add_rp_inst(struct kretprobe_instance *ri);
 void kprobe_flush_task(struct task_struct *tk);
 void recycle_rp_inst(struct kretprobe_instance *ri);
Index: linux-2.6.12-rc6-mm1/kernel/kprobes.c
===================================================================
--- linux-2.6.12-rc6-mm1.orig/kernel/kprobes.c
+++ linux-2.6.12-rc6-mm1/kernel/kprobes.c
@@ -140,8 +140,7 @@ static int aggr_break_handler(struct kpr
 
 struct kprobe trampoline_p = {
 		.addr = (kprobe_opcode_t *) &kretprobe_trampoline,
-		.pre_handler = trampoline_probe_handler,
-		.post_handler = trampoline_post_handler
+		.pre_handler = trampoline_probe_handler
 };
 
 struct kretprobe_instance *get_free_rp_inst(struct kretprobe *rp)
@@ -162,35 +161,18 @@ struct kretprobe_instance *get_used_rp_i
 	return NULL;
 }
 
-struct kretprobe_instance *get_rp_inst(void *sara)
-{
-	struct hlist_head *head;
-	struct hlist_node *node;
-	struct task_struct *tsk;
-	struct kretprobe_instance *ri;
-
-	tsk = arch_get_kprobe_task(sara);
-	head = &kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)];
-	hlist_for_each_entry(ri, node, head, hlist) {
-		if (ri->stack_addr == sara)
-			return ri;
-	}
-	return NULL;
-}
-
 void add_rp_inst(struct kretprobe_instance *ri)
 {
-	struct task_struct *tsk;
 	/*
 	 * Remove rp inst off the free list -
 	 * Add it back when probed function returns
 	 */
 	hlist_del(&ri->uflist);
-	tsk = arch_get_kprobe_task(ri->stack_addr);
+
 	/* Add rp inst onto table */
 	INIT_HLIST_NODE(&ri->hlist);
 	hlist_add_head(&ri->hlist,
-			&kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)]);
+			&kretprobe_inst_table[hash_ptr(ri->task, KPROBE_HASH_BITS)]);
 
 	/* Also add this rp inst to the used list. */
 	INIT_HLIST_NODE(&ri->uflist);
@@ -217,34 +199,25 @@ struct hlist_head * kretprobe_inst_table
 	return &kretprobe_inst_table[hash_ptr(tsk, KPROBE_HASH_BITS)];
 }
 
-struct kretprobe_instance *get_rp_inst_tsk(struct task_struct *tk)
-{
-	struct task_struct *tsk;
-	struct hlist_head *head;
-	struct hlist_node *node;
-	struct kretprobe_instance *ri;
-
-	head = &kretprobe_inst_table[hash_ptr(tk, KPROBE_HASH_BITS)];
-
-	hlist_for_each_entry(ri, node, head, hlist) {
-		tsk = arch_get_kprobe_task(ri->stack_addr);
-		if (tsk == tk)
-			return ri;
-	}
-	return NULL;
-}
-
 /*
- * This function is called from do_exit or do_execv when task tk's stack is
- * about to be recycled. Recycle any function-return probe instances
- * associated with this task. These represent probed functions that have
- * been called but may never return.
+ * This function is called from exit_thread or flush_thread when task tk's
+ * stack is being recycled so that we can recycle any function-return probe
+ * instances associated with this task. These left over instances represent
+ * probed functions that have been called but will never return.
  */
 void kprobe_flush_task(struct task_struct *tk)
 {
+        struct kretprobe_instance *ri;
+        struct hlist_head *head;
+	struct hlist_node *node, *tmp;
 	unsigned long flags = 0;
+
 	spin_lock_irqsave(&kprobe_lock, flags);
-	arch_kprobe_flush_task(tk);
+        head = kretprobe_inst_table_head(current);
+        hlist_for_each_entry_safe(ri, node, tmp, head, hlist) {
+                if (ri->task == tk)
+                        recycle_rp_inst(ri);
+        }
 	spin_unlock_irqrestore(&kprobe_lock, flags);
 }
 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]