updated uprobes patches


Here are updated uprobes patches to replace those I posted
May 16.  Utrace is still not in the -mm kernel, so these
patches are against the most recent utrace-enabled -mm
kernel, 2.6.21-rc6-mm1.  If you want to use a more recent
kernel, you'll need to patch in linux-2.6-utrace.patch from
http://people.redhat.com/roland/utrace/2.6-current/ first.

I believe that these uprobes patches address the remainder of
the concerns that Ernie Petrides reported earlier this month.
In particular, struct uprobe and linux/uprobes.h have been reorganized.
Struct uprobe_kimg is renamed uprobe_probept, and the new uprobe_kimg
holds some of what used to reside in struct uprobe.  The API is the
same: [un]register_u[ret]probe().

In linux/uprobes.h, most of the data-structure definitions (and
associated #includes) are hidden behind #ifdef UPROBES_IMPLEMENTATION.
So it's possible that your uprobes modules will need to add #includes
that previously happened to be provided by uprobes.h.
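
For example (just an illustration on my part, not an exhaustive list), a
module whose handler pokes at struct pt_regs may now need to include
linux/ptrace.h itself, since the public part of the new uprobes.h only
forward-declares struct pt_regs:

	#include <linux/uprobes.h>
	#include <linux/ptrace.h>	/* no longer pulled in indirectly */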

Comments welcome, as always.

Jim Keniston

Uprobes supplements utrace and kprobes, enabling a kernel module to
probe user-space applications in much the same way that a kprobes-based
module probes the kernel.

Uprobes enables you to dynamically break into any routine in a
user application and collect debugging and performance information
non-disruptively. You can trap at any code address, specifying a
kernel handler routine to be invoked when the breakpoint is hit.

Uprobes is layered on top of utrace.

The registration function, register_uprobe(), specifies which process
is to be probed, where the probe is to be inserted, and what handler is
to be called when the probe is hit. Refer to Documentation/uprobes.txt
in this patch for usage examples.
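
For quick reference, registration boils down to something like the sketch
below.  (The pid and vaddr values are just the ones from the example session
in Documentation/uprobes.txt, which also contains the complete, buildable
uprobe_example.c.)

	static struct uprobe usp;

	static void my_handler(struct uprobe *u, struct pt_regs *regs)
	{
		printk(KERN_INFO "probepoint at %#lx was hit\n", u->vaddr);
	}

	...
	usp.pid = 8156;			/* pid of the probed process */
	usp.vaddr = 0x080484a8;		/* virtual address of the probed instruction */
	usp.handler = my_handler;
	ret = register_uprobe(&usp);	/* 0 on success, negative errno on failure */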

Salient points:

o Like a debugger, uprobes uses a breakpoint instruction to break into
program execution. Through utrace's signal callback, uprobes recognizes
a probe hit and runs the user-specified handler. The handler may sleep.

o Breakpoint insertion is via access_process_vm() and hence is
copy-on-write and per-process.

o As uprobes uses utrace, a unique engine exists for every thread of a
probed process.  Any newly created thread inherits all the probes and
gets an engine of its own.  Upon thread exit, the engine is detached.

o Currently, uprobes aren't inherited across fork()s.

o A probe registration or unregistration operation may sleep.
Using utrace, uprobes quiesces all threads in the probed process
before inserting or removing the breakpoint instruction.
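
To illustrate the first point above: because the handler runs in the context
of the probed thread and is allowed to sleep, it can do things a kprobe
handler can't.  Here's a sketch (the struct and list names are made up, and
it assumes the usual <linux/slab.h>, <linux/list.h>, and <linux/mutex.h>
includes):

	struct hit_record {
		struct list_head list;
		pid_t pid;
		unsigned long vaddr;
	};
	static LIST_HEAD(hit_list);
	static DEFINE_MUTEX(hit_list_mutex);

	static void logging_handler(struct uprobe *u, struct pt_regs *regs)
	{
		struct hit_record *rec;

		/* kmalloc(GFP_KERNEL) may sleep; that's fine here, but the
		 * probed thread stays stopped until the handler returns. */
		rec = kmalloc(sizeof(*rec), GFP_KERNEL);
		if (!rec)
			return;
		rec->pid = current->pid;	/* current == the thread that hit the probe */
		rec->vaddr = u->vaddr;
		mutex_lock(&hit_list_mutex);	/* may sleep too; handlers can run concurrently */
		list_add_tail(&rec->list, &hit_list);
		mutex_unlock(&hit_list_mutex);
	}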

---

 Documentation/uprobes.txt  |  425 +++++++++++++
 arch/i386/Kconfig          |   10 
 include/asm-i386/uprobes.h |   54 +
 include/linux/uprobes.h    |  253 ++++++++
 kernel/Makefile            |    1 
 kernel/uprobes.c           | 1390 +++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 2132 insertions(+)

diff -puN /dev/null Documentation/uprobes.txt
--- /dev/null	2007-05-22 10:49:31.544575860 -0700
+++ linux-2.6.21-rc6-jimk/Documentation/uprobes.txt	2007-05-17 14:06:17.000000000 -0700
@@ -0,0 +1,425 @@
+Title	: User-Space Probes (Uprobes)
+Author	: Jim Keniston <jkenisto@us.ibm.com>
+
+CONTENTS
+
+1. Concepts
+2. Architectures Supported
+3. Configuring Uprobes
+4. API Reference
+5. Uprobes Features and Limitations
+6. Interoperation with Kprobes
+7. Interoperation with Utrace
+8. Probe Overhead
+9. TODO
+10. Uprobes Team
+11. Uprobes Example
+
+1. Concepts
+
+Uprobes enables you to dynamically break into any routine in a
+user application and collect debugging and performance information
+non-disruptively. You can trap at any code address, specifying a
+kernel handler routine to be invoked when the breakpoint is hit.
+
+The registration function, register_uprobe(), specifies which
+process is to be probed, where the probe is to be inserted, and what
+handler is to be called when the probe is hit.
+
+Typically, Uprobes-based instrumentation is packaged as a kernel
+module.  In the simplest case, the module's init function installs
+("registers") one or more probes, and the exit function unregisters
+them.  However, probes can be registered or unregistered in response to
+other events as well.  For example, you can establish Utrace callbacks
+to register and/or unregister probes when a particular process forks,
+clones a thread, execs, enters a system call, receives a signal,
+exits, etc.  See Documentation/utrace.txt.
+
+1.1 How Does a Uprobe Work?
+
+When a uprobe is registered, Uprobes makes a copy of the probed
+instruction, stops the probed application, replaces the first byte(s)
+of the probed instruction with a breakpoint instruction (e.g., int3
+on i386 and x86_64), and allows the probed application to continue.
+(When inserting the breakpoint, Uprobes uses the same copy-on-write
+mechanism that ptrace uses, so that the breakpoint affects only that
+process, and not any other process running that program.  This is
+true even if the probed instruction is in a shared library.)
+
+When a CPU hits the breakpoint instruction, a trap occurs, the CPU's
+user-mode registers are saved, and a SIGTRAP signal is generated.
+Uprobes intercepts the SIGTRAP and finds the associated uprobe.
+It then executes the handler associated with the uprobe, passing the
+handler the addresses of the uprobe struct and the saved registers.
+The handler may block, but keep in mind that the probed thread remains
+stopped while your handler runs.
+
+Next, Uprobes single-steps the probed instruction and resumes execution
+of the probed process at the instruction following the probepoint.
+[Note: In the base uprobes patch, we temporarily remove the breakpoint
+instruction, insert the original opcode, single-step the instruction
+"inline", and then replace the breakpoint.  This can create problems
+in a multithreaded application.  For example, it opens a time window
+during which another thread can sail right past the probepoint.
+This problem is resolved in the "single-stepping out of line" patch.]
+
+1.2 The Role of Utrace
+
+When a probe is registered on a previously unprobed process,
+Uprobes establishes a tracing "engine" with Utrace (see
+Documentation/utrace.txt) for each thread (task) in the process.
+Uprobes uses the Utrace "quiesce" mechanism to stop all the threads
+prior to insertion or removal of a breakpoint.  Utrace also notifies
+Uprobes of breakpoint and single-step traps and of other interesting
+events in the lifetime of the probed process, such as fork, clone,
+exec, and exit.
+
+1.3 Multithreaded Applications
+
+Uprobes supports the probing of multithreaded applications.  Uprobes
+imposes no limit on the number of threads in a probed application.
+All threads in a process use the same text pages, so every probe
+in a process affects all threads; of course, each thread hits the
+probepoint (and runs the handler) independently.  Multiple threads
+may run the same handler simultaneously.  If you want a particular
+thread or set of threads to run a particular handler, your handler
+should check current or current->pid to determine which thread has
+hit the probepoint.
+
+When a process clones a new thread, that thread automatically shares
+all current and future probes established for that process.
+
+Keep in mind that when you register or unregister a probe, the
+breakpoint is not inserted or removed until Utrace has stopped all
+threads in the process.  The register/unregister function returns
+after the breakpoint has been inserted/removed.
+
+2. Architectures Supported
+
+Uprobes is implemented on the following architectures:
+
+- i386
+- x86_64 (AMD-64, EM64T)	// in progress
+- ppc64				// in progress
+// - ia64 			// not started
+- s390x				// in progress
+
+3. Configuring Uprobes
+
+// TODO: The patch actually puts Uprobes configuration under "Instrumentation
+// Support" with Kprobes.  Need to decide which is the better place.
+
+When configuring the kernel using make menuconfig/xconfig/oldconfig,
+ensure that CONFIG_UPROBES is set to "y".  Under "Process debugging
+support," select "Infrastructure for tracing and debugging user
+processes" to enable Utrace, then select "Uprobes".
+
+So that you can load and unload Uprobes-based instrumentation modules,
+make sure "Loadable module support" (CONFIG_MODULES) and "Module
+unloading" (CONFIG_MODULE_UNLOAD) are set to "y".
+
+4. API Reference
+
+The Uprobes API includes two functions, register_uprobe() and
+unregister_uprobe().  Here are terse, mini-man-page specifications for
+these functions and the associated probe handlers that you'll write.
+See the latter half of this document for an example.
+
+4.1 register_uprobe
+
+#include <linux/uprobes.h>
+int register_uprobe(struct uprobe *u);
+
+Sets a breakpoint at virtual address u->vaddr in the process whose
+pid is u->pid.  When the breakpoint is hit, Uprobes calls u->handler.
+
+register_uprobe() returns 0 on success, or a negative errno otherwise.
+
+User's handler (u->handler):
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+void handler(struct uprobe *u, struct pt_regs *regs);
+
+Called with u pointing to the uprobe associated with the breakpoint,
+and regs pointing to the struct containing the registers saved when
+the breakpoint was hit.
+
+4.2 unregister_uprobe
+
+#include <linux/uprobes.h>
+void unregister_uprobe(struct uprobe *u);
+
+Removes the specified probe.  unregister_uprobe() can be called
+at any time after the probe has been registered.
+
+5. Uprobes Features and Limitations
+
+The user is expected to assign values only to the following members
+of struct uprobe: pid, vaddr, and handler.  Other members are reserved
+for Uprobes' use.  Uprobes may produce unexpected results if you:
+- assign non-zero values to reserved members of struct uprobe;
+- change the contents of a uprobe object while it is registered; or
+- attempt to register a uprobe that is already registered.
+
+Uprobes allows any number of probes at a particular address.  For a
+particular probepoint, handlers are run in the order in which they
+were registered.
+
+Any number of kernel modules may probe a particular process
+simultaneously, and a particular module may probe any number of
+processes simultaneously.
+
+Probes are shared by all threads in a process (including newly created
+threads).
+
+If a probed process exits or execs, Uprobes automatically unregisters
+all uprobes associated with that process.  Subsequent attempts to
+unregister these probes will be treated as no-ops.
+
+On the other hand, if a probed memory area is removed from the
+process's virtual memory map (e.g., via dlclose(3) or munmap(2)),
+it's currently up to you to unregister the probes first.
+
+There is no way to specify that probes should be inherited across fork;
+Uprobes removes all probepoints in the newly created child process.
+See Section 7, "Interoperation with Utrace", for more information on
+this topic.
+
+On at least some architectures, Uprobes makes no attempt to verify
+that the probe address you specify actually marks the start of an
+instruction.  If you get this wrong, chaos may ensue.
+
+To avoid interfering with interactive debuggers, Uprobes will refuse
+to insert a probepoint where a breakpoint instruction already exists,
+unless it was Uprobes that put it there.  Some architectures may
+refuse to insert probes on other types of instructions.
+
+If you install a probe in an inline-able function, Uprobes makes
+no attempt to chase down all inline instances of the function and
+install probes there.  gcc may inline a function without being asked,
+so keep this in mind if you're not seeing the probe hits you expect.
+
+A probe handler can modify the environment of the probed function
+-- e.g., by modifying data structures, or by modifying the
+contents of the pt_regs struct (which are restored to the registers
+upon return from the breakpoint).  So Uprobes can be used, for example,
+to install a bug fix or to inject faults for testing.  Uprobes, of
+course, has no way to distinguish the deliberately injected faults
+from the accidental ones.  Don't drink and probe.
+
+When you register the first probe at a probepoint or unregister the
+last probe at a probepoint, Uprobes asks Utrace to "quiesce"
+the probed process so that Uprobes can insert or remove the breakpoint
+instruction.  If the process is not already stopped, Utrace stops it.
+If the process is running an interruptible system call, this may cause
+the system call to finish early or fail with EINTR.  (The PTRACE_ATTACH
+request of the ptrace system call has this same limitation.)
+
+When Uprobes establishes a probepoint on a previously unprobed page
+of text, Linux creates a new copy of the page via its copy-on-write
+mechanism.  When probepoints are removed, Uprobes makes no attempt
+to consolidate identical copies of the same page.  This could affect
+memory availability if you probe many, many pages in many, many
+long-running processes.
+
+6. Interoperation with Kprobes
+
+Uprobes is intended to interoperate usefully with Kprobes (see
+Documentation/kprobes.txt).  For example, an instrumentation module
+can make calls to both the Kprobes API and the Uprobes API.
+
+A uprobe handler can register or unregister kprobes, jprobes,
+and kretprobes.  On the other hand, a kprobe, jprobe, or kretprobe
+handler must not sleep, and therefore cannot register or unregister
+any of these types of probes.  (Ideas for removing this restriction
+are welcome.)
+
+Note that the overhead of a uprobe hit is several times that of a
+kprobe hit.
+
+7. Interoperation with Utrace
+
+As mentioned in Section 1.2, Uprobes is a client of Utrace.  For each
+probed thread, Uprobes establishes a Utrace engine, and registers
+callbacks for the following types of events: clone/fork, exec, exit,
+and "core-dump" signals (which include breakpoint traps).  Uprobes
+establishes this engine when the process is first probed, or when
+Uprobes is notified of the thread's creation, whichever comes first.
+
+An instrumentation module can use both the Utrace and Uprobes APIs (as
+well as Kprobes).  When you do this, keep the following facts in mind:
+
+- For a particular event, Utrace callbacks are called in the order in
+which the engines are established.  Utrace does not currently provide
+a mechanism for altering this order.
+
+- When Uprobes learns that a probed process has forked, it removes
+the breakpoints in the child process.
+
+- When Uprobes learns that a probed process has exec-ed or exited,
+it disposes of its data structures for that process (first allowing
+any outstanding [un]registration operations to terminate).
+
+- When a probed thread hits a breakpoint or completes single-stepping
+of a probed instruction, engines with the UTRACE_EVENT(SIGNAL_CORE)
+flag set are notified.  The Uprobes signal callback prevents (via
+UTRACE_ACTION_HIDE) this event from being reported to engines later
+in the list.  But if your engine was established before Uprobes's,
+you will see this event.
+
+If you want to establish probes in a newly forked child, you can use
+the following procedure:
+
+- Register a report_clone callback with Utrace.  In this callback,
+the CLONE_THREAD flag distinguishes between the creation of a new
+thread vs. a new process.
+
+- In your report_clone callback, call utrace_attach() to attach to
+the child process, and set the engine's UTRACE_ACTION_QUIESCE flag.
+The child process will quiesce at a point where it is ready to
+be probed.
+
+- In your report_quiesce callback, register the desired probes.
+(Note that you cannot use the same probe object for both parent
+and child.  If you want to duplicate the probepoints, you must
+create a new set of uprobe objects.)
+
+8. Probe Overhead
+
+// TODO: Adjust as other architectures are tested.
+On a typical CPU in use in 2007, a uprobe hit takes 3 to 4
+microseconds to process.  Specifically, a benchmark that hits the same
+probepoint repeatedly, firing a simple handler each time, reports
+250,000 to 300,000 hits per second, depending on the architecture.
+
+Here are sample overhead figures (in usec) for different architectures.
+
+i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
+4.2 usec/hit (single-stepping inline)
+
+x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
+// TODO
+
+ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
+// TODO
+
+9. TODO
+
+a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
+programming interface for probe-based instrumentation.  SystemTap
+already supports kernel probes.  It could exploit Uprobes as well.
+b. Support for other architectures.
+
+10. Uprobes Team
+
+The following people have made major contributions to Uprobes:
+Jim Keniston - jkenisto@us.ibm.com
+Ananth Mavinakayanahalli - ananth@in.ibm.com
+Prasanna Panchamukhi - prasanna@in.ibm.com
+Dave Wilder - dwilder@us.ibm.com
+
+11. Uprobes Example
+
+Here's a sample kernel module showing the use of Uprobes to count the
+number of times an instruction at a particular address is executed,
+and optionally (unless verbose=0) report each time it's executed.
+----- cut here -----
+/* uprobe_example.c */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/uprobes.h>
+
+/*
+ * Usage: insmod uprobe_example.ko pid=<pid> vaddr=<address> [verbose=0]
+ * where <pid> identifies the probed process and <address> is the virtual
+ * address of the probed instruction.
+ */
+
+static int pid = 0;
+module_param(pid, int, 0);
+MODULE_PARM_DESC(pid, "pid");
+
+static int verbose = 1;
+module_param(verbose, int, 0);
+MODULE_PARM_DESC(verbose, "verbose");
+
+static long vaddr = 0;
+module_param(vaddr, long, 0);
+MODULE_PARM_DESC(vaddr, "vaddr");
+
+static int nhits;
+static struct uprobe usp;
+
+static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
+{
+	nhits++;
+	if (verbose)
+		printk(KERN_INFO "Hit #%d on probepoint at %#lx\n",
+			nhits, u->vaddr);
+}
+
+int __init init_module(void)
+{
+	int ret;
+	usp.pid = pid;
+	usp.vaddr = vaddr;
+	usp.handler = uprobe_handler;
+	printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	ret = register_uprobe(&usp);
+	if (ret != 0) {
+		printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret);
+		return -1;
+	}
+	return 0;
+}
+
+void __exit cleanup_module(void)
+{
+	printk(KERN_INFO "Unregistering uprobe on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	printk(KERN_INFO "Probepoint was hit %d times\n", nhits);
+	unregister_uprobe(&usp);
+}
+MODULE_LICENSE("GPL");
+----- cut here -----
+
+You can build the kernel module, uprobe_example.ko, using the following
+Makefile:
+----- cut here -----
+obj-m := uprobe_example.o
+KDIR := /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+default:
+	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
+clean:
+	rm -f *.mod.c *.ko *.o .*.cmd
+	rm -rf .tmp_versions
+----- cut here -----
+
+For example, if you want to run myprog and monitor its calls to myfunc(),
+you can do the following:
+
+$ make			// Build the uprobe_example module.
+...
+$ nm -p myprog | awk '$3=="myfunc"'
+080484a8 T myfunc
+$ ./myprog &
+$ ps
+  PID TTY          TIME CMD
+ 4367 pts/3    00:00:00 bash
+ 8156 pts/3    00:00:00 myprog
+ 8157 pts/3    00:00:00 ps
+$ su -
+...
+# insmod uprobe_example.ko pid=8156 vaddr=0x080484a8
+
+In /var/log/messages and on the console, you will see a message of the
+form "kernel: Hit #1 on probepoint at 0x80484a8" each time myfunc()
+is called.  To turn off probing, remove the module:
+
+# rmmod uprobe_example
+
+In /var/log/messages and on the console, you will see a message of the
+form "Probepoint was hit 5 times".
diff -puN arch/i386/Kconfig~1-uprobes-base arch/i386/Kconfig
--- linux-2.6.21-rc6/arch/i386/Kconfig~1-uprobes-base	2007-05-17 14:05:37.000000000 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/Kconfig	2007-05-17 14:06:17.000000000 -0700
@@ -1231,6 +1231,16 @@ config KPROBES
 	  for kernel debugging, non-intrusive instrumentation and testing.
 	  If in doubt, say "N".
 
+config UPROBES
+	bool "User-space probes (EXPERIMENTAL)"
+	depends on UTRACE && EXPERIMENTAL && MODULES
+	help
+	  Uprobes allows kernel modules to establish probepoints
+	  in user applications and execute handler functions when
+	  the probepoints are hit.  For more information, refer to
+	  Documentation/uprobes.txt.
+	  If in doubt, say "N".
+
 source "kernel/Kconfig.marker"
 
 endmenu
diff -puN /dev/null include/asm-i386/uprobes.h
--- /dev/null	2007-05-22 10:49:31.544575860 -0700
+++ linux-2.6.21-rc6-jimk/include/asm-i386/uprobes.h	2007-05-21 14:04:46.000000000 -0700
@@ -0,0 +1,54 @@
+#ifndef _ASM_UPROBES_H
+#define _ASM_UPROBES_H
+/*
+ *  Userspace Probes (UProbes)
+ *  include/asm-i386/uprobes.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/types.h>
+#include <linux/ptrace.h>
+
+typedef u8 uprobe_opcode_t;
+#define BREAKPOINT_INSTRUCTION	0xcc
+#define BP_INSN_SIZE 1
+#define MAX_UINSN_BYTES 16
+#define SLOT_IP 12	/* instruction pointer slot from include/asm/elf.h */
+
+/* Architecture specific switch for where the IP points after a bp hit */
+#define ARCH_BP_INST_PTR(inst_ptr)	(inst_ptr - BP_INSN_SIZE)
+
+struct uprobe_probept;
+
+/* Caller prohibits probes on int3.  We currently allow everything else. */
+static inline int arch_validate_probed_insn(struct uprobe_probept *ppt)
+{
+	return 0;
+}
+
+/* On i386, the int3 trap leaves eip pointing past the int3 instruction. */
+static inline unsigned long arch_get_probept(struct pt_regs *regs)
+{
+	return (unsigned long) (regs->eip - BP_INSN_SIZE);
+}
+
+static inline void arch_reset_ip_for_sstep(struct pt_regs *regs)
+{
+	regs->eip -= BP_INSN_SIZE;
+}
+
+#endif				/* _ASM_UPROBES_H */
diff -puN /dev/null include/linux/uprobes.h
--- /dev/null	2007-05-22 10:49:31.544575860 -0700
+++ linux-2.6.21-rc6-jimk/include/linux/uprobes.h	2007-05-22 08:54:16.000000000 -0700
@@ -0,0 +1,253 @@
+#ifndef _LINUX_UPROBES_H
+#define _LINUX_UPROBES_H
+/*
+ * Userspace Probes (UProbes)
+ * include/linux/uprobes.h
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/types.h>
+
+struct pt_regs;
+
+/* This is what the user supplies us. */
+struct uprobe {
+	/*
+	 * The pid of the probed process.  Currently, this can be the
+	 * thread ID (task->pid) of any active thread in the process.
+	 */
+	pid_t pid;
+
+	/* Location of the probepoint */
+	unsigned long vaddr;
+
+	/* Handler to run when the probepoint is hit */
+	void (*handler)(struct uprobe*, struct pt_regs*);
+
+	/* Reserved for use by uprobes */
+	void *reserved;
+	void *kdata;
+};
+
+#ifdef CONFIG_UPROBES
+extern int register_uprobe(struct uprobe *u);
+extern void unregister_uprobe(struct uprobe *u);
+#else
+static inline int register_uprobe(struct uprobe *u)
+{
+	return -ENOSYS;
+}
+static inline void unregister_uprobe(struct uprobe *u)
+{
+}
+#endif	/* CONFIG_UPROBES */
+
+#ifdef UPROBES_IMPLEMENTATION
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/wait.h>
+#include <linux/kref.h>
+#include <asm/uprobes.h>
+
+struct task_struct;
+struct utrace_attached_engine;
+
+enum uprobe_probept_state {
+	UPROBE_INSERTING,	// process quiescing prior to insertion
+	UPROBE_BP_SET,		// breakpoint in place
+	UPROBE_REMOVING,	// process quiescing prior to removal
+	UPROBE_DISABLED		// removal completed
+};
+
+enum uprobe_task_state {
+	UPTASK_QUIESCENT,
+	UPTASK_SLEEPING,	// used when task may not be able to quiesce
+	UPTASK_RUNNING,
+	UPTASK_BP_HIT,
+	UPTASK_PRE_SSTEP,
+	UPTASK_SSTEP,
+	UPTASK_POST_SSTEP
+};
+
+#define UPROBE_HASH_BITS 5
+#define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS)
+
+/*
+ * uprobe_process -- not a user-visible struct.
+ * A uprobe_process represents a probed process.  A process can have
+ * multiple probepoints (each represented by a uprobe_probept) and
+ * one or more threads (each represented by a uprobe_task).
+ */
+struct uprobe_process {
+	/*
+	 * rwsem is write-locked for any change to the uprobe_process's
+	 * graph (including uprobe_tasks, uprobe_probepts, and uprobe_kimgs) --
+	 * e.g., due to probe [un]registration or special events like exit.
+	 * It's read-locked during the whole time we process a probepoint hit.
+	 */
+	struct rw_semaphore rwsem;
+
+	/* Table of uprobe_probepts registered for this process */
+	/* TODO: Switch to list_head[] per Ingo. */
+	struct hlist_head uprobe_table[UPROBE_TABLE_SIZE];
+	int nppt;	/* number of probepoints */
+
+	/* List of uprobe_probepts awaiting insertion or removal */
+	struct list_head pending_uprobes;
+
+	/* List of uprobe_tasks in this task group */
+	struct list_head thread_list;
+	int nthreads;
+	int n_quiescent_threads;
+
+	/* this goes on the uproc_table */
+	struct hlist_node hlist;
+
+	/*
+	 * All threads (tasks) in a process share the same uprobe_process.
+	 */
+	pid_t tgid;
+
+	/* Threads in SLEEPING state wait here to be roused. */
+	wait_queue_head_t waitq;
+
+	/*
+	 * We won't free the uprobe_process while...
+	 * - any register/unregister operations on it are in progress; or
+	 * - uprobe_table[] is not empty; or
+	 * - any tasks are SLEEPING in the waitq.
+	 * refcount reflects this.  We do NOT ref-count tasks (threads),
+	 * since once the last thread has exited, the rest is academic.
+	 */
+	struct kref refcount;
+};
+
+/*
+ * uprobe_kimg -- not a user-visible struct.
+ * Holds implementation-only per-uprobe data.
+ * uprobe->kdata points to this.
+ */
+struct uprobe_kimg {
+	struct uprobe *uprobe;
+	struct uprobe_probept *ppt;
+
+	/*
+	 * -EBUSY while we're waiting for all threads to quiesce so the
+	 * associated breakpoint can be inserted or removed.
+	 * 0 if the insert/remove operation has succeeded, or -errno
+	 * otherwise.
+	 */
+	int status;
+
+	/* on ppt's list */
+	struct list_head list;
+};
+
+/*
+ * uprobe_probept -- not a user-visible struct.
+ * A probepoint, at which several uprobes can be registered.
+ * Guarded by uproc->rwsem.
+ */
+struct uprobe_probept {
+	/* vaddr copied from (first) uprobe */
+	unsigned long vaddr;
+
+	/* The uprobe_kimg(s) associated with this uprobe_probept */
+	struct list_head uprobe_list;
+
+	enum uprobe_probept_state state;
+
+	/* Saved opcode (which has been replaced with breakpoint) */
+	uprobe_opcode_t opcode;
+
+	/* Saved original instruction */
+	uprobe_opcode_t insn[MAX_UINSN_BYTES / sizeof(uprobe_opcode_t)];
+
+	/* The parent uprobe_process */
+	struct uprobe_process *uproc;
+
+	/*
+	 * ppt goes in the uprobe_process->uprobe_table when registered --
+	 * even before the breakpoint has been inserted.
+	 */
+	struct hlist_node ut_node;
+
+	/*
+	 * ppt sits in the uprobe_process->pending_uprobes queue while
+	 * awaiting insertion or removal of the breakpoint.
+	 */
+	struct list_head pd_node;
+
+	/* [un]register_uprobe() waits 'til bkpt inserted/removed. */
+	wait_queue_head_t waitq;
+
+	/*
+	 * Serialize single-stepping inline, so threads don't clobber
+	 * each other swapping the breakpoint instruction in and out.
+	 * This helps prevent crashing the probed app, but it does NOT
+	 * prevent probe misses while the breakpoint is swapped out.
+	 */
+	struct mutex ssil_mutex;
+};
+
+/*
+ * uprobe_task -- not a user-visible struct.
+ * Corresponds to a thread in a probed process.
+ * Guarded by uproc->rwsem.
+ */
+struct uprobe_task {
+	/* Lives on the thread_list for the uprobe_process */
+	struct list_head list;
+
+	/* This is a back pointer to the task_struct for this task */
+	struct task_struct *tsk;
+
+	/* The utrace engine for this task */
+	struct utrace_attached_engine *engine;
+
+	/* Back pointer to the associated uprobe_process */
+	struct uprobe_process *uproc;
+
+	enum uprobe_task_state state;
+
+	/*
+	 * quiescing = 1 means this task has been asked to quiesce.
+	 * It may not be able to comply immediately if it's hit a bkpt.
+	 */
+	int quiescing;
+
+	/* Task currently running quiesce_all_threads() */
+	struct task_struct *quiesce_master;
+
+	/* Set before running handlers; cleared after single-stepping. */
+	struct uprobe_probept *active_probe;
+
+	/* Saved address of copied original instruction */
+	long singlestep_addr;
+
+	/*
+	 * Unexpected error in probepoint handling has left task's
+	 * text or stack corrupted.  Kill task ASAP.
+	 */
+	int doomed;
+};
+
+#endif	/* UPROBES_IMPLEMENTATION */
+
+#endif	/* _LINUX_UPROBES_H */
diff -puN kernel/Makefile~1-uprobes-base kernel/Makefile
--- linux-2.6.21-rc6/kernel/Makefile~1-uprobes-base	2007-05-17 14:05:37.000000000 -0700
+++ linux-2.6.21-rc6-jimk/kernel/Makefile	2007-05-17 14:06:17.000000000 -0700
@@ -55,6 +55,7 @@ obj-$(CONFIG_TASK_DELAY_ACCT) += delayac
 obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o
 obj-$(CONFIG_UTRACE) += utrace.o
 obj-$(CONFIG_PTRACE) += ptrace.o
+obj-$(CONFIG_UPROBES) += uprobes.o
 
 ifneq ($(CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER),y)
 # According to Alan Modra <alan@linuxcare.com.au>, the -fno-omit-frame-pointer is
diff -puN /dev/null kernel/uprobes.c
--- /dev/null	2007-05-22 10:49:31.544575860 -0700
+++ linux-2.6.21-rc6-jimk/kernel/uprobes.c	2007-05-22 08:25:47.000000000 -0700
@@ -0,0 +1,1390 @@
+/*
+ *  Userspace Probes (UProbes)
+ *  kernel/uprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#include <linux/types.h>
+#include <linux/hash.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/rcupdate.h>
+#include <linux/err.h>
+#include <linux/kref.h>
+#include <linux/utrace.h>
+#define UPROBES_IMPLEMENTATION 1
+#include <linux/uprobes.h>
+#include <linux/tracehook.h>
+#include <linux/mm.h>
+#include <asm/tracehook.h>
+#include <asm/errno.h>
+
+#define SET_ENGINE_FLAGS	1
+#define CLEAR_ENGINE_FLAGS	0
+
+extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
+	void *buf, int len, int write);
+static int utask_fake_quiesce(struct uprobe_task *utask);
+
+/* Table of currently probed processes, hashed by tgid. */
+static struct hlist_head uproc_table[UPROBE_TABLE_SIZE];
+
+/* Protects uproc_table during uprobe (un)registration */
+static DEFINE_MUTEX(uproc_mutex);
+
+/* p_uprobe_utrace_ops = &uprobe_utrace_ops.  Fwd refs are a pain w/o this. */
+static const struct utrace_engine_ops *p_uprobe_utrace_ops;
+
+static inline void uprobe_get_process(struct uprobe_process *uproc)
+{
+	kref_get(&uproc->refcount);
+}
+
+static void uprobe_release_process(struct kref *kref);
+
+static inline int uprobe_put_process(struct uprobe_process *uproc)
+{
+	return kref_put(&uproc->refcount, uprobe_release_process);
+}
+
+/* Runs with the uproc_mutex held.  Returns with uproc ref-counted. */
+struct uprobe_process *uprobe_find_process(pid_t tgid)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct uprobe_process *uproc;
+
+	head = &uproc_table[hash_long(tgid, UPROBE_HASH_BITS)];
+	hlist_for_each_entry(uproc, node, head, hlist) {
+		if (uproc->tgid == tgid) {
+			uprobe_get_process(uproc);
+			return uproc;
+		}
+	}
+	return NULL;
+}
+
+/*
+ * In the given uproc's hash table of probepoints, find the one with the
+ * specified virtual address.  Runs with uproc->rwsem locked.
+ */
+struct uprobe_probept *uprobe_find_probept(struct uprobe_process *uproc,
+		unsigned long vaddr)
+{
+	struct uprobe_probept *ppt;
+	struct hlist_node *node;
+	struct hlist_head *head = &uproc->uprobe_table[hash_long(vaddr,
+		UPROBE_HASH_BITS)];
+
+	hlist_for_each_entry(ppt, node, head, ut_node) {
+		if (ppt->vaddr == vaddr && ppt->state != UPROBE_DISABLED)
+			return ppt;
+	}
+	return NULL;
+}
+
+/*
+ * set_bp: Store a breakpoint instruction at ppt->vaddr.
+ * Returns BP_INSN_SIZE on success.
+ *
+ * NOTE: BREAKPOINT_INSTRUCTION on all archs is the same size as
+ * uprobe_opcode_t.
+ */
+static int set_bp(struct uprobe_probept *ppt, struct task_struct *tsk)
+{
+	uprobe_opcode_t bp_insn = BREAKPOINT_INSTRUCTION;
+	return access_process_vm(tsk, ppt->vaddr, &bp_insn, BP_INSN_SIZE, 1);
+}
+
+/*
+ * set_orig_insn:  For probepoint ppt, replace the breakpoint instruction
+ * with the original opcode.  Returns BP_INSN_SIZE on success.
+ */
+static int set_orig_insn(struct uprobe_probept *ppt, struct task_struct *tsk)
+{
+	return access_process_vm(tsk, ppt->vaddr, &ppt->opcode, BP_INSN_SIZE,
+		1);
+}
+
+static void bkpt_insertion_failed(struct uprobe_probept *ppt, const char *why)
+{
+	printk(KERN_ERR "Can't place uprobe at pid %d vaddr %#lx: %s\n",
+			ppt->uproc->tgid, ppt->vaddr, why);
+}
+
+/*
+ * Save a copy of the original instruction (so it can be single-stepped
+ * out of line), insert the breakpoint instruction, and wake up
+ * register_uprobe().
+ */
+static void insert_bkpt(struct uprobe_probept *ppt, struct task_struct *tsk)
+{
+	struct uprobe_kimg *uk;
+	long result = 0;
+	int len;
+
+	if (!tsk) {
+		/* No surviving tasks associated with ppt->uproc */
+		result = -ESRCH;
+		goto out;
+	}
+
+	/*
+	 * If access_process_vm() transfers fewer bytes than the maximum
+	 * instruction size, assume that the probed instruction is smaller
+	 * than the max and near the end of the last page of instructions.
+	 * But there must be room at least for a breakpoint-size instruction.
+	 */
+	len = access_process_vm(tsk, ppt->vaddr, ppt->insn, MAX_UINSN_BYTES, 0);
+	if (len < BP_INSN_SIZE) {
+		bkpt_insertion_failed(ppt,
+			"error reading original instruction");
+		result = -EIO;
+		goto out;
+	}
+	memcpy(&ppt->opcode, ppt->insn, BP_INSN_SIZE);
+	if (ppt->opcode == BREAKPOINT_INSTRUCTION) {
+		bkpt_insertion_failed(ppt, "bkpt already exists at that addr");
+		result = -EEXIST;
+		goto out;
+	}
+
+	if ((result = arch_validate_probed_insn(ppt)) < 0) {
+		bkpt_insertion_failed(ppt, "instruction type cannot be probed");
+		goto out;
+	}
+
+	len = set_bp(ppt, tsk);
+	if (len < BP_INSN_SIZE) {
+		bkpt_insertion_failed(ppt, "failed to insert bkpt instruction");
+		result = -EIO;
+		goto out;
+	}
+out:
+	ppt->state = (result ? UPROBE_DISABLED : UPROBE_BP_SET);
+	list_for_each_entry(uk, &ppt->uprobe_list, list)
+		uk->status = result;
+	wake_up_all(&ppt->waitq);
+}
+
+static void remove_bkpt(struct uprobe_probept *ppt, struct task_struct *tsk)
+{
+	int len;
+
+	if (tsk) {
+		len = set_orig_insn(ppt, tsk);
+		if (len < BP_INSN_SIZE) {
+			printk(KERN_ERR
+				"Error removing uprobe at pid %d vaddr %#lx:"
+				" can't restore original instruction\n",
+				tsk->tgid, ppt->vaddr);
+			/*
+			 * This shouldn't happen, since we were previously
+			 * able to write the breakpoint at that address.
+			 * There's not much we can do besides let the
+			 * process die with a SIGTRAP the next time the
+			 * breakpoint is hit.
+			 */
+		}
+	}
+	/* Wake up unregister_uprobe(). */
+	ppt->state = UPROBE_DISABLED;
+	wake_up_all(&ppt->waitq);
+}
+
+/*
+ * Runs with all of uproc's threads quiesced and uproc->rwsem write-locked.
+ * As specified, insert or remove the breakpoint instruction for each
+ * uprobe_probept on uproc's pending list.
+ * tsk = one of the tasks associated with uproc -- NULL if there are
+ * no surviving threads.
+ * It's OK for uproc->pending_uprobes to be empty here.  It can happen
+ * if a register and an unregister are requested (by different probers)
+ * simultaneously for the same pid/vaddr.
+ * Note that the current task may be a thread in uproc, or it may be
+ * a task running [un]register_uprobe() (or both).
+ */
+static void handle_pending_uprobes(struct uprobe_process *uproc,
+	struct task_struct *tsk)
+{
+	struct uprobe_probept *ppt, *tmp;
+
+	list_for_each_entry_safe(ppt, tmp, &uproc->pending_uprobes, pd_node) {
+		switch (ppt->state) {
+		case UPROBE_INSERTING:
+			insert_bkpt(ppt, tsk);
+			break;
+		case UPROBE_REMOVING:
+			remove_bkpt(ppt, tsk);
+			break;
+		default:
+			BUG();
+		}
+		list_del(&ppt->pd_node);
+	}
+}
+
+static void utask_adjust_flags(struct uprobe_task *utask, int set,
+	unsigned long flags)
+{
+	unsigned long newflags, oldflags;
+
+	newflags = oldflags = utask->engine->flags;
+
+	if (set)
+		newflags |= flags;
+	else
+		newflags &= ~flags;
+
+	if (newflags != oldflags)
+		utrace_set_flags(utask->tsk, utask->engine, newflags);
+}
+
+static inline void clear_utrace_quiesce(struct uprobe_task *utask)
+{
+	utask_adjust_flags(utask, CLEAR_ENGINE_FLAGS,
+			UTRACE_ACTION_QUIESCE | UTRACE_EVENT(QUIESCE));
+}
+
+/* Opposite of quiesce_all_threads().  Same locking applies. */
+static void rouse_all_threads(struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+
+	list_for_each_entry(utask, &uproc->thread_list, list) {
+		if (utask->quiescing) {
+			utask->quiescing = 0;
+			if (utask->state == UPTASK_QUIESCENT) {
+				clear_utrace_quiesce(utask);
+				utask->state = UPTASK_RUNNING;
+				uproc->n_quiescent_threads--;
+			}
+		}
+	}
+	/* Wake any threads that decided to sleep rather than quiesce. */
+	wake_up_all(&uproc->waitq);
+}
+
+/*
+ * If all of uproc's surviving threads have quiesced, do the necessary
+ * breakpoint insertions or removals and then un-quiesce everybody.
+ * tsk is a surviving thread, or NULL if there is none.  Runs with
+ * uproc->rwsem write-locked.
+ */
+static void check_uproc_quiesced(struct uprobe_process *uproc,
+		struct task_struct *tsk)
+{
+	if (uproc->n_quiescent_threads >= uproc->nthreads) {
+		handle_pending_uprobes(uproc, tsk);
+		rouse_all_threads(uproc);
+	}
+}
+
+/*
+ * Quiesce all threads in the specified process -- e.g., prior to
+ * breakpoint insertion.  Runs with uproc->rwsem write-locked.
+ * Returns the number of threads that haven't died yet.
+ */
+static int quiesce_all_threads(struct uprobe_process *uproc,
+		struct uprobe_task **cur_utask_quiescing)
+{
+	struct uprobe_task *utask;
+	struct task_struct *survivor = NULL;	// any survivor
+	int survivors = 0;
+
+	*cur_utask_quiescing = NULL;
+	list_for_each_entry(utask, &uproc->thread_list, list) {
+		survivor = utask->tsk;
+		survivors++;
+		if (!utask->quiescing) {
+			/*
+			 * If utask is currently handling a probepoint, it'll
+			 * check utask->quiescing and quiesce when it's done.
+			 */
+			utask->quiescing = 1;
+			if (utask->tsk == current)
+				*cur_utask_quiescing = utask;
+			else if (utask->state == UPTASK_RUNNING) {
+				utask->quiesce_master = current;
+				utask_adjust_flags(utask, SET_ENGINE_FLAGS,
+					UTRACE_ACTION_QUIESCE
+					| UTRACE_EVENT(QUIESCE));
+				utask->quiesce_master = NULL;
+			}
+		}
+	}
+	/*
+	 * If any task was already quiesced (in utrace's opinion) when we
+	 * called utask_adjust_flags() on it, uprobe_report_quiesce() was
+	 * called, but wasn't in a position to call check_uproc_quiesced().
+	 */
+	check_uproc_quiesced(uproc, survivor);
+	return survivors;
+}
+
+/* Runs with uproc_mutex held and uproc->rwsem write-locked. */
+static void uprobe_free_process(struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask, *tmp;
+
+	if (!hlist_unhashed(&uproc->hlist))
+		hlist_del(&uproc->hlist);
+
+	list_for_each_entry_safe(utask, tmp, &uproc->thread_list, list) {
+		/*
+		 * utrace_detach() is OK here (required, it seems) even if
+		 * utask->tsk == current and we're in a utrace callback.
+		 */
+		if (utask->engine)
+			utrace_detach(utask->tsk, utask->engine);
+		kfree(utask);
+	}
+	up_write(&uproc->rwsem);	// So kfree doesn't complain
+	kfree(uproc);
+}
+
+/* Uproc's ref-count has dropped to zero.  Free everything. */
+static void uprobe_release_process(struct kref *ref)
+{
+	struct uprobe_process *uproc = container_of(ref, struct uprobe_process,
+		refcount);
+	mutex_lock(&uproc_mutex);
+	down_write(&uproc->rwsem);
+	uprobe_free_process(uproc);
+	mutex_unlock(&uproc_mutex);
+}
+
+static struct uprobe_kimg *uprobe_mk_kimg(struct uprobe *u)
+{
+	struct uprobe_kimg *uk = (struct uprobe_kimg*)kzalloc(sizeof *uk,
+		GFP_USER);
+	if (unlikely(!uk))
+		return ERR_PTR(-ENOMEM);
+	u->kdata = uk;
+	uk->uprobe = u;
+	uk->ppt = NULL;
+	INIT_LIST_HEAD(&uk->list);
+	uk->status = -EBUSY;
+	return uk;
+}
+
+/*
+ * Allocate a uprobe_task object for t and add it to uproc's list.
+ * Called with t "got" and uproc->rwsem write-locked.  Called in one of
+ * the following cases:
+ * - before setting the first uprobe in t's process
+ * - we're in uprobe_report_clone() and t is the newly added thread
+ * Returns:
+ * - pointer to new uprobe_task on success
+ * - NULL if t dies before we can utrace_attach it
+ * - negative errno otherwise
+ */
+static struct uprobe_task *uprobe_add_task(struct task_struct *t,
+		struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+	struct utrace_attached_engine *engine;
+
+	utask = (struct uprobe_task *)kzalloc(sizeof *utask, GFP_USER);
+	if (unlikely(utask == NULL))
+		return ERR_PTR(-ENOMEM);
+
+	utask->tsk = t;
+	utask->state = UPTASK_RUNNING;
+	utask->quiescing = 0;
+	utask->uproc = uproc;
+	utask->active_probe = NULL;
+	utask->doomed = 0;
+	INIT_LIST_HEAD(&utask->list);
+	list_add_tail(&utask->list, &uproc->thread_list);
+
+	engine = utrace_attach(t, UTRACE_ATTACH_CREATE, p_uprobe_utrace_ops,
+		utask);
+	if (IS_ERR(engine)) {
+		long err = PTR_ERR(engine);
+		printk("uprobes: utrace_attach failed, returned %ld\n", err);
+		list_del(&utask->list);
+		kfree(utask);
+		if (err == -ESRCH)
+			 return NULL;
+		return ERR_PTR(err);
+	}
+	utask->engine = engine;
+	/*
+	 * Always watch for traps, clones, execs and exits. Caller must
+	 * set any other engine flags.
+	 */
+	utask_adjust_flags(utask, SET_ENGINE_FLAGS,
+			UTRACE_EVENT(SIGNAL_CORE) | UTRACE_EVENT(EXEC) |
+			UTRACE_EVENT(CLONE) | UTRACE_EVENT(EXIT));
+	/*
+	 * Note that it's OK if t dies just after utrace_attach, because
+	 * with the engine in place, the appropriate report_* callback
+	 * should handle it after we release uproc->rwsem.
+	 */
+	return utask;
+}
+
+/* See comment in uprobe_mk_process(). */
+static struct task_struct *find_next_thread_to_add(struct uprobe_process *uproc,
+		struct task_struct *start)
+{
+	struct task_struct *t;
+	struct uprobe_task *utask;
+
+	read_lock(&tasklist_lock);
+	t = start;
+	do {
+		if (unlikely(t->flags & PF_EXITING))
+			goto dont_add;
+		list_for_each_entry(utask, &uproc->thread_list, list) {
+			if (utask->tsk == t)
+				/* Already added */
+				goto dont_add;
+		}
+		/* Found thread/task to add. */
+		get_task_struct(t);
+		read_unlock(&tasklist_lock);
+		return t;
+dont_add:
+		t = next_thread(t);
+	} while (t != start);
+
+	read_unlock(&tasklist_lock);
+	return NULL;
+}
+
+/* Runs with uproc_mutex held; returns with uproc->rwsem write-locked. */
+static struct uprobe_process *uprobe_mk_process(struct task_struct *p)
+{
+	struct uprobe_process *uproc;
+	struct uprobe_task *utask;
+	struct task_struct *add_me;
+	int i;
+	long err;
+
+	uproc = (struct uprobe_process *)kzalloc(sizeof *uproc, GFP_USER);
+	if (unlikely(uproc == NULL))
+		return ERR_PTR(-ENOMEM);
+
+	/* Initialize fields */
+	kref_init(&uproc->refcount);
+	init_rwsem(&uproc->rwsem);
+	down_write(&uproc->rwsem);
+	init_waitqueue_head(&uproc->waitq);
+	for (i = 0; i < UPROBE_TABLE_SIZE; i++)
+		INIT_HLIST_HEAD(&uproc->uprobe_table[i]);
+	uproc->nppt = 0;
+	INIT_LIST_HEAD(&uproc->pending_uprobes);
+	INIT_LIST_HEAD(&uproc->thread_list);
+	uproc->nthreads = 0;
+	uproc->n_quiescent_threads = 0;
+	INIT_HLIST_NODE(&uproc->hlist);
+	uproc->tgid = p->tgid;
+
+	/*
+	 * Create and populate one utask per thread in this process.  We
+	 * can't call uprobe_add_task() while holding tasklist_lock, so we:
+	 *	1. Lock task list.
+	 *	2. Find the next task, add_me, in this process that's not
+	 *	already on uproc's thread_list.  (Start search at previous
+	 *	one found.)
+	 *	3. Unlock task list.
+	 *	4. uprobe_add_task(add_me, uproc)
+	 *	Repeat 1-4 'til we have utasks for all tasks.
+	 */
+	add_me = p;
+	while ((add_me = find_next_thread_to_add(uproc, add_me)) != NULL) {
+		utask = uprobe_add_task(add_me, uproc);
+		put_task_struct(add_me);
+		if (IS_ERR(utask)) {
+			err = PTR_ERR(utask);
+			goto fail;
+		}
+		if (utask)
+			uproc->nthreads++;
+	}
+
+	if (uproc->nthreads == 0) {
+		/* All threads -- even p -- are dead. */
+		err = -ESRCH;
+		goto fail;
+	}
+	return uproc;
+
+fail:
+	uprobe_free_process(uproc);
+	return ERR_PTR(err);
+}
+
+/*
+ * Creates a uprobe_probept and connects it to uk and uproc.  Runs with
+ * uproc->rwsem write-locked.
+ */
+static struct uprobe_probept *uprobe_add_probept(struct uprobe_kimg *uk,
+	struct uprobe_process *uproc)
+{
+	struct uprobe_probept *ppt;
+
+	ppt = (struct uprobe_probept *)kzalloc(sizeof *ppt, GFP_USER);
+	if (unlikely(ppt == NULL))
+		return ERR_PTR(-ENOMEM);
+	init_waitqueue_head(&ppt->waitq);
+	mutex_init(&ppt->ssil_mutex);
+
+	/* Connect to uk. */
+	INIT_LIST_HEAD(&ppt->uprobe_list);
+	list_add_tail(&uk->list, &ppt->uprobe_list);
+	uk->ppt = ppt;
+	uk->status = -EBUSY;
+	ppt->vaddr = uk->uprobe->vaddr;
+
+	/* Connect to uproc. */
+	ppt->state = UPROBE_INSERTING;
+	ppt->uproc = uproc;
+	INIT_LIST_HEAD(&ppt->pd_node);
+	list_add_tail(&ppt->pd_node, &uproc->pending_uprobes);
+	INIT_HLIST_NODE(&ppt->ut_node);
+	hlist_add_head(&ppt->ut_node,
+		&uproc->uprobe_table[hash_long(ppt->vaddr, UPROBE_HASH_BITS)]);
+	uproc->nppt++;
+	uprobe_get_process(uproc);
+	return ppt;
+}
+
+/*
+ * Runs with ppt->uproc write-locked.  Frees ppt and decrements the ref count
+ * on ppt->uproc (but ref count shouldn't hit 0).
+ */
+static void uprobe_free_probept(struct uprobe_probept *ppt)
+{
+	struct uprobe_process *uproc = ppt->uproc;
+	hlist_del(&ppt->ut_node);
+	uproc->nppt--;
+	kfree(ppt);
+	uprobe_put_process(uproc);
+}
+
+static void uprobe_free_kimg(struct uprobe_kimg *uk)
+{
+	uk->uprobe->kdata = NULL;
+	kfree(uk);
+}
+
+/*
+ * Runs with uprobe_process write-locked.
+ * Note that we never free u, because the user owns that.
+ */
+static void purge_uprobe(struct uprobe_kimg *uk)
+{
+	struct uprobe_probept *ppt = uk->ppt;
+	list_del(&uk->list);
+	uprobe_free_kimg(uk);
+	if (list_empty(&ppt->uprobe_list))
+		uprobe_free_probept(ppt);
+}
+
+/* Probed address must be in an executable VM area. */
+static int uprobe_validate_vaddr(struct task_struct *p, unsigned long vaddr)
+{
+	struct vm_area_struct *vma;
+	struct mm_struct *mm = p->mm;
+	if (!mm)
+		return -EINVAL;
+	down_read(&mm->mmap_sem);
+	vma = find_vma(mm, vaddr);
+	if (!vma || vaddr < vma->vm_start || !(vma->vm_flags & VM_EXEC)) {
+		up_read(&mm->mmap_sem);
+		return -EINVAL;
+	}
+	up_read(&mm->mmap_sem);
+	return 0;
+}
+
+static struct task_struct *uprobe_get_task(pid_t pid)
+{
+	struct task_struct *p;
+	rcu_read_lock();
+	p = find_task_by_pid(pid);
+	if (p)
+		get_task_struct(p);
+	rcu_read_unlock();
+	return p;
+}
+
+/* See Documentation/uprobes.txt. */
+int register_uprobe(struct uprobe *u)
+{
+	struct task_struct *p;
+	struct uprobe_process *uproc;
+	struct uprobe_kimg *uk;
+	struct uprobe_probept *ppt;
+	struct uprobe_task *cur_utask_quiescing = NULL;
+	int survivors, ret = 0, uproc_is_new = 0;
+
+	if (!u || !u->handler)
+		return -EINVAL;
+
+	p = uprobe_get_task(u->pid);
+	if (!p)
+		return -ESRCH;
+
+	/* Get the uprobe_process for this pid, or make a new one. */
+	mutex_lock(&uproc_mutex);
+	uproc = uprobe_find_process(p->tgid);
+
+	if (uproc) {
+		down_write(&uproc->rwsem);
+		mutex_unlock(&uproc_mutex);
+	} else {
+		uproc = uprobe_mk_process(p);
+		if (IS_ERR(uproc)) {
+			ret = (int) PTR_ERR(uproc);
+			mutex_unlock(&uproc_mutex);
+			goto fail_tsk;
+		}
+		/* Hold uproc_mutex until we've added uproc to uproc_table. */
+		uproc_is_new = 1;
+	}
+
+	if ((ret = uprobe_validate_vaddr(p, u->vaddr)) < 0)
+		goto fail_uproc;
+
+	if (u->kdata) {
+		/*
+		 * Probe is already/still registered.  This is the only
+		 * place we return -EBUSY to the user.
+		 */
+		ret = -EBUSY;
+		goto fail_uproc;
+	}
+
+	uk = uprobe_mk_kimg(u);
+	if (IS_ERR(uk)) {
+		ret = (int) PTR_ERR(uk);
+		goto fail_uproc;
+	}
+
+	/* See if we already have a probepoint at the vaddr. */
+	ppt = (uproc_is_new ? NULL : uprobe_find_probept(uproc, u->vaddr));
+	if (ppt) {
+		/* Breakpoint is already in place, or soon will be. */
+		uk->ppt = ppt;
+		list_add_tail(&uk->list, &ppt->uprobe_list);
+		switch (ppt->state) {
+		case UPROBE_INSERTING:
+			uk->status = -EBUSY;	// in progress
+			BUG_ON(uproc->tgid == current->tgid);
+				// FIXME: Multiple threads self-probing?
+			break;
+		case UPROBE_REMOVING:
+			/* Wait!  Don't remove that bkpt after all! */
+			ppt->state = UPROBE_BP_SET;
+			list_del(&ppt->pd_node);  // Remove from pending list.
+			wake_up_all(&ppt->waitq); // Wake unregister_uprobe().
+			/*FALLTHROUGH*/
+		case UPROBE_BP_SET:
+			uk->status = 0;
+			break;
+		default:
+			BUG();
+		}
+		up_write(&uproc->rwsem);
+		put_task_struct(p);
+		if (uk->status == 0) {
+			uprobe_put_process(uproc);
+			return 0;
+		}
+		goto await_bkpt_insertion;
+	} else {
+		ppt = uprobe_add_probept(uk, uproc);
+		if (IS_ERR(ppt)) {
+			ret = (int) PTR_ERR(ppt);
+			goto fail_uk;
+		}
+	}
+
+	if (uproc_is_new) {
+		hlist_add_head(&uproc->hlist,
+			&uproc_table[hash_long(uproc->tgid, UPROBE_HASH_BITS)]);
+		mutex_unlock(&uproc_mutex);
+	}
+	put_task_struct(p);
+	survivors = quiesce_all_threads(uproc, &cur_utask_quiescing);
+
+	if (survivors == 0) {
+		purge_uprobe(uk);
+		up_write(&uproc->rwsem);
+		uprobe_put_process(uproc);
+		return -ESRCH;
+	}
+	up_write(&uproc->rwsem);
+
+await_bkpt_insertion:
+	if (cur_utask_quiescing)
+		/* Current task is probing its own process (via utrace?). */
+		(void) utask_fake_quiesce(cur_utask_quiescing);
+	else
+		wait_event(ppt->waitq, ppt->state != UPROBE_INSERTING);
+	ret = uk->status;
+	if (ret != 0) {
+		down_write(&uproc->rwsem);
+		purge_uprobe(uk);
+		up_write(&uproc->rwsem);
+	}
+	uprobe_put_process(uproc);
+	return ret;
+
+fail_uk:
+	uprobe_free_kimg(uk);
+
+fail_uproc:
+	if (uproc_is_new) {
+		uprobe_free_process(uproc);
+		mutex_unlock(&uproc_mutex);
+	} else
+		uprobe_put_process(uproc);
+
+fail_tsk:
+	put_task_struct(p);
+	return ret;
+}
+
+/* See Documentation/uprobes.txt. */
+void unregister_uprobe(struct uprobe *u)
+{
+	struct task_struct *p;
+	struct uprobe_process *uproc;
+	struct uprobe_kimg *uk;
+	struct uprobe_probept *ppt;
+	struct uprobe_task *cur_utask_quiescing = NULL;
+
+	if (!u)
+		return;
+
+	/*
+	 * Lock uproc before walking the graph, in case the process we're
+	 * probing is exiting.
+	 */
+	p = uprobe_get_task(u->pid);
+	if (!p)
+		return;
+	mutex_lock(&uproc_mutex);
+	uproc = uprobe_find_process(p->tgid);
+	put_task_struct(p);
+	if (!uproc) {
+		mutex_unlock(&uproc_mutex);
+		return;
+	}
+	down_write(&uproc->rwsem);
+	mutex_unlock(&uproc_mutex);
+
+	uk = (struct uprobe_kimg *)u->kdata;
+	if (!uk)
+		/*
+		 * This probe was never successfully registered, or
+		 * has already been unregistered.
+		 */
+		goto done;
+	if (uk->status == -EBUSY)
+		/* Looks like register or unregister is already in progress. */
+		goto done;
+	ppt = uk->ppt;
+
+	list_del(&uk->list);
+	uprobe_free_kimg(uk);
+	if (!list_empty(&ppt->uprobe_list))
+		goto done;
+
+	/*
+	 * The last uprobe at ppt's probepoint is being unregistered.
+	 * Queue the breakpoint for removal.
+	 */
+	ppt->state = UPROBE_REMOVING;
+	list_add_tail(&ppt->pd_node, &uproc->pending_uprobes);
+
+	(void) quiesce_all_threads(uproc, &cur_utask_quiescing);
+	up_write(&uproc->rwsem);
+	if (cur_utask_quiescing)
+		/* Current task is probing its own process (via utrace?). */
+		(void) utask_fake_quiesce(cur_utask_quiescing);
+	else
+		wait_event(ppt->waitq, ppt->state != UPROBE_REMOVING);
+
+	if (likely(ppt->state == UPROBE_DISABLED)) {
+		down_write(&uproc->rwsem);
+		uprobe_free_probept(ppt);
+		/* else somebody else's register_uprobe() resurrected ppt. */
+		up_write(&uproc->rwsem);
+	}
+	uprobe_put_process(uproc);
+	return;
+
+done:
+	up_write(&uproc->rwsem);
+	uprobe_put_process(uproc);
+}
+
+/*
+ * utrace engine report callbacks
+ */
+
+/*
+ * We've been asked to quiesce, but aren't in a position to do so.
+ * This could happen in either of the following cases:
+ *
+ * 1) Our own thread is doing a register or unregister operation --
+ * e.g., as called from a non-uprobes utrace callback.  We can't
+ * wait_event() for ourselves in [un]register_uprobe().
+ *
+ * 2) We've been asked to quiesce, but we hit a probepoint first.  Now
+ * we're in the report_signal callback, having handled the probepoint.
+ * We'd like to just set the UTRACE_ACTION_QUIESCE and
+ * UTRACE_EVENT(QUIESCE) flags and coast into quiescence.  Unfortunately,
+ * it's possible to hit a probepoint again before we quiesce.  When
+ * processing the SIGTRAP, utrace would call uprobe_report_quiesce(),
+ * which must decline to take any action so as to avoid removing the
+ * uprobe just hit.  As a result, we could keep hitting breakpoints
+ * and never quiescing.
+ *
+ * So here we do essentially what we'd prefer to do in uprobe_report_quiesce().
+ * If we're the last thread to quiesce, handle_pending_uprobes() and
+ * rouse_all_threads().  Otherwise, pretend we're quiescent and sleep until
+ * the last quiescent thread handles that stuff and then wakes us.
+ *
+ * Called and returns with no mutexes held.  Returns 1 if we free utask->uproc,
+ * else 0.
+ */
+static int utask_fake_quiesce(struct uprobe_task *utask)
+{
+	struct uprobe_process *uproc = utask->uproc;
+	enum uprobe_task_state prev_state = utask->state;
+
+	down_write(&uproc->rwsem);
+
+	/* In case we're somehow set to quiesce for real... */
+	clear_utrace_quiesce(utask);
+
+	if (uproc->n_quiescent_threads == uproc->nthreads-1) {
+		/* We're the last thread to "quiesce." */
+		handle_pending_uprobes(uproc, utask->tsk);
+		rouse_all_threads(uproc);
+		up_write(&uproc->rwsem);
+		return 0;
+	} else {
+		utask->state = UPTASK_SLEEPING;
+		uproc->n_quiescent_threads++;
+		up_write(&uproc->rwsem);
+		/* We ref-count sleepers. */
+		uprobe_get_process(uproc);
+
+		wait_event(uproc->waitq, !utask->quiescing);
+
+		down_write(&uproc->rwsem);
+		utask->state = prev_state;
+		uproc->n_quiescent_threads--;
+		up_write(&uproc->rwsem);
+
+		/*
+		 * If uproc's last uprobe has been unregistered, and
+		 * unregister_uprobe() woke up before we did, it's up
+		 * to us to free uproc.
+		 */
+		return uprobe_put_process(uproc);
+	}
+}
+
+/* Prepare to single-step ppt's probed instruction inline. */
+static inline void uprobe_pre_ssin(struct uprobe_task *utask,
+	struct uprobe_probept *ppt, struct pt_regs *regs)
+{
+	int len;
+	arch_reset_ip_for_sstep(regs);
+	mutex_lock(&ppt->ssil_mutex);
+	len = set_orig_insn(ppt, utask->tsk);
+	if (unlikely(len != BP_INSN_SIZE)) {
+		printk("Failed to temporarily restore original "
+			"instruction for single-stepping: "
+			"pid/tgid=%d/%d, vaddr=%#lx\n",
+			utask->tsk->pid, utask->tsk->tgid, ppt->vaddr);
+		utask->doomed = 1;
+	}
+}
+
+/* Prepare to continue execution after single-stepping inline. */
+static inline void uprobe_post_ssin(struct uprobe_task *utask,
+	struct uprobe_probept *ppt)
+{
+	int len = set_bp(ppt, utask->tsk);
+
+	if (unlikely(len != BP_INSN_SIZE)) {
+		printk(KERN_ERR "Couldn't restore bp: pid/tgid=%d/%d, addr=%#lx\n",
+			utask->tsk->pid, utask->tsk->tgid, ppt->vaddr);
+		ppt->state = UPROBE_DISABLED;
+	}
+	mutex_unlock(&ppt->ssil_mutex);
+}
+
+/*
+ * Signal callback:
+ *
+ * We get called here with:
+ *	state = UPTASK_RUNNING => we are here due to a breakpoint hit
+ *		- Read-lock the process
+ *		- Figure out which probepoint, based on regs->IP
+ *		- Set state = UPTASK_BP_HIT
+ *		- Reset regs->IP to beginning of the insn, if necessary
+ *		- Invoke handler for each uprobe at this probepoint
+ *		- Set singlestep in motion (UTRACE_ACTION_SINGLESTEP),
+ *			with state = UPTASK_SSTEP
+ *
+ *	state = UPTASK_SSTEP => here after single-stepping
+ *		- Validate we are here per the state machine
+ *		- Clean up after single-stepping
+ *		- Set state = UPTASK_RUNNING
+ *		- Read-unlock the process
+ *		- If it's time to quiesce, take appropriate action.
+ *
+ *	state = ANY OTHER STATE
+ *		- Not our signal, pass it on (UTRACE_ACTION_RESUME)
+ * Note: Intermediate states such as UPTASK_POST_SSTEP help
+ * uprobe_report_exit() decide what to unlock if we die.
+ */
+static u32 uprobe_report_signal(struct utrace_attached_engine *engine,
+		struct task_struct *tsk, struct pt_regs *regs, u32 action,
+		siginfo_t *info, const struct k_sigaction *orig_ka,
+		struct k_sigaction *return_ka)
+{
+	struct uprobe_task *utask;
+	struct uprobe_probept *ppt;
+	struct uprobe_process *uproc;
+	struct uprobe_kimg *uk;
+	u32 ret;
+	unsigned long probept;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+	BUG_ON(!utask);
+
+	if (action != UTRACE_SIGNAL_CORE || info->si_signo != SIGTRAP)
+		goto no_interest;
+
+	uproc = utask->uproc;
+	switch (utask->state) {
+	case UPTASK_RUNNING:
+		down_read(&uproc->rwsem);
+		clear_utrace_quiesce(utask);
+		probept = arch_get_probept(regs);
+		ppt = uprobe_find_probept(uproc, probept);
+		if (!ppt) {
+			up_read(&uproc->rwsem);
+			goto no_interest;
+		}
+		utask->active_probe = ppt;
+		utask->state = UPTASK_BP_HIT;
+
+		if (likely(ppt->state == UPROBE_BP_SET)) {
+			list_for_each_entry(uk, &ppt->uprobe_list, list) {
+				struct uprobe *u = uk->uprobe;
+				if (u->handler)
+					u->handler(u, regs);
+			}
+		}
+
+		utask->state = UPTASK_PRE_SSTEP;
+		uprobe_pre_ssin(utask, ppt, regs);
+		if (unlikely(utask->doomed))
+			do_exit(SIGSEGV);
+		utask->state = UPTASK_SSTEP;
+		/*
+		 * No other engines must see this signal, and the
+		 * signal shouldn't be passed on either.
+		 */
+		ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN |
+			UTRACE_ACTION_SINGLESTEP | UTRACE_ACTION_NEWSTATE;
+		break;
+	case UPTASK_SSTEP:
+		ppt = utask->active_probe;
+		BUG_ON(!ppt);
+		utask->state = UPTASK_POST_SSTEP;
+		uprobe_post_ssin(utask, ppt);
+		if (unlikely(utask->doomed))
+			do_exit(SIGSEGV);
+
+		utask->active_probe = NULL;
+		ret = UTRACE_ACTION_HIDE | UTRACE_SIGNAL_IGN
+			| UTRACE_ACTION_NEWSTATE;
+		utask->state = UPTASK_RUNNING;
+		if (utask->quiescing) {
+			up_read(&uproc->rwsem);
+			if (utask_fake_quiesce(utask) == 1)
+				ret |= UTRACE_ACTION_DETACH;
+		} else
+			up_read(&uproc->rwsem);
+
+		break;
+	default:
+		goto no_interest;
+	}
+	return ret;
+
+no_interest:
+	return UTRACE_ACTION_RESUME;
+}
+
+/*
+ * utask_quiesce_pending_sigtrap: The utask entered the quiesce callback
+ * through the signal delivery path, apparently. Check if the associated
+ * signal happened due to a uprobe hit.
+ *
+ * Called with utask->uproc write-locked.  Returns 1 if quiesce was
+ * entered with SIGTRAP pending due to a uprobe hit.
+ */
+static int utask_quiesce_pending_sigtrap(struct uprobe_task *utask)
+{
+	const struct utrace_regset_view *view;
+	const struct utrace_regset *regset;
+	struct uprobe_probept *ppt;
+	unsigned long inst_ptr;
+
+	view = utrace_native_view(utask->tsk);
+	regset = utrace_regset(utask->tsk, utask->engine, view, 0);
+	if (unlikely(regset == NULL))
+		return -EIO;
+
+	if ((*regset->get)(utask->tsk, regset, SLOT_IP * regset->size,
+			regset->size, &inst_ptr, NULL) != 0)
+		return -EIO;
+
+	ppt = uprobe_find_probept(utask->uproc, ARCH_BP_INST_PTR(inst_ptr));
+	return (ppt != NULL);
+}
+
+/*
+ * Quiesce callback: The associated process has one or more breakpoint
+ * insertions or removals pending.  If we're the last thread in this
+ * process to quiesce, do the insertion(s) and/or removal(s).
+ */
+static u32 uprobe_report_quiesce(struct utrace_attached_engine *engine,
+		struct task_struct *tsk)
+{
+	struct uprobe_task *utask;
+	struct uprobe_process *uproc;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+	BUG_ON(!utask);
+	uproc = utask->uproc;
+	if (current == utask->quiesce_master) {
+		/*
+		 * tsk was already quiescent when quiesce_all_threads()
+		 * called utrace_set_flags(), which in turn called
+		 * here.  uproc is already locked.  Do as little as possible
+		 * and get out.
+		 */
+		utask->state = UPTASK_QUIESCENT;
+		uproc->n_quiescent_threads++;
+		return UTRACE_ACTION_RESUME;
+	}
+
+	BUG_ON(utask->active_probe);
+	down_write(&uproc->rwsem);
+
+	/*
+	 * When a thread hits a breakpoint or single-steps, utrace calls
+	 * this quiesce callback before our signal callback.  We must
+	 * let uprobe_report_signal() handle the uprobe hit and THEN
+	 * quiesce, because (a) there's a chance that we're quiescing
+	 * in order to remove that very uprobe, and (b) there's a tiny
+	 * chance that even though that uprobe isn't marked for removal
+	 * now, it may be before all threads manage to quiesce.
+	 */
+	if (!utask->quiescing || utask_quiesce_pending_sigtrap(utask) == 1) {
+		clear_utrace_quiesce(utask);
+		goto done;
+	}
+
+	utask->state = UPTASK_QUIESCENT;
+	uproc->n_quiescent_threads++;
+	check_uproc_quiesced(uproc, tsk);
+done:
+	up_write(&uproc->rwsem);
+	return UTRACE_ACTION_RESUME;
+}
+
+/* Find a surviving thread in uproc.  Runs with uproc->rwsem locked. */
+static struct task_struct *find_surviving_thread(struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+
+	list_for_each_entry(utask, &uproc->thread_list, list)
+		return utask->tsk;
+	return NULL;
+}
+
+/*
+ * uproc's process is exiting or exec-ing, so zap all the (now irrelevant)
+ * probepoints.  Runs with uproc->rwsem write-locked.  Caller must ref-count
+ * uproc before calling this function, to ensure that uproc doesn't get
+ * freed in the middle of this.
+ */
+void uprobe_cleanup_process(struct uprobe_process *uproc)
+{
+	int i;
+	struct uprobe_probept *ppt;
+	struct hlist_node *pnode1, *pnode2;
+	struct hlist_head *head;
+	struct uprobe_kimg *uk, *unode;
+
+	for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
+		head = &uproc->uprobe_table[i];
+		hlist_for_each_entry_safe(ppt, pnode1, pnode2, head, ut_node) {
+			if (ppt->state == UPROBE_INSERTING ||
+					ppt->state == UPROBE_REMOVING) {
+				/*
+				 * This task is (exec/exit)ing with
+				 * a [un]register_uprobe pending.
+				 * [un]register_uprobe will free ppt.
+				 */
+				ppt->state = UPROBE_DISABLED;
+				list_for_each_entry_safe(uk, unode,
+					       &ppt->uprobe_list, list)
+					uk->status = -ESRCH;
+				wake_up_all(&ppt->waitq);
+			} else if (ppt->state == UPROBE_BP_SET) {
+				list_for_each_entry_safe(uk, unode,
+					       &ppt->uprobe_list, list) {
+					list_del(&uk->list);
+					uprobe_free_kimg(uk);
+				}
+				uprobe_free_probept(ppt);
+			} else {
+				/*
+				 * ppt is UPROBE_DISABLED: assume that
+				 * [un]register_uprobe() has been notified
+				 * and will free it soon.
+				 */
+			}
+		}
+	}
+}
+
+/*
+ * Exit callback: The associated task/thread is exiting.
+ */
+static u32 uprobe_report_exit(struct utrace_attached_engine *engine,
+		struct task_struct *tsk, long orig_code, long *code)
+{
+	struct uprobe_task *utask;
+	struct uprobe_process *uproc;
+	struct uprobe_probept *ppt;
+	int utask_quiescing;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+	uproc = utask->uproc;
+
+	ppt = utask->active_probe;
+	if (ppt) {
+		printk(KERN_WARNING "Task died at uprobe probepoint:"
+			"  pid/tgid = %d/%d, probepoint = %#lx\n",
+			tsk->pid, tsk->tgid, ppt->vaddr);
+		switch (utask->state) {
+		case UPTASK_PRE_SSTEP:
+		case UPTASK_SSTEP:
+		case UPTASK_POST_SSTEP:
+			mutex_unlock(&ppt->ssil_mutex);
+			break;
+		default:
+			break;
+		}
+		up_read(&uproc->rwsem);
+	}
+
+	down_write(&uproc->rwsem);
+	utask_quiescing = utask->quiescing;
+	list_del(&utask->list);
+	kfree(utask);
+
+	uproc->nthreads--;
+	if (uproc->nthreads) {
+		if (utask_quiescing)
+			/*
+			 * In case other threads are waiting for
+			 * us to quiesce...
+			 */
+			check_uproc_quiesced(uproc,
+				       find_surviving_thread(uproc));
+		up_write(&uproc->rwsem);
+	} else {
+		/*
+		 * We were the last remaining thread - clean up the uprobe
+		 * remnants a la unregister_uprobe(). We don't have to
+		 * remove the breakpoints, though.
+		 */
+		uprobe_get_process(uproc);
+		uprobe_cleanup_process(uproc);
+		up_write(&uproc->rwsem);
+		uprobe_put_process(uproc);
+	}
+
+	return UTRACE_ACTION_DETACH;
+}
+
+/*
+ * Clone callback: The current task has spawned a thread/process.
+ *
+ * NOTE: For now, we don't pass on uprobes from the parent to the
+ * child.  Instead, we clear the parent's breakpoints from the
+ * child's address space.
+ *
+ * TODO:
+ *	- Provide option for child to inherit uprobes.
+ */
+static u32 uprobe_report_clone(struct utrace_attached_engine *engine,
+		struct task_struct *parent, unsigned long clone_flags,
+		struct task_struct *child)
+{
+	int len;
+	struct uprobe_process *uproc;
+	struct uprobe_task *ptask, *ctask;
+
+	ptask = rcu_dereference((struct uprobe_task *)engine->data);
+	uproc = ptask->uproc;
+
+	/*
+	 * Lock uproc so no new uprobes can be installed 'til all
+	 * report_clone activities are completed
+	 */
+	down_write(&uproc->rwsem);
+	get_task_struct(child);
+
+	if (clone_flags & CLONE_THREAD) {
+		/* New thread in the same process */
+		ctask = uprobe_add_task(child, uproc);
+		BUG_ON(!ctask);
+		if (IS_ERR(ctask)) {
+			put_task_struct(child);
+			up_write(&uproc->rwsem);
+			goto fail;
+		}
+		if (ctask)
+			uproc->nthreads++;
+		/*
+		 * FIXME: Handle the case where uproc is quiescing
+		 * (assuming it's possible to clone while quiescing).
+		 */
+	} else {
+		/*
+		 * New process spawned by parent.  Remove the probepoints
+		 * in the child's text.
+		 *
+		 * It's not necessary to quiesce the child, as we are assured
+		 * by utrace that this callback happens *before* the child
+		 * gets to run userspace.
+		 *
+		 * We also hold the uproc->rwsem for the parent - so no
+		 * new uprobes will be registered 'til we return.
+		 */
+		int i;
+		struct uprobe_probept *ppt;
+		struct hlist_node *node;
+		struct hlist_head *head;
+
+		for (i = 0; i < UPROBE_TABLE_SIZE; i++) {
+			head = &uproc->uprobe_table[i];
+			hlist_for_each_entry(ppt, node, head, ut_node) {
+				len = set_orig_insn(ppt, child);
+				if (len != BP_INSN_SIZE) {
+					/* Ratelimit this? */
+					printk(KERN_ERR "Pid %d forked %d;"
+						" failed to remove probepoint"
+						" at %#lx in child\n",
+						parent->pid, child->pid,
+						ppt->vaddr);
+				}
+			}
+		}
+	}
+
+	put_task_struct(child);
+	up_write(&uproc->rwsem);
+
+fail:
+	return UTRACE_ACTION_RESUME;
+}
+
+/*
+ * Exec callback: The associated process called execve() or friends
+ *
+ * The new program is about to start running, so there is no
+ * possibility that a uprobe from the previous user address space
+ * will be hit.
+ *
+ * NOTE:
+ *	Typically, this process would have passed through the clone
+ *	callback, where the necessary action *should* have been
+ *	taken. However, if we still end up at this callback:
+ *		- We don't have to clear the uprobes - memory image
+ *		  will be overlaid.
+ *		- We have to free up uprobe resources associated with
+ *		  this process.
+ */
+static u32 uprobe_report_exec(struct utrace_attached_engine *engine,
+		struct task_struct *tsk, const struct linux_binprm *bprm,
+		struct pt_regs *regs)
+{
+	struct uprobe_process *uproc;
+	struct uprobe_task *utask;
+	int uproc_freed;
+
+	utask = rcu_dereference((struct uprobe_task *)engine->data);
+	uproc = utask->uproc;
+	uprobe_get_process(uproc);
+
+	down_write(&uproc->rwsem);
+	uprobe_cleanup_process(uproc);
+	up_write(&uproc->rwsem);
+
+	/* If any [un]register_uprobe is pending, it'll clean up. */
+	uproc_freed = uprobe_put_process(uproc);
+	return (uproc_freed ? UTRACE_ACTION_DETACH : UTRACE_ACTION_RESUME);
+}
+
+static const struct utrace_engine_ops uprobe_utrace_ops =
+{
+	.report_quiesce = uprobe_report_quiesce,
+	.report_signal = uprobe_report_signal,
+	.report_exit = uprobe_report_exit,
+	.report_clone = uprobe_report_clone,
+	.report_exec = uprobe_report_exec
+};
+
+#define arch_init_uprobes() 0
+
+static int __init init_uprobes(void)
+{
+	int i, err = 0;
+
+	for (i = 0; i < UPROBE_TABLE_SIZE; i++)
+		INIT_HLIST_HEAD(&uproc_table[i]);
+
+	p_uprobe_utrace_ops = &uprobe_utrace_ops;
+	err = arch_init_uprobes();
+	return err;
+}
+__initcall(init_uprobes);
+
+EXPORT_SYMBOL_GPL(register_uprobe);
+EXPORT_SYMBOL_GPL(unregister_uprobe);
_
Uprobes is enhanced to use "single-stepping out of line" (SSOL)
to avoid probe misses in multithreaded applications.  SSOL also
reduces probe overhead by 25-30%.

After a breakpoint has been hit and uprobes has run the probepoint's
handler(s), uprobes must execute the probed instruction in the
context of the probed process.  There are two commonly accepted
ways to do this:

o Single-stepping inline (SSIL): Temporarily replace the breakpoint
instruction with the original instruction; single-step the instruction;
restore the breakpoint instruction; and allow the thread to continue.
This method is typically used by interactive debuggers such as gdb,
and is also used in the uprobes base patch.  This approach doesn't
work acceptably for multithreaded programs, because while the
breakpoint is temporarily removed, other threads can sail past the
probepoint.  It also requires two writes to the probed process's
text for every probe hit.

o Single-stepping out of line (SSOL): Place a copy of the original
instruction somewhere in the probed process's address space;
single-step the copy; fix up the thread state as necessary; and allow
the thread to continue.  This approach is used by kprobes.

This implementation of SSOL entails two major components:

1) Allocation and management of an "SSOL area."  Before handling
the first probe hit, uprobes allocates a VM area in the probed
process's address space, and divides it into "instruction slots."
The first time a probepoint is hit, an instruction slot is allocated
to it and a copy of the probed instruction is placed there.  Multiple
threads can march through that probepoint simultaneously, all using
the same slot.  Currently, we allocate a VM area only for probed
processes (rather than at exec time for every process), its size
is one page, and it never grows.  Slots are recycled, as necessary,
on a least-recently-used basis.
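
For a concrete picture of the slot geometry, here is a sketch (the
helper name is invented for illustration; uprobe_init_ssol() in this
patch does the equivalent arithmetic while initializing the slot
array):

	/*
	 * Illustrative only: the one-page SSOL area holds UINSNS_PER_PAGE
	 * (= PAGE_SIZE / MAX_UINSN_BYTES) slots, and slot i's
	 * instruction-copy starts MAX_UINSN_BYTES*i bytes into the area.
	 */
	static uprobe_opcode_t __user *ssol_slot_addr(
			struct uprobe_ssol_area *area, int i)
	{
		return (uprobe_opcode_t __user *)
			((char __user *)area->insn_area + i * MAX_UINSN_BYTES);
	}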

2) Architecture-specific fix-ups for certain instructions.  If the
effect of an instruction depends on its address, the thread's
registers and/or stack must be fixed up after the instruction-copy
is single-stepped.  For i386 uprobes, the fixups were stolen from
i386 kprobes.
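
The common case of that fix-up is pure address translation: the
instruction pointer produced by single-stepping the copy is rebased
onto the original instruction stream.  A minimal sketch, assuming the
utask, ppt, and regs arguments of uprobe_post_ssout() in this patch
(the full i386 logic, including the ret/call/jmp special cases, is in
that function):

	/*
	 * copy_eip is where the instruction-copy lives in the SSOL area;
	 * orig_eip is the probed address.  Rebase the new eip from the
	 * copy back onto the original instruction.
	 */
	long copy_eip = utask->singlestep_addr;
	long orig_eip = ppt->vaddr;

	regs->eip = orig_eip + (regs->eip - copy_eip);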

---

 Documentation/uprobes.txt  |   25 +--
 arch/i386/Kconfig          |    4 
 arch/i386/kernel/Makefile  |    1 
 arch/i386/kernel/uprobes.c |  135 +++++++++++++++++
 include/asm-i386/mmu.h     |    1 
 include/linux/uprobes.h    |   87 +++++++++++
 kernel/uprobes.c           |  342 +++++++++++++++++++++++++++++++++++++++++++--
 7 files changed, 569 insertions(+), 26 deletions(-)

diff -puN Documentation/uprobes.txt~2-uprobes-ssol Documentation/uprobes.txt
--- linux-2.6.21-rc6/Documentation/uprobes.txt~2-uprobes-ssol	2007-05-24 15:41:01.000000000 -0700
+++ linux-2.6.21-rc6-jimk/Documentation/uprobes.txt	2007-05-24 15:41:28.000000000 -0700
@@ -54,14 +54,13 @@ handler the addresses of the uprobe stru
 The handler may block, but keep in mind that the probed thread remains
 stopped while your handler runs.
 
-Next, Uprobes single-steps the probed instruction and resumes execution
-of the probed process at the instruction following the probepoint.
-[Note: In the base uprobes patch, we temporarily remove the breakpoint
-instruction, insert the original opcode, single-step the instruction
-"inline", and then replace the breakpoint.  This can create problems
-in a multithreaded application.  For example, it opens a time window
-during which another thread can sail right past the probepoint.
-This problem is resolved in the "single-stepping out of line" patch.]
+Next, Uprobes single-steps its copy of the probed instruction and
+resumes execution of the probed process at the instruction following
+the probepoint.  (It would be simpler to single-step the actual
+instruction in place, but then Uprobes would have to temporarily
+remove the breakpoint instruction.  This would create problems in a
+multithreaded application.  For example, it would open a time window
+when another thread could sail right past the probepoint.)
 
 1.2 The Role of Utrace
 
@@ -287,15 +286,15 @@ create a new set of uprobe objects.)
 8. Probe Overhead
 
 // TODO: Adjust as other architectures are tested.
-On a typical CPU in use in 2007, a uprobe hit takes 3 to 4
-microseconds to process.  Specifically, a benchmark that hits the same
-probepoint repeatedly, firing a simple handler each time, reports
-250,000 to 300,000 hits per second, depending on the architecture.
+On a typical CPU in use in 2007, a uprobe hit takes about 3
+microseconds to process.  Specifically, a benchmark that hits the
+same probepoint repeatedly, firing a simple handler each time, reports
+300,000 to 350,000 hits per second, depending on the architecture.
 
 Here are sample overhead figures (in usec) for different architectures.
 
 i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
-4.2 usec/hit (single-stepping inline)
+2.9 usec/hit (single-stepping out of line)
 
 x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
 // TODO
diff -puN arch/i386/Kconfig~2-uprobes-ssol arch/i386/Kconfig
--- linux-2.6.21-rc6/arch/i386/Kconfig~2-uprobes-ssol	2007-05-24 15:41:01.000000000 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/Kconfig	2007-05-24 15:41:28.000000000 -0700
@@ -87,6 +87,10 @@ config DMI
 	bool
 	default y
 
+config UPROBES_SSOL
+	bool
+	default y
+
 source "init/Kconfig"
 
 menu "Processor type and features"
diff -puN arch/i386/kernel/Makefile~2-uprobes-ssol arch/i386/kernel/Makefile
--- linux-2.6.21-rc6/arch/i386/kernel/Makefile~2-uprobes-ssol	2007-05-24 15:41:01.000000000 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/kernel/Makefile	2007-05-24 15:41:28.000000000 -0700
@@ -41,6 +41,7 @@ obj-$(CONFIG_EARLY_PRINTK)	+= early_prin
 obj-$(CONFIG_HPET_TIMER) 	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_STACK_UNWIND)	+= unwind.o
+obj-$(CONFIG_UPROBES)		+= uprobes.o
 
 obj-$(CONFIG_VMI)		+= vmi.o vmitime.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt.o
diff -puN /dev/null arch/i386/kernel/uprobes.c
--- /dev/null	2007-05-25 07:05:01.545112516 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/kernel/uprobes.c	2007-05-24 15:41:28.000000000 -0700
@@ -0,0 +1,135 @@
+/*
+ *  Userspace Probes (UProbes)
+ *  arch/i386/kernel/uprobes.c
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ */
+#define UPROBES_IMPLEMENTATION 1
+#include <linux/uprobes.h>
+#include <linux/uaccess.h>
+
+/*
+ * Get an instruction slot from the process's SSOL area, containing the
+ * instruction at ppt's probepoint.  Point the eip at that slot, in
+ * preparation for single-stepping out of line.
+ */
+void uprobe_pre_ssout(struct uprobe_task *utask, struct uprobe_probept *ppt,
+		struct pt_regs *regs)
+{
+	struct uprobe_ssol_slot *slot;
+
+	slot = uprobe_get_insn_slot(ppt);
+	if (!slot) {
+		utask->doomed = 1;
+		return;
+	}
+	regs->eip = (long)slot->insn;
+	utask->singlestep_addr = regs->eip;
+}
+
+/*
+ * Called by uprobe_post_ssout() to adjust the return address
+ * pushed by a call instruction executed out-of-line.
+ */
+static void adjust_ret_addr(long esp, long correction,
+		struct uprobe_task *utask)
+{
+	int nleft;
+	long ra;
+
+	nleft = copy_from_user(&ra, (const void __user *) esp, 4);
+	if (unlikely(nleft != 0))
+		goto fail;
+	ra += correction;
+	nleft = copy_to_user((void __user *) esp, &ra, 4);
+	if (unlikely(nleft != 0))
+		goto fail;
+	return;
+
+fail:
+	printk(KERN_ERR
+		"uprobes: Failed to adjust return address after"
+		" single-stepping call instruction;"
+		" pid=%d, esp=%#lx\n", current->pid, esp);
+	utask->doomed = 1;
+}
+
+/*
+ * Called after single-stepping.  ppt->vaddr is the address of the
+ * instruction whose first byte has been replaced by the "int3"
+ * instruction.  To avoid the SMP problems that can occur when we
+ * temporarily put back the original opcode to single-step, we
+ * single-stepped a copy of the instruction.  The address of this
+ * copy is utask->singlestep_addr.
+ *
+ * This function prepares to return from the post-single-step
+ * interrupt.  We have to fix up the stack as follows:
+ *
+ * 0) Typically, the new eip is relative to the copied instruction.  We
+ * need to make it relative to the original instruction.  Exceptions are
+ * return instructions and absolute or indirect jump or call instructions.
+ *
+ * 1) If the single-stepped instruction was a call, the return address
+ * that is atop the stack is the address following the copied instruction.
+ * We need to make it the address following the original instruction.
+ */
+void uprobe_post_ssout(struct uprobe_task *utask, struct uprobe_probept *ppt,
+		struct pt_regs *regs)
+{
+	long next_eip = 0;
+	long copy_eip = utask->singlestep_addr;
+	long orig_eip = ppt->vaddr;
+
+	up_read(&ppt->slot->rwsem);
+
+	switch (ppt->insn[0]) {
+	case 0xc3:		/* ret/lret */
+	case 0xcb:
+	case 0xc2:
+	case 0xca:
+		next_eip = regs->eip;
+		/* eip is already adjusted; no more changes required */
+		break;
+	case 0xe8:		/* call relative - Fix return addr */
+		adjust_ret_addr(regs->esp, (orig_eip - copy_eip), utask);
+		break;
+	case 0xff:
+		if ((ppt->insn[1] & 0x30) == 0x10) {
+			/* call absolute, indirect */
+			/* Fix return addr; eip is correct. */
+			next_eip = regs->eip;
+			adjust_ret_addr(regs->esp, (orig_eip - copy_eip),
+				utask);
+		} else if ((ppt->insn[1] & 0x31) == 0x20 ||
+			   (ppt->insn[1] & 0x31) == 0x21) {
+			/* jmp near or jmp far  absolute indirect */
+			/* eip is correct. */
+			next_eip = regs->eip;
+		}
+		break;
+	case 0xea:		/* jmp absolute -- eip is correct */
+		next_eip = regs->eip;
+		break;
+	default:
+		break;
+	}
+
+	if (next_eip)
+		regs->eip = next_eip;
+	else
+		regs->eip = orig_eip + (regs->eip - copy_eip);
+}
diff -puN include/asm-i386/mmu.h~2-uprobes-ssol include/asm-i386/mmu.h
--- linux-2.6.21-rc6/include/asm-i386/mmu.h~2-uprobes-ssol	2007-05-24 15:41:01.000000000 -0700
+++ linux-2.6.21-rc6-jimk/include/asm-i386/mmu.h	2007-05-24 15:41:28.000000000 -0700
@@ -13,6 +13,7 @@ typedef struct { 
 	struct semaphore sem;
 	void *ldt;
 	void *vdso;
+	void *uprobes_ssol_area;
 } mm_context_t;
 
 #endif
diff -puN include/linux/uprobes.h~2-uprobes-ssol include/linux/uprobes.h
--- linux-2.6.21-rc6/include/linux/uprobes.h~2-uprobes-ssol	2007-05-24 15:41:01.000000000 -0700
+++ linux-2.6.21-rc6-jimk/include/linux/uprobes.h	2007-05-24 15:41:28.000000000 -0700
@@ -87,6 +87,60 @@ enum uprobe_task_state {
 
 #define UPROBE_HASH_BITS 5
 #define UPROBE_TABLE_SIZE (1 << UPROBE_HASH_BITS)
+#define UINSNS_PER_PAGE (PAGE_SIZE/MAX_UINSN_BYTES)
+
+/* Used when deciding which instruction slot to steal. */
+enum uprobe_slot_state {
+	SSOL_FREE,
+	SSOL_ASSIGNED,
+	SSOL_BEING_STOLEN
+};
+
+/*
+ * For a uprobe_process that uses an SSOL area, there's an array of these
+ * objects matching the array of instruction slots in the SSOL area.
+ */
+struct uprobe_ssol_slot {
+	/* The slot in the SSOL area that holds the instruction-copy */
+	__user uprobe_opcode_t	*insn;
+
+	enum uprobe_slot_state state;
+
+	/* The probepoint that currently owns this slot */
+	struct uprobe_probept *owner;
+
+	/*
+	 * Read-locked when slot is in use during single-stepping.
+	 * Write-locked by stealing task.
+	 */
+	struct rw_semaphore rwsem;
+
+	/* Used for LRU heuristics.  If this overflows, it's OK. */
+	unsigned long last_used;
+};
+
+/*
+ * The per-process single-stepping out-of-line (SSOL) area
+ */
+struct uprobe_ssol_area {
+	/* Array of instruction slots in the vma we allocate */
+	__user uprobe_opcode_t *insn_area;
+
+	int nslots;
+	int nfree;
+
+	/* Array of slot objects, one per instruction slot */
+	struct uprobe_ssol_slot *slots;
+
+	/* lock held while finding a free slot */
+	spinlock_t lock;
+
+	/* Next slot to steal */
+	int next_slot;
+
+	/* Ensures 2 threads don't try to set up the vma simultaneously. */
+	struct mutex setup_mutex;
+};
 
 /*
  * uprobe_process -- not a user-visible struct.
@@ -136,6 +190,18 @@ struct uprobe_process {
 	 * since once the last thread has exited, the rest is academic.
 	 */
 	struct kref refcount;
+
+	/*
+	 * Manages slots for instruction-copies to be single-stepped
+	 * out of line.
+	 */
+	struct uprobe_ssol_area ssol_area;
+
+	/*
+	 * 1 to single-step out of line; 0 for inline.  This can drop to
+	 * 0 if we can't set up the SSOL area, but never goes from 0 to 1.
+	 */
+	int sstep_out_of_line;
 };
 
 /*
@@ -204,6 +270,19 @@ struct uprobe_probept {
 	 * prevent probe misses while the breakpoint is swapped out.
 	 */
 	struct mutex ssil_mutex;
+
+	/*
+	 * We put the instruction-copy here to single-step it.
+	 * We don't own it unless slot->owner points back to us.
+	 */
+	struct uprobe_ssol_slot *slot;
+
+	/*
+	 * Hold this while stealing an insn slot to ensure that no
+	 * other thread, having also hit this probepoint, simultaneously
+	 * steals a slot for it.
+	 */
+	struct mutex slot_mutex;
 };
 
 /*
@@ -248,6 +327,14 @@ struct uprobe_task {
 	int doomed;
 };
 
+#ifdef CONFIG_UPROBES_SSOL
+extern struct uprobe_ssol_slot *uprobe_get_insn_slot(struct uprobe_probept*);
+extern void uprobe_pre_ssout(struct uprobe_task*, struct uprobe_probept*,
+			struct pt_regs*);
+extern void uprobe_post_ssout(struct uprobe_task*, struct uprobe_probept*,
+			struct pt_regs*);
+#endif
+
 #endif	/* UPROBES_IMPLEMENTATION */
 
 #endif	/* _LINUX_UPROBES_H */
diff -puN kernel/uprobes.c~2-uprobes-ssol kernel/uprobes.c
--- linux-2.6.21-rc6/kernel/uprobes.c~2-uprobes-ssol	2007-05-24 15:41:01.000000000 -0700
+++ linux-2.6.21-rc6-jimk/kernel/uprobes.c	2007-05-24 15:56:25.000000000 -0700
@@ -33,6 +33,7 @@
 #include <linux/mm.h>
 #include <asm/tracehook.h>
 #include <asm/errno.h>
+#include <asm/mman.h>
 
 #define SET_ENGINE_FLAGS	1
 #define CLEAR_ENGINE_FLAGS	0
@@ -341,6 +342,7 @@ static int quiesce_all_threads(struct up
 static void uprobe_free_process(struct uprobe_process *uproc)
 {
 	struct uprobe_task *utask, *tmp;
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
 
 	if (!hlist_unhashed(&uproc->hlist))
 		hlist_del(&uproc->hlist);
@@ -354,6 +356,8 @@ static void uprobe_free_process(struct u
 			utrace_detach(utask->tsk, utask->engine);
 		kfree(utask);
 	}
+	if (area->slots)
+		kfree(area->slots);
 	up_write(&uproc->rwsem);	// So kfree doesn't complain
 	kfree(uproc);
 }
@@ -496,6 +500,14 @@ static struct uprobe_process *uprobe_mk_
 	INIT_HLIST_NODE(&uproc->hlist);
 	uproc->tgid = p->tgid;
 
+	uproc->ssol_area.insn_area = NULL;
+	mutex_init(&uproc->ssol_area.setup_mutex);
+#ifdef CONFIG_UPROBES_SSOL
+	uproc->sstep_out_of_line = 1;
+#else
+	uproc->sstep_out_of_line = 0;
+#endif
+
 	/*
 	 * Create and populate one utask per thread in this process.  We
 	 * can't call uprobe_add_task() while holding tasklist_lock, so we:
@@ -545,6 +557,8 @@ static struct uprobe_probept *uprobe_add
 		return ERR_PTR(-ENOMEM);
 	init_waitqueue_head(&ppt->waitq);
 	mutex_init(&ppt->ssil_mutex);
+	mutex_init(&ppt->slot_mutex);
+	ppt->slot = NULL;
 
 	/* Connect to uk. */
 	INIT_LIST_HEAD(&ppt->uprobe_list);
@@ -566,6 +580,25 @@ static struct uprobe_probept *uprobe_add
 	return ppt;
 }
 
+/* ppt is going away.  Free its slot (if it owns one) in the SSOL area. */
+static void uprobe_free_slot(struct uprobe_probept *ppt)
+{
+	struct uprobe_ssol_slot *slot = ppt->slot;
+	if (slot) {
+		down_write(&slot->rwsem);
+		if (slot->owner == ppt) {
+			unsigned long flags;
+			struct uprobe_ssol_area *area = &ppt->uproc->ssol_area;
+			spin_lock_irqsave(&area->lock, flags);
+			slot->state = SSOL_FREE;
+			slot->owner = NULL;
+			area->nfree++;
+			spin_unlock_irqrestore(&area->lock, flags);
+		}
+		up_write(&slot->rwsem);
+	}
+}
+
 /*
  * Runs with ppt->uproc write-locked.  Frees ppt and decrements the ref count
  * on ppt->uproc (but ref count shouldn't hit 0).
@@ -573,6 +606,7 @@ static struct uprobe_probept *uprobe_add
 static void uprobe_free_probept(struct uprobe_probept *ppt)
 {
 	struct uprobe_process *uproc = ppt->uproc;
+	uprobe_free_slot(ppt);
 	hlist_del(&ppt->ut_node);
 	uproc->nppt--;
 	kfree(ppt);
@@ -598,7 +632,7 @@ static void purge_uprobe(struct uprobe_k
 		uprobe_free_probept(ppt);
 }
 
-/* Probed address must be in an executable VM area. */
+/* Probed address must be in an executable VM area, outside the SSOL area. */
 static int uprobe_validate_vaddr(struct task_struct *p, unsigned long vaddr)
 {
 	struct vm_area_struct *vma;
@@ -607,7 +641,8 @@ static int uprobe_validate_vaddr(struct 
 		return -EINVAL;
 	down_read(&mm->mmap_sem);
 	vma = find_vma(mm, vaddr);
-	if (!vma || vaddr < vma->vm_start || !(vma->vm_flags & VM_EXEC)) {
+	if (!vma || vaddr < vma->vm_start || !(vma->vm_flags & VM_EXEC)
+	    || vma->vm_start == (unsigned long) mm->context.uprobes_ssol_area) {
 		up_read(&mm->mmap_sem);
 		return -EINVAL;
 	}
@@ -840,6 +875,256 @@ done:
 }
 
 /*
+ * Functions for allocation of the SSOL area, and the instruction slots
+ * therein
+ */
+
+/*
+ * Mmap a page for the uprobes SSOL area for the current process.
+ * Returns with mm->context.uprobes_ssol_area pointing at the page,
+ * or set to a negative errno.
+ * This approach was suggested by Roland McGrath.
+ */
+static void uprobe_setup_ssol_vma(void)
+{
+	unsigned long addr;
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+
+	down_write(&mm->mmap_sem);
+	/*
+	 * Find the end of the top mapping and skip a page.
+	 * If there is no space for PAGE_SIZE above
+	 * that, mmap will ignore our address hint.
+	 */
+	vma = rb_entry(rb_last(&mm->mm_rb), struct vm_area_struct, vm_rb);
+	addr = vma->vm_end + PAGE_SIZE;
+	addr = do_mmap_pgoff(NULL, addr, PAGE_SIZE, PROT_EXEC,
+					MAP_PRIVATE|MAP_ANONYMOUS, 0);
+	if (addr & ~PAGE_MASK) {
+		up_write(&mm->mmap_sem);
+		mm->context.uprobes_ssol_area = ERR_PTR(addr);
+		printk(KERN_ERR "Uprobes failed to allocate a vma for"
+			" pid/tgid %d/%d for single-stepping out of line.\n",
+			current->pid, current->tgid);
+		return;
+	}
+
+	vma = find_vma(mm, addr);
+	BUG_ON(!vma);
+	/* avoid vma copy on fork() and don't expand when mremap() */
+	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
+
+	up_write(&mm->mmap_sem);
+	mm->context.uprobes_ssol_area = (void *)addr;
+}
+
+/*
+ * Initialize per-process area for single stepping out-of-line.
+ * Must be run by a thread in the probed process.  Returns with
+ * area->insn_area pointing to the initialized area, or set to a
+ * negative errno.
+ */
+static void uprobe_init_ssol(struct uprobe_ssol_area *area)
+{
+	struct uprobe_ssol_slot *slot;
+	int i;
+	char *slot_addr;	// Simplify pointer arithmetic
+
+	/*
+	 * If we previously probed this process and then removed all
+	 * probes, the vma is still available to us.
+	 */
+	if (!current->mm->context.uprobes_ssol_area)
+		uprobe_setup_ssol_vma();
+	area->insn_area = (uprobe_opcode_t *)
+			 current->mm->context.uprobes_ssol_area;
+	if (IS_ERR(area->insn_area))
+		return;
+
+	area->slots = (struct uprobe_ssol_slot *)
+		kzalloc(sizeof(struct uprobe_ssol_slot) *
+					UINSNS_PER_PAGE, GFP_USER);
+	if (!area->slots) {
+		area->insn_area = ERR_PTR(-ENOMEM);
+		return;
+	}
+	area->nfree = area->nslots = UINSNS_PER_PAGE;
+	spin_lock_init(&area->lock);
+	area->next_slot = 0;
+	slot_addr = (char*) area->insn_area;
+	for (i = 0; i < UINSNS_PER_PAGE; i++) {
+		slot = &area->slots[i];
+		init_rwsem(&slot->rwsem);
+		slot->state = SSOL_FREE;
+		slot->owner = NULL;
+		slot->last_used = 0;
+		slot->insn = (__user uprobe_opcode_t *) slot_addr;
+		slot_addr += MAX_UINSN_BYTES;
+	}
+}
+
+/*
+ * Verify that the SSOL area has been set up for uproc.  Returns a
+ * pointer to the SSOL area, or a negative errno if we couldn't set it up.
+ */
+static __user uprobe_opcode_t
+			*uprobe_verify_ssol(struct uprobe_process *uproc)
+{
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
+
+	if (unlikely(!area->insn_area)) {
+		/* First time through for this probed process */
+		static DEFINE_MUTEX(ssol_setup_mutex);
+		mutex_lock(&ssol_setup_mutex);
+		if (likely(!area->insn_area))
+			/* Nobody snuck in and set things up ahead of us. */
+			uprobe_init_ssol(area);
+		mutex_unlock(&ssol_setup_mutex);
+	}
+	return area->insn_area;
+}
+
+static inline int advance_slot(int slot, struct uprobe_ssol_area *area)
+{
+	return (slot + 1) % area->nslots;
+}
+
+/*
+ * Return the slot number of the least-recently-used slot in the
+ * neighborhood of area->next_slot.  Limit the number of slots we test
+ * to keep it fast.  Nobody dies if this isn't the best choice.
+ */
+static int uprobe_lru_insn_slot(struct uprobe_ssol_area *area)
+{
+#define MAX_LRU_TESTS 10
+	struct uprobe_ssol_slot *s;
+	int lru_slot = -1;
+	unsigned long lru_time = ULONG_MAX;
+	int nr_lru_tests = 0;
+	int slot = area->next_slot;
+	do {
+		s = &area->slots[slot];
+		if (likely(s->state == SSOL_ASSIGNED)) {
+			if (lru_time > s->last_used) {
+				lru_time = s->last_used;
+				lru_slot = slot;
+			}
+			if (++nr_lru_tests >= MAX_LRU_TESTS)
+				break;
+		}
+		slot = advance_slot(slot, area);
+	} while (slot != area->next_slot);
+
+	if (unlikely(lru_slot < 0))
+		/* All slots are in the act of being stolen.  Join the melee. */
+		return area->next_slot;
+	else
+		return lru_slot;
+}
+
+/*
+ * Choose an instruction slot and take it.  Choose a free slot if there is one.
+ * Otherwise choose the least-recently-used slot.  Returns with slot
+ * read-locked and containing the desired instruction.  Runs with
+ * ppt->slot_mutex locked.
+ */
+static struct uprobe_ssol_slot
+		*uprobe_take_insn_slot(struct uprobe_probept *ppt)
+{
+	struct uprobe_process *uproc = ppt->uproc;
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
+	struct uprobe_ssol_slot *s;
+	int len, slot;
+	unsigned long flags;
+
+	spin_lock_irqsave(&area->lock, flags);
+
+	if (area->nfree) {
+		for (slot = 0; slot < area->nslots; slot++) {
+			if (area->slots[slot].state == SSOL_FREE) {
+				area->nfree--;
+				goto found_slot;
+			}
+		}
+		/* Shouldn't get here.  Fix nfree and get on with life. */
+		area->nfree = 0;
+	}
+	slot = uprobe_lru_insn_slot(area);
+
+found_slot:
+	area->next_slot = advance_slot(slot, area);
+	s = &area->slots[slot];
+	s->state = SSOL_BEING_STOLEN;
+
+	spin_unlock_irqrestore(&area->lock, flags);
+
+	/* Wait for current users of slot to finish. */
+	down_write(&s->rwsem);
+	ppt->slot = s;
+	s->owner = ppt;
+	s->last_used = jiffies;
+	s->state = SSOL_ASSIGNED;
+	/* Copy the original instruction to the chosen slot. */
+	len = access_process_vm(current, (unsigned long)s->insn,
+					 ppt->insn, MAX_UINSN_BYTES, 1);
+	if (unlikely(len < MAX_UINSN_BYTES)) {
+		up_write(&s->rwsem);
+		printk(KERN_ERR "Failed to copy instruction at %#lx"
+			" to SSOL area (%#lx)\n", ppt->vaddr,
+			(unsigned long) area->slots);
+		return NULL;
+	}
+	/* Let other threads single-step in this slot. */
+	downgrade_write(&s->rwsem);
+	return s;
+}
+
+/* ppt doesn't own a slot.  Get one for ppt, and return it read-locked. */
+static struct uprobe_ssol_slot
+		*uprobe_find_insn_slot(struct uprobe_probept *ppt)
+{
+	struct uprobe_ssol_slot *slot;
+
+	mutex_lock(&ppt->slot_mutex);
+	slot = ppt->slot;
+	if (unlikely(slot && slot->owner == ppt)) {
+		/* Looks like another thread snuck in and got a slot for us. */
+		down_read(&slot->rwsem);
+		if (likely(slot->owner == ppt)) {
+			slot->last_used = jiffies;
+			mutex_unlock(&ppt->slot_mutex);
+			return slot;
+		}
+		/* ... but then somebody stole it. */
+		up_read(&slot->rwsem);
+	}
+	slot = uprobe_take_insn_slot(ppt);
+	mutex_unlock(&ppt->slot_mutex);
+	return slot;
+}
+
+/*
+ * Ensure that ppt owns an instruction slot for single-stepping.
+ * Returns with the slot read-locked and ppt->slot pointing at it.
+ */
+struct uprobe_ssol_slot *uprobe_get_insn_slot(struct uprobe_probept *ppt)
+{
+	struct uprobe_ssol_slot *slot = ppt->slot;
+
+	if (unlikely(!slot))
+		return uprobe_find_insn_slot(ppt);
+
+	down_read(&slot->rwsem);
+	if (unlikely(slot->owner != ppt)) {
+		up_read(&slot->rwsem);
+		return uprobe_find_insn_slot(ppt);
+	}
+	slot->last_used = jiffies;
+	return slot;
+}
+
+/*
  * utrace engine report callbacks
  */
 
@@ -939,6 +1224,8 @@ static inline void uprobe_post_ssin(stru
 	mutex_unlock(&ppt->ssil_mutex);
 }
 
+/* uprobe_pre_ssout() and uprobe_post_ssout() are architecture-specific. */
+
 /*
  * Signal callback:
  *
@@ -982,7 +1269,19 @@ static u32 uprobe_report_signal(struct u
 	if (action != UTRACE_SIGNAL_CORE || info->si_signo != SIGTRAP)
 		goto no_interest;
 
-	uproc = utask->uproc;
+	/*
+	 * Set up the SSOL area if it's not already there.  We do this
+	 * here because we have to do it before handling the first
+	 * probepoint hit, the probed process has to do it, and this may
+	 * be the first time our probed process runs uprobes code.
+	 */
+	uproc = utask->uproc;
+#ifdef CONFIG_UPROBES_SSOL
+	if (uproc->sstep_out_of_line &&
+			unlikely(IS_ERR(uprobe_verify_ssol(uproc))))
+		uproc->sstep_out_of_line = 0;
+#endif
+
 	switch (utask->state) {
 	case UPTASK_RUNNING:
 		down_read(&uproc->rwsem);
@@ -1005,7 +1304,12 @@ static u32 uprobe_report_signal(struct u
 		}
 
 		utask->state = UPTASK_PRE_SSTEP;
-		uprobe_pre_ssin(utask, ppt, regs);
+#ifdef CONFIG_UPROBES_SSOL
+		if (uproc->sstep_out_of_line)
+			uprobe_pre_ssout(utask, ppt, regs);
+		else
+#endif
+			uprobe_pre_ssin(utask, ppt, regs);
 		if (unlikely(utask->doomed))
 			do_exit(SIGSEGV);
 		utask->state = UPTASK_SSTEP;
@@ -1020,7 +1324,12 @@ static u32 uprobe_report_signal(struct u
 		ppt = utask->active_probe;
 		BUG_ON(!ppt);
 		utask->state = UPTASK_POST_SSTEP;
-		uprobe_post_ssin(utask, ppt);
+#ifdef CONFIG_UPROBES_SSOL
+		if (uproc->sstep_out_of_line)
+			uprobe_post_ssout(utask, ppt, regs);
+		else
+#endif
+			uprobe_post_ssin(utask, ppt);
 		if (unlikely(utask->doomed))
 			do_exit(SIGSEGV);
 
@@ -1200,14 +1509,21 @@ static u32 uprobe_report_exit(struct utr
 		printk(KERN_WARNING "Task died at uprobe probepoint:"
 			"  pid/tgid = %d/%d, probepoint = %#lx\n",
 			tsk->pid, tsk->tgid, ppt->vaddr);
-		switch (utask->state) {
-		case UPTASK_PRE_SSTEP:
-		case UPTASK_SSTEP:
-		case UPTASK_POST_SSTEP:
-			mutex_unlock(&ppt->ssil_mutex);
-			break;
-		default:
-			break;
+		/* Mutex cleanup depends on where we died and SSOL vs. SSIL. */
+		if (uproc->sstep_out_of_line) {
+			if (utask->state == UPTASK_SSTEP
+					&& ppt->slot && ppt->slot->owner == ppt)
+				up_read(&ppt->slot->rwsem);
+		} else {
+			switch (utask->state) {
+			case UPTASK_PRE_SSTEP:
+			case UPTASK_SSTEP:
+			case UPTASK_POST_SSTEP:
+				mutex_unlock(&ppt->ssil_mutex);
+				break;
+			default:
+				break;
+			}
 		}
 		up_read(&uproc->rwsem);
 	}
_
This patch implements user-space return probes (uretprobes). Similar to
kretprobes, uretprobes works by bouncing a probed function's return off
a known trampoline, at which time the user-specified handler is run.

Uretprobes works by first inserting a uprobe at the entry to the
specified function.  When the function is called and the probe is hit,
uretprobes makes a copy of the return address, then replaces the
return address with the address of the trampoline.

When the function returns, control passes to the trampoline and
then to uprobes.  Uprobes runs the user-specified handler, then
allows the thread to continue at the real return address.

Uprobes uses one slot of the SSOL area for the trampoline.
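
Sketched in i386 terms, the entry-probe side of that hand-off looks
roughly like this (a simplified fragment of arch_hijack_uret_addr()
in this patch, assuming its regs and trampoline_address arguments;
error handling is trimmed):

	unsigned long orig_ret_addr;

	/* The return address sits at the top of the user stack on entry. */
	if (copy_from_user(&orig_ret_addr,
			(const void __user *)regs->esp, sizeof(orig_ret_addr)))
		return 0;	/* can't read it; decline to hijack */
	/* Redirect the return to the trampoline's breakpoint... */
	if (copy_to_user((void __user *)regs->esp,
			&trampoline_address, sizeof(trampoline_address)))
		return 0;
	/*
	 * ...and report the real return address, which the generic code
	 * saves in the uretprobe_instance.
	 */
	return orig_ret_addr;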

---

 Documentation/uprobes.txt  |  247 +++++++++++++++++++++++++++++++++++++----
 arch/i386/Kconfig          |    4 
 arch/i386/kernel/uprobes.c |   40 ++++++
 include/asm-i386/uprobes.h |   10 +
 include/linux/uprobes.h    |   42 ++++++-
 kernel/uprobes.c           |  265 +++++++++++++++++++++++++++++++++++++++++++--
 6 files changed, 568 insertions(+), 40 deletions(-)

diff -puN Documentation/uprobes.txt~3-uretprobes Documentation/uprobes.txt
--- linux-2.6.21-rc6/Documentation/uprobes.txt~3-uretprobes	2007-05-25 13:42:45.000000000 -0700
+++ linux-2.6.21-rc6-jimk/Documentation/uprobes.txt	2007-05-25 14:43:48.000000000 -0700
@@ -3,7 +3,7 @@ Author	: Jim Keniston <jkenisto@us.ibm.c
 
 CONTENTS
 
-1. Concepts
+1. Concepts: Uprobes, Return Probes
 2. Architectures Supported
 3. Configuring Uprobes
 4. API Reference
@@ -14,15 +14,22 @@ CONTENTS
 9. TODO
 10. Uprobes Team
 11. Uprobes Example
+12. Uretprobes Example
 
-1. Concepts
+1. Concepts: Uprobes, Return Probes
 
 Uprobes enables you to dynamically break into any routine in a
 user application and collect debugging and performance information
 non-disruptively. You can trap at any code address, specifying a
 kernel handler routine to be invoked when the breakpoint is hit.
 
-The registration function, register_uprobe(), specifies which
+There are currently two types of user-space probes: uprobes and
+uretprobes (also called return probes).  A uprobe can be inserted on
+any instruction in the application's virtual address space.  A return
+probe fires when a specified user function returns.  These two probe
+types are discussed in more detail later.
+
+A registration function such as register_uprobe() specifies which
 process is to be probed, where the probe is to be inserted, and what
 handler is to be called when the probe is hit.
 
@@ -62,6 +69,10 @@ remove the breakpoint instruction.  This
 multithreaded application.  For example, it would open a time window
 when another thread could sail right past the probepoint.)
 
+Instruction copies to be single-stepped are stored in a per-process
+"single-step out of line (SSOL) area," which is a little VM area
+created by Uprobes in each probed process's address space.
+
 1.2 The Role of Utrace
 
 When a probe is registered on a previously unprobed process,
@@ -73,7 +84,23 @@ Uprobes of breakpoint and single-step tr
 events in the lifetime of the probed process, such as fork, clone,
 exec, and exit.
 
-1.3 Multithreaded Applications
+1.3 How Does a Return Probe Work?
+
+When you call register_uretprobe(), Uprobes establishes a uprobe
+at the entry to the function.  When the probed function is called
+and this probe is hit, Uprobes saves a copy of the return address,
+and replaces the return address with the address of a "trampoline"
+-- a piece of code that contains a breakpoint instruction.
+
+When the probed function executes its return instruction, control
+passes to the trampoline and that breakpoint is hit.  Uprobes's
+trampoline handler calls the user-specified handler associated with the
+uretprobe, then sets the saved instruction pointer to the saved return
+address, and that's where execution resumes upon return from the trap.
+
+The trampoline is stored in the SSOL area.
+
+1.4 Multithreaded Applications
 
 Uprobes supports the probing of multithreaded applications.  Uprobes
 imposes no limit on the number of threads in a probed application.
@@ -95,7 +122,8 @@ after the breakpoint has been inserted/r
 
 2. Architectures Supported
 
-Uprobes is implemented on the following architectures:
+Uprobes and uretprobes are implemented on the following
+architectures:
 
 - i386
 - x86_64 (AMD-64, EM64T)	// in progress
@@ -119,10 +147,10 @@ unloading" (CONFIG_MODULE_UNLOAD) are se
 
 4. API Reference
 
-The Uprobes API includes two functions, register_uprobe() and
-unregister_uprobe().  Here are terse, mini-man-page specifications for
-these functions and the associated probe handlers that you'll write.
-See the latter half of this document for an example.
+The Uprobes API includes a "register" function and an "unregister"
+function for each type of probe.  Here are terse, mini-man-page
+specifications for these functions and the associated probe handlers
+that you'll write.  See the latter half of this document for examples.
 
 4.1 register_uprobe
 
@@ -143,12 +171,41 @@ Called with u pointing to the uprobe ass
 and regs pointing to the struct containing the registers saved when
 the breakpoint was hit.
 
-4.2 unregister_uprobe
+4.2 register_uretprobe
+
+#include <linux/uprobes.h>
+int register_uretprobe(struct uretprobe *rp);
+
+Establishes a return probe in the process whose pid is rp->u.pid for
+the function whose address is rp->u.vaddr.  When that function returns,
+Uprobes calls rp->handler.
+
+register_uretprobe() returns 0 on success, or a negative errno
+otherwise.
+
+User's return-probe handler (rp->handler):
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+void uretprobe_handler(struct uretprobe_instance *ri, struct pt_regs *regs);
+
+regs is as described for the user's uprobe handler.  ri points to
+the uretprobe_instance object associated with the particular function
+instance that is currently returning.  The following fields in that
+object may be of interest:
+- ret_addr: the return address
+- rp: points to the corresponding uretprobe object
+
+In ptrace.h, the regs_return_value(regs) macro provides a simple
+abstraction to extract the return value from the appropriate register
+as defined by the architecture's ABI.
+
+4.3 unregister_*probe
 
 #include <linux/uprobes.h>
 void unregister_uprobe(struct uprobe *u);
+void unregister_uretprobe(struct uretprobe *rp);
 
-Removes the specified probe.  unregister_uprobe() can be called
+Removes the specified probe.  The unregister function can be called
 at any time after the probe has been registered.
 
 5. Uprobes Features and Limitations
@@ -157,12 +214,13 @@ The user is expected to assign values on
 of struct uprobe: pid, vaddr, and handler.  Other members are reserved
 for Uprobes' use.  Uprobes may produce unexpected results if you:
 - assign non-zero values to reserved members of struct uprobe;
-- change the contents of a uprobe object while it is registered; or
-- attempt to register a uprobe that is already registered.
-
-Uprobes allows any number of probes at a particular address.  For a
-particular probepoint, handlers are run in the order in which they
-were registered.
+- change the contents of a uprobe or uretprobe object while it is
+registered; or
+- attempt to register a uprobe or uretprobe that is already registered.
+
+Uprobes allows any number of probes (uprobes and/or uretprobes)
+at a particular address.  For a particular probepoint, handlers are
+run in the order in which they were registered.
 
 Any number of kernel modules may probe a particular process
 simultaneously, and a particular module may probe any number of
@@ -172,8 +230,8 @@ Probes are shared by all threads in a pr
 threads).
 
 If a probed process exits or execs, Uprobes automatically unregisters
-all uprobes associated with that process.  Subsequent attempts to
-unregister these probes will be treated as no-ops.
+all uprobes and uretprobes associated with that process.  Subsequent
+attempts to unregister these probes will be treated as no-ops.
 
 On the other hand, if a probed memory area is removed from the
 process's virtual memory map (e.g., via dlclose(3) or munmap(2)),
@@ -206,6 +264,16 @@ to install a bug fix or to inject faults
 course, has no way to distinguish the deliberately injected faults
 from the accidental ones.  Don't drink and probe.
 
+Since a return probe is implemented by replacing the return
+address with the trampoline's address, stack backtraces and calls
+to __builtin_return_address() will typically yield the trampoline's
+address instead of the real return address for uretprobed functions.
+
+If the number of times a function is called does not match the
+number of times it returns (e.g., if a function exits via longjmp()),
+registering a return probe on that function may produce undesirable
+results.
+
 When you register the first probe at a probepoint or unregister the
 last probe at a probepoint, Uprobes asks Utrace to "quiesce"
 the probed process so that Uprobes can insert or remove the breakpoint
@@ -227,14 +295,14 @@ Uprobes is intended to interoperate usef
 Documentation/kprobes.txt).  For example, an instrumentation module
 can make calls to both the Kprobes API and the Uprobes API.
 
-A uprobe handler can register or unregister kprobes, jprobes,
-and kretprobes.  On the other hand, a kprobe, jprobe, or kretprobe
-handler must not sleep, and therefore cannot register or unregister
-any of these types of probes.  (Ideas for removing this restriction
-are welcome.)
+A uprobe or uretprobe handler can register or unregister kprobes,
+jprobes, and kretprobes.  On the other hand, a kprobe, jprobe, or
+kretprobe handler must not sleep, and therefore cannot register or
+unregister any of these types of probes.  (Ideas for removing this
+restriction are welcome.)
 
-Note that the overhead of a uprobe hit is several times that of a
-kprobe hit.
+Note that the overhead of a u[ret]probe hit is several times that of
+a kprobe hit.
 
 7. Interoperation with Utrace
 
@@ -281,7 +349,7 @@ be probed.
 - In your report_quiesce callback, register the desired probes.
 (Note that you cannot use the same probe object for both parent
 and child.  If you want to duplicate the probepoints, you must
-create a new set of uprobe objects.)
+create a new set of u[ret]probe objects.)
 
 8. Probe Overhead
 
@@ -290,11 +358,15 @@ On a typical CPU in use in 2007, a uprob
 microseconds to process.  Specifically, a benchmark that hits the
 same probepoint repeatedly, firing a simple handler each time, reports
 300,000 to 350,000 hits per second, depending on the architecture.
+A return-probe hit typically takes 50% longer than a uprobe hit.
+When you have a return probe set on a function, adding a uprobe at
+the entry to that function adds essentially no overhead.
 
 Here are sample overhead figures (in usec) for different architectures.
+u = uprobe; r = return probe; ur = uprobe + return probe
 
 i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
-2.9 usec/hit (single-stepping out of line)
+u = 2.9 usec; r = 4.7 usec; ur = 4.7 usec
 
 x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips
 // TODO
@@ -422,3 +494,122 @@ is called.  To turn off probing, remove 
 
 In /var/log/messages and on the console, you will see a message of the
 form "Probepoint was hit 5 times".
+
+12. Uretprobes Example
+
+Here's a sample kernel module showing the use of a return probe to
+report a function's return values.
+----- cut here -----
+/* uretprobe_example.c */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/uprobes.h>
+#include <linux/ptrace.h>
+
+/*
+ * Usage:
+ * insmod uretprobe_example.ko pid=<pid> func=<addr> [verbose=0]
+ * where <pid> identifies the probed process, and <addr> is the virtual
+ * address of the probed function.
+ */
+
+static int pid = 0;
+module_param(pid, int, 0);
+MODULE_PARM_DESC(pid, "pid");
+
+static int verbose = 1;
+module_param(verbose, int, 0);
+MODULE_PARM_DESC(verbose, "verbose");
+
+static long func = 0;
+module_param(func, long, 0);
+MODULE_PARM_DESC(func, "func");
+
+static int ncall, nret;
+static struct uprobe usp;
+static struct uretprobe rp;
+
+static void uprobe_handler(struct uprobe *u, struct pt_regs *regs)
+{
+	ncall++;
+	if (verbose)
+		printk(KERN_INFO "Function at %#lx called\n", u->vaddr);
+}
+
+static void uretprobe_handler(struct uretprobe_instance *ri,
+	struct pt_regs *regs)
+{
+	nret++;
+	if (verbose)
+		printk(KERN_INFO "Function at %#lx returns %#lx\n",
+			ri->rp->u.vaddr, regs_return_value(regs));
+}
+
+int __init init_module(void)
+{
+	int ret;
+
+	/* Register the entry probe. */
+	usp.pid = pid;
+	usp.vaddr = func;
+	usp.handler = uprobe_handler;
+	printk(KERN_INFO "Registering uprobe on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	ret = register_uprobe(&usp);
+	if (ret != 0) {
+		printk(KERN_ERR "register_uprobe() failed, returned %d\n", ret);
+		return -1;
+	}
+
+	/* Register the return probe. */
+	rp.u.pid = pid;
+	rp.u.vaddr = func;
+	rp.handler = uretprobe_handler;
+	printk(KERN_INFO "Registering return probe on pid %d, vaddr %#lx\n",
+		rp.u.pid, rp.u.vaddr);
+	ret = register_uretprobe(&rp);
+	if (ret != 0) {
+		printk(KERN_ERR "register_uretprobe() failed, returned %d\n",
+			ret);
+		unregister_uprobe(&usp);
+		return -1;
+	}
+	return 0;
+}
+
+void __exit cleanup_module(void)
+{
+	printk(KERN_INFO "Unregistering probes on pid %d, vaddr %#lx\n",
+		usp.pid, usp.vaddr);
+	printk(KERN_INFO "%d calls, %d returns\n", ncall, nret);
+	unregister_uprobe(&usp);
+	unregister_uretprobe(&rp);
+}
+MODULE_LICENSE("GPL");
+----- cut here -----
+
+Build the kernel module as shown in the above uprobe example.
+
+$ nm -p myprog | awk '$3=="myfunc"'
+080484a8 T myfunc
+$ ./myprog &
+$ ps
+  PID TTY          TIME CMD
+ 4367 pts/3    00:00:00 bash
+ 9156 pts/3    00:00:00 myprog
+ 9157 pts/3    00:00:00 ps
+$ su -
+...
+# insmod uretprobe_example.ko pid=9156 func=0x080484a8
+
+In /var/log/messages and on the console, you will see messages such
+as the following:
+kernel: Function at 0x80484a8 called
+kernel: Function at 0x80484a8 returns 0x3
+To turn off probing, remove the module:
+
+# rmmod uretprobe_example
+
+In /var/log/messages and on the console, you will see a message of the
+form "73 calls, 73 returns".
diff -puN arch/i386/Kconfig~3-uretprobes arch/i386/Kconfig
--- linux-2.6.21-rc6/arch/i386/Kconfig~3-uretprobes	2007-05-25 13:42:45.000000000 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/Kconfig	2007-05-25 14:43:48.000000000 -0700
@@ -91,6 +91,10 @@ config UPROBES_SSOL
 	bool
 	default y
 
+config URETPROBES
+	bool
+	default y
+
 source "init/Kconfig"
 
 menu "Processor type and features"
diff -puN arch/i386/kernel/uprobes.c~3-uretprobes arch/i386/kernel/uprobes.c
--- linux-2.6.21-rc6/arch/i386/kernel/uprobes.c~3-uretprobes	2007-05-25 13:42:45.000000000 -0700
+++ linux-2.6.21-rc6-jimk/arch/i386/kernel/uprobes.c	2007-05-25 14:43:48.000000000 -0700
@@ -133,3 +133,43 @@ void uprobe_post_ssout(struct uprobe_tas
 	else
 		regs->eip = orig_eip + (regs->eip - copy_eip);
 }
+
+/*
+ * Replace the return address with the trampoline address.  Returns
+ * the original return address.
+ */
+unsigned long arch_hijack_uret_addr(unsigned long trampoline_address,
+		struct pt_regs *regs, struct uprobe_task *utask)
+{
+	int nleft;
+	unsigned long orig_ret_addr;
+#define RASIZE (sizeof(unsigned long))
+
+	nleft = copy_from_user(&orig_ret_addr,
+		       (const void __user *)regs->esp, RASIZE);
+	if (unlikely(nleft != 0))
+		return 0;
+
+	if (orig_ret_addr == trampoline_address)
+		/*
+		 * There's another uretprobe on this function, and it was
+		 * processed first, so the return address has already
+		 * been hijacked.
+		 */
+		return orig_ret_addr;
+
+	nleft = copy_to_user((void __user *)regs->esp,
+		       &trampoline_address, RASIZE);
+	if (unlikely(nleft != 0)) {
+		if (nleft != RASIZE) {
+			printk(KERN_ERR "uretprobe_entry_handler: "
+					"return address partially clobbered -- "
+					"pid=%d, %%esp=%#lx, %%eip=%#lx\n",
+					current->pid, regs->esp, regs->eip);
+			utask->doomed = 1;
+		} /* else nothing written, so no harm */
+		return 0;
+	}
+	return orig_ret_addr;
+}
+
diff -puN include/asm-i386/uprobes.h~3-uretprobes include/asm-i386/uprobes.h
--- linux-2.6.21-rc6/include/asm-i386/uprobes.h~3-uretprobes	2007-05-25 13:42:45.000000000 -0700
+++ linux-2.6.21-rc6-jimk/include/asm-i386/uprobes.h	2007-05-25 14:43:48.000000000 -0700
@@ -33,6 +33,7 @@ typedef u8 uprobe_opcode_t;
 #define ARCH_BP_INST_PTR(inst_ptr)	(inst_ptr - BP_INSN_SIZE)
 
 struct uprobe_probept;
+struct uprobe_task;
 
 /* Caller prohibits probes on int3.  We currently allow everything else. */
 static inline int arch_validate_probed_insn(struct uprobe_probept *ppt)
@@ -51,4 +52,13 @@ static inline void arch_reset_ip_for_sst
 	regs->eip -= BP_INSN_SIZE;
 }
 
+static inline void arch_restore_uret_addr(unsigned long ret_addr,
+		struct pt_regs *regs)
+{
+	regs->eip = ret_addr;
+}
+
+extern unsigned long arch_hijack_uret_addr(unsigned long trampoline_addr,
+		struct pt_regs *regs, struct uprobe_task *utask);
+
 #endif				/* _ASM_UPROBES_H */
diff -puN include/linux/uprobes.h~3-uretprobes include/linux/uprobes.h
--- linux-2.6.21-rc6/include/linux/uprobes.h~3-uretprobes	2007-05-25 13:42:45.000000000 -0700
+++ linux-2.6.21-rc6-jimk/include/linux/uprobes.h	2007-05-25 14:48:52.000000000 -0700
@@ -21,6 +21,7 @@
  * Copyright (C) IBM Corporation, 2006
  */
 #include <linux/types.h>
+#include <linux/list.h>
 
 struct pt_regs;
 
@@ -43,6 +44,19 @@ struct uprobe {
 	void *kdata;
 };
 
+struct uretprobe_instance;
+
+struct uretprobe {
+	struct uprobe u;
+	void (*handler)(struct uretprobe_instance*, struct pt_regs*);
+};
+
+struct uretprobe_instance {
+	struct uretprobe *rp;
+	unsigned long ret_addr;
+	struct hlist_node hlist;
+};
+
 #ifdef CONFIG_UPROBES
 extern int register_uprobe(struct uprobe *u);
 extern void unregister_uprobe(struct uprobe *u);
@@ -56,9 +70,22 @@ static inline void unregister_uprobe(str
 }
 #endif	/* CONFIG_UPROBES */
 
+#if defined(CONFIG_UPROBES) && defined(CONFIG_URETPROBES)
+extern int register_uretprobe(struct uretprobe *rp);
+extern void unregister_uretprobe(struct uretprobe *rp);
+#else
+static inline int register_uretprobe(struct uretprobe *u)
+{
+	return -ENOSYS;
+}
+static inline void unregister_uretprobe(struct uretprobe *u)
+{
+}
+#endif
+
+
 #ifdef UPROBES_IMPLEMENTATION
 
-#include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/rwsem.h>
 #include <linux/wait.h>
@@ -80,6 +107,7 @@ enum uprobe_task_state {
 	UPTASK_SLEEPING,	// used when task may not be able to quiesce
 	UPTASK_RUNNING,
 	UPTASK_BP_HIT,
+	UPTASK_TRAMPOLINE_HIT,
 	UPTASK_PRE_SSTEP,
 	UPTASK_SSTEP,
 	UPTASK_POST_SSTEP
@@ -93,7 +121,8 @@ enum uprobe_task_state {
 enum uprobe_slot_state {
 	SSOL_FREE,
 	SSOL_ASSIGNED,
-	SSOL_BEING_STOLEN
+	SSOL_BEING_STOLEN,
+	SSOL_RESERVED		// e.g., for uretprobe trampoline
 };
 
 /*
@@ -185,12 +214,16 @@ struct uprobe_process {
 	 * We won't free the uprobe_process while...
 	 * - any register/unregister operations on it are in progress; or
 	 * - uprobe_table[] is not empty; or
-	 * - any tasks are SLEEPING in the waitq.
+	 * - any tasks are SLEEPING in the waitq; or
+	 * - any uretprobe_instances are outstanding.
 	 * refcount reflects this.  We do NOT ref-count tasks (threads),
 	 * since once the last thread has exited, the rest is academic.
 	 */
 	struct kref refcount;
 
+	/* Return-probed functions return via this trampoline. */
+	__user uprobe_opcode_t *uretprobe_trampoline_addr;
+
 	/*
 	 * Manages slots for instruction-copies to be single-stepped
 	 * out of line.
@@ -325,6 +358,9 @@ struct uprobe_task {
 	 * text or stack corrupted.  Kill task ASAP.
 	 */
 	int doomed;
+
+	/* LIFO -- active instances */
+	struct hlist_head uretprobe_instances;
 };
 
 #ifdef CONFIG_UPROBES_SSOL
diff -puN kernel/uprobes.c~3-uretprobes kernel/uprobes.c
--- linux-2.6.21-rc6/kernel/uprobes.c~3-uretprobes	2007-05-25 13:42:45.000000000 -0700
+++ linux-2.6.21-rc6-jimk/kernel/uprobes.c	2007-05-25 14:43:48.000000000 -0700
@@ -42,6 +42,20 @@ extern int access_process_vm(struct task
 	void *buf, int len, int write);
 static int utask_fake_quiesce(struct uprobe_task *utask);
 
+static void uretprobe_handle_entry(struct uprobe *u, struct pt_regs *regs,
+	struct uprobe_task *utask);
+static void uretprobe_handle_return(struct pt_regs *regs,
+	struct uprobe_task *utask);
+static void uretprobe_set_trampoline(struct uprobe_process *uproc);
+static void zap_uretprobe_instances(struct uprobe *u,
+	struct uprobe_process *uproc);
+
+typedef void (*uprobe_handler_t)(struct uprobe*, struct pt_regs*);
+#define URETPROBE_HANDLE_ENTRY ((uprobe_handler_t)-1L)
+#define is_uretprobe(u) ((u)->handler == URETPROBE_HANDLE_ENTRY)
+/* Point utask->active_probe at this while running uretprobe handler. */
+static struct uprobe_probept uretprobe_trampoline_dummy_probe;
+
 /* Table of currently probed processes, hashed by tgid. */
 static struct hlist_head uproc_table[UPROBE_TABLE_SIZE];
 
@@ -338,6 +352,22 @@ static int quiesce_all_threads(struct up
 	return survivors;
 }
 
+/* Called with utask->uproc write-locked. */
+static void uprobe_free_task(struct uprobe_task *utask)
+{
+	struct uretprobe_instance *ri;
+	struct hlist_node *r1, *r2;
+
+	list_del(&utask->list);
+	hlist_for_each_entry_safe(ri, r1, r2, &utask->uretprobe_instances,
+			hlist) {
+		hlist_del(&ri->hlist);
+		kfree(ri);
+		uprobe_put_process(utask->uproc);
+	}
+	kfree(utask);
+}
+
 /* Runs with uproc_mutex help and uproc->rwsem write-locked. */
 static void uprobe_free_process(struct uprobe_process *uproc)
 {
@@ -354,7 +384,7 @@ static void uprobe_free_process(struct u
 		 */
 		if (utask->engine)
 			utrace_detach(utask->tsk, utask->engine);
-		kfree(utask);
+		uprobe_free_task(utask);
 	}
 	if (area->slots)
 		kfree(area->slots);
@@ -414,6 +444,7 @@ static struct uprobe_task *uprobe_add_ta
 	utask->uproc = uproc;
 	utask->active_probe = NULL;
 	utask->doomed = 0;
+	INIT_HLIST_HEAD(&utask->uretprobe_instances);
 	INIT_LIST_HEAD(&utask->list);
 	list_add_tail(&utask->list, &uproc->thread_list);
 
@@ -499,6 +530,7 @@ static struct uprobe_process *uprobe_mk_
 	uproc->n_quiescent_threads = 0;
 	INIT_HLIST_NODE(&uproc->hlist);
 	uproc->tgid = p->tgid;
+	uproc->uretprobe_trampoline_addr = NULL;
 
 	uproc->ssol_area.insn_area = NULL;
 	mutex_init(&uproc->ssol_area.setup_mutex);
@@ -696,6 +728,12 @@ int register_uprobe(struct uprobe *u)
 		uproc_is_new = 1;
 	}
 
+	if (is_uretprobe(u) && IS_ERR(uproc->uretprobe_trampoline_addr)) {
+		/* Previously failed to set up trampoline. */
+		ret = -ENOMEM;
+		goto fail_uproc;
+	}
+
 	if ((ret = uprobe_validate_vaddr(p, u->vaddr)) < 0)
 		goto fail_uproc;
 
@@ -842,6 +880,10 @@ void unregister_uprobe(struct uprobe *u)
 
 	list_del(&uk->list);
 	uprobe_free_kimg(uk);
+
+	if (is_uretprobe(u))
+		zap_uretprobe_instances(u, uproc);
+
 	if (!list_empty(&ppt->uprobe_list))
 		goto done;
 
@@ -925,12 +967,16 @@ static void uprobe_setup_ssol_vma(void)
  * area->insn_area pointing to the initialized area, or set to a
  * negative errno.
  */
-static void uprobe_init_ssol(struct uprobe_ssol_area *area)
+static void uprobe_init_ssol(struct uprobe_process *uproc)
 {
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
 	struct uprobe_ssol_slot *slot;
 	int i;
 	char *slot_addr;	// Simplify pointer arithmetic
 
+	/* Trampoline setup will either fail or succeed here. */
+	uproc->uretprobe_trampoline_addr = ERR_PTR(-ENOMEM);
+
 	 /*
 	  * If we previously probed this process and then removed all
 	  * probes, the vma is still available to us.
@@ -962,6 +1008,7 @@ static void uprobe_init_ssol(struct upro
 		slot->insn = (__user uprobe_opcode_t *) slot_addr;
 		slot_addr += MAX_UINSN_BYTES;
 	}
+	uretprobe_set_trampoline(uproc);
 }
 
 /*
@@ -979,7 +1026,7 @@ static __user uprobe_opcode_t
 		mutex_lock(&ssol_setup_mutex);
 		if (likely(!area->insn_area))
 			/* Nobody snuck in and set things up ahead of us. */
-			uprobe_init_ssol(area);
+			uprobe_init_ssol(uproc);
 		mutex_unlock(&ssol_setup_mutex);
 	}
 	return area->insn_area;
@@ -987,7 +1034,11 @@ static __user uprobe_opcode_t
 
 static inline int advance_slot(int slot, struct uprobe_ssol_area *area)
 {
-	return (slot + 1) % area->nslots;
+	/* Slot 0 is reserved for uretprobe trampoline. */
+	slot++;
+	if (unlikely(slot >= area->nslots))
+		slot = 1;
+	return slot;
 }
 
 /*
@@ -1262,6 +1313,7 @@ static u32 uprobe_report_signal(struct u
 	struct uprobe_kimg *uk;
 	u32 ret;
 	unsigned long probept;
+	int hit_uretprobe_trampoline = 0;
 
 	utask = rcu_dereference((struct uprobe_task *)engine->data);
 	BUG_ON(!utask);
@@ -1274,12 +1326,17 @@ static u32 uprobe_report_signal(struct u
 	 * here because we have to do it before handling the first
 	 * probepoint hit, the probed process has to do it, and this may
 	 * be the first time our probed process runs uprobes code.
+	 *
+	 * We need the SSOL area for the uretprobe trampoline even if
+	 * this architecture doesn't single-step out of line.
 	 */
- 	uproc = utask->uproc;
+	uproc = utask->uproc;
 #ifdef CONFIG_UPROBES_SSOL
 	if (uproc->sstep_out_of_line &&
 			unlikely(IS_ERR(uprobe_verify_ssol(uproc))))
 		uproc->sstep_out_of_line = 0;
+#elif defined(CONFIG_URETPROBES)
+	(void) uprobe_verify_ssol(uproc);
 #endif
 
 	switch (utask->state) {
@@ -1287,6 +1344,14 @@ static u32 uprobe_report_signal(struct u
 		down_read(&uproc->rwsem);
 		clear_utrace_quiesce(utask);
 		probept = arch_get_probept(regs);
+
+		hit_uretprobe_trampoline = (probept == (unsigned long)
+			uproc->uretprobe_trampoline_addr);
+		if (hit_uretprobe_trampoline) {
+			uretprobe_handle_return(regs, utask);
+			goto bkpt_done;
+		}
+
 		ppt = uprobe_find_probept(uproc, probept);
 		if (!ppt) {
 			up_read(&uproc->rwsem);
@@ -1298,7 +1363,9 @@ static u32 uprobe_report_signal(struct u
 		if (likely(ppt->state == UPROBE_BP_SET)) {
 			list_for_each_entry(uk, &ppt->uprobe_list, list) {
 				struct uprobe *u = uk->uprobe;
-				if (u->handler)
+				if (is_uretprobe(u))
+					uretprobe_handle_entry(u, regs, utask);
+				else if (u->handler)
 					u->handler(u, regs);
 			}
 		}
@@ -1330,6 +1397,8 @@ static u32 uprobe_report_signal(struct u
 		else
 #endif
 			uprobe_post_ssin(utask, ppt);
+bkpt_done:
+		/* Note: Can come here after running uretprobe handlers */
 		if (unlikely(utask->doomed))
 			do_exit(SIGSEGV);
 
@@ -1344,6 +1413,13 @@ static u32 uprobe_report_signal(struct u
 		} else
 			up_read(&uproc->rwsem);
 
+		if (hit_uretprobe_trampoline)
+			/*
+			 * It's possible that the uretprobe_instance
+			 * we just recycled was the last reason for
+			 * keeping uproc around.
+			 */
+			uprobe_put_process(uproc);
 		break;
 	default:
 		goto no_interest;
@@ -1506,9 +1582,13 @@ static u32 uprobe_report_exit(struct utr
 
 	ppt = utask->active_probe;
 	if (ppt) {
-		printk(KERN_WARNING "Task died at uprobe probepoint:"
-			"  pid/tgid = %d/%d, probepoint = %#lx\n",
-			tsk->pid, tsk->tgid, ppt->vaddr);
+		if (utask->state == UPTASK_TRAMPOLINE_HIT)
+			printk(KERN_WARNING "Task died during uretprobe return:"
+				"  pid/tgid = %d/%d\n", tsk->pid, tsk->tgid);
+		else
+			printk(KERN_WARNING "Task died at uprobe probepoint:"
+				"  pid/tgid = %d/%d, probepoint = %#lx\n",
+				tsk->pid, tsk->tgid, ppt->vaddr);
 		/* Mutex cleanup depends on where we died and SSOL vs. SSIL. */
 		if (uproc->sstep_out_of_line) {
 			if (utask->state == UPTASK_SSTEP
@@ -1526,6 +1606,8 @@ static u32 uprobe_report_exit(struct utr
 			}
 		}
 		up_read(&uproc->rwsem);
+		if (utask->state == UPTASK_TRAMPOLINE_HIT)
+			uprobe_put_process(uproc);
 	}
 
 	down_write(&uproc->rwsem);
@@ -1704,3 +1786,168 @@ __initcall(init_uprobes);
 
 EXPORT_SYMBOL_GPL(register_uprobe);
 EXPORT_SYMBOL_GPL(unregister_uprobe);
+
+#ifdef CONFIG_URETPROBES
+
+/* Called when the entry-point probe u is hit. */
+static void uretprobe_handle_entry(struct uprobe *u, struct pt_regs *regs,
+	struct uprobe_task *utask)
+{
+	struct uretprobe_instance *ri;
+	unsigned long trampoline_addr;
+
+	if (IS_ERR(utask->uproc->uretprobe_trampoline_addr))
+		return;
+	trampoline_addr = (unsigned long)
+		utask->uproc->uretprobe_trampoline_addr;
+	ri = (struct uretprobe_instance *)
+		kmalloc(sizeof(struct uretprobe_instance), GFP_USER);
+	if (!ri)
+		return;
+	ri->ret_addr = arch_hijack_uret_addr(trampoline_addr, regs, utask);
+	if (likely(ri->ret_addr)) {
+		ri->rp = container_of(u, struct uretprobe, u);
+		INIT_HLIST_NODE(&ri->hlist);
+		hlist_add_head(&ri->hlist, &utask->uretprobe_instances);
+		/* We ref-count outstanding uretprobe_instances. */
+		uprobe_get_process(utask->uproc);
+	} else
+		kfree(ri);
+}
+
+/*
+ * For each uretprobe_instance pushed onto the LIFO for the function
+ * instance that's now returning, call the handler, free the ri, and
+ * decrement the uproc's ref count.  Caller ref-counts uproc, so we
+ * should never hit zero in this function.
+ *
+ * Returns the original return address.
+ *
+ * TODO: Handle longjmp out of uretprobed function.
+ */
+static unsigned long uretprobe_run_handlers(struct uprobe_task *utask,
+		struct pt_regs *regs, unsigned long trampoline_addr)
+{
+	unsigned long ret_addr;
+	struct hlist_head *head = &utask->uretprobe_instances;
+	struct uretprobe_instance *ri;
+	struct hlist_node *r1, *r2;
+
+	hlist_for_each_entry_safe(ri, r1, r2, head, hlist) {
+		if (ri->rp && ri->rp->handler)
+			ri->rp->handler(ri, regs);
+		ret_addr = ri->ret_addr;
+		hlist_del(&ri->hlist);
+		kfree(ri);
+		uprobe_put_process(utask->uproc);
+		if (ret_addr != trampoline_addr)
+			/*
+			 * This is the first ri (chronologically) pushed for
+			 * this particular instance of the probed function.
+			 */
+			return ret_addr;
+	}
+	printk(KERN_ERR "No uretprobe instance with original return address!"
+		" pid/tgid=%d/%d", utask->tsk->pid, utask->tsk->tgid);
+	utask->doomed = 1;
+	return 0;
+}
+
+/* Called when the uretprobe trampoline is hit. */
+static void uretprobe_handle_return(struct pt_regs *regs,
+	struct uprobe_task *utask)
+{
+	unsigned long orig_ret_addr;
+	/* Delay recycling of uproc until end of uprobe_report_signal() */
+	uprobe_get_process(utask->uproc);
+	utask->state = UPTASK_TRAMPOLINE_HIT;
+	utask->active_probe = &uretprobe_trampoline_dummy_probe;
+	orig_ret_addr = uretprobe_run_handlers(utask, regs,
+		(unsigned long) utask->uproc->uretprobe_trampoline_addr);
+	arch_restore_uret_addr(orig_ret_addr, regs);
+}
+
+int register_uretprobe(struct uretprobe *rp)
+{
+	if (!rp || !rp->handler)
+		return -EINVAL;
+	rp->u.handler = URETPROBE_HANDLE_ENTRY;
+	return register_uprobe(&rp->u);
+}
+
+/*
+ * The uretprobe containing u is being unregistered.  Its uretprobe_instances
+ * have to hang around 'til their associated instances return (but we can't
+ * run rp's handler).  Zap ri->rp for each one to indicate unregistration.
+ *
+ * Runs with uproc write-locked.
+ */
+static void zap_uretprobe_instances(struct uprobe *u,
+		struct uprobe_process *uproc)
+{
+	struct uprobe_task *utask;
+	struct uretprobe *rp = container_of(u, struct uretprobe, u);
+
+	if (!uproc)
+		return;
+
+	list_for_each_entry(utask, &uproc->thread_list, list) {
+		struct hlist_node *r;
+		struct uretprobe_instance *ri;
+
+		hlist_for_each_entry(ri, r, &utask->uretprobe_instances, hlist)
+			if (ri->rp == rp)
+				ri->rp = NULL;
+	}
+}
+
+void unregister_uretprobe(struct uretprobe *rp)
+{
+	if (!rp)
+		return;
+	unregister_uprobe(&rp->u);
+}
+
+/*
+ * uproc->ssol_area has been successfully set up.  Establish the
+ * uretprobe trampoline in slot 0.
+ */
+static void uretprobe_set_trampoline(struct uprobe_process *uproc)
+{
+	uprobe_opcode_t bp_insn = BREAKPOINT_INSTRUCTION;
+	struct uprobe_ssol_area *area = &uproc->ssol_area;
+	struct uprobe_ssol_slot *slot = &area->slots[0];
+
+	if (access_process_vm(current, (unsigned long) slot->insn,
+			&bp_insn, BP_INSN_SIZE, 1) == BP_INSN_SIZE) {
+		uproc->uretprobe_trampoline_addr = slot->insn;
+		slot->state = SSOL_RESERVED;
+		area->next_slot = 1;
+	} else {
+		printk(KERN_ERR "uretprobes disabled for pid %d:"
+			" cannot set uretprobe trampoline at %p\n",
+			uproc->tgid, slot->insn);
+	}
+}
+
+EXPORT_SYMBOL_GPL(register_uretprobe);
+EXPORT_SYMBOL_GPL(unregister_uretprobe);
+
+#else	/* ! CONFIG_URETPROBES */
+
+static void uretprobe_handle_entry(struct uprobe *u, struct pt_regs *regs,
+	struct uprobe_task *utask)
+{
+}
+static void uretprobe_handle_return(struct pt_regs *regs,
+	struct uprobe_task *utask)
+{
+}
+static void uretprobe_set_trampoline(struct uprobe_process *uproc)
+{
+}
+static void zap_uretprobe_instances(struct uprobe *u,
+	struct uprobe_process *uproc)
+{
+}
+#endif /* CONFIG_URETPROBES */
_
