This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Initial stap support for inode-based uprobes


The attached patch implements initial support for SystemTap to use
Srikar's inode-based uprobes.  It is also published in the branch
jistone/inode-uprobes, in gitweb here:
<http://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=shortlog;h=refs/heads/jistone/inode-uprobes>

The uprobes branch I worked from is here:
<http://git.kernel.org/?p=linux/kernel/git/srikar/linux-uprobes.git;a=shortlog;h=refs/heads/tip_inode_uprobes_010411>

The good news is that the basics appear to be working well.  I've tested
probing stap itself and libdw, and got the expected probe hits.  I'd
appreciate any review of my implementation so far.  Beyond these working
basics, there are a lot of details to hammer out, so here's the list of
open issues I know about.

* EXPORT_SYMBOL_GPL, or uprobes' lack thereof.  Without kernel exports,
the whole API will be inaccessible to us.

* Return probes.  This hasn't yet been added to the new uprobes.

* Process filtering.  AFAICS, the current uprobes implementation sets
the breakpoint in every process that maps the particular inode.  There is
a filtering mechanism, but it seems only to decide, on each hit, whether
to call the handler; you'll still take the breakpoint and single-step
overhead.  Also, on stap's side, we previously had the ability to limit
process probes to the -x/-c target and its children, which I haven't
tried here yet.

* Runtime build-id verification.  Right now I'm just mapping the path to
an inode*, without checking that the build-id is what we expected.  I'm
not sure we even could at the systemtap-init point.  Even if we did, the
file may still get modified without changing the inode, and I don't
think this version of uprobes gives us any way to notice that or to
decide whether we like the new contents.

* SDT semaphore.  In the current form, we have no hook on individual
processes, so we can't modify the semaphores in applications that
actively gate their markers on them.  We'll probably need something like
PR10994 to achieve this, which isn't really about uprobes per se, but
rather about living without utrace.

* Argument access.  If you try $args, it will fail with a missing symbol
'task_user_regset_view'.  I haven't looked closely at this yet.

* Probe IP.  For many probe handlers, we try to set the pt_regs IP to
the actual breakpoint address, but in this case we don't even know the
breakpoint's virtual address.  Uprobes itself uses uprobes_get_bkpt_addr()
in some instances, but that isn't exposed for our use.
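
On the build-id point: the GNU build-id is stored as an ELF note of type NT_GNU_BUILD_ID with owner name "GNU", so checking it amounts to scanning the note data and comparing bytes. The following is a userspace sketch only, not runtime code; `find_build_id` and `build_id_matches` are hypothetical names, and it assumes the expected ID is held as a lowercase hex string:

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* ELF note name and descriptor fields are padded to 4-byte alignment. */
static size_t note_align(size_t n) { return (n + 3) & ~(size_t)3; }

/* Scan raw note data (e.g. the contents of .note.gnu.build-id) for the
 * GNU build-id note; return its descriptor bytes, or NULL if absent. */
static const unsigned char *
find_build_id(const unsigned char *buf, size_t len, size_t *id_len)
{
	size_t pos = 0;
	while (pos + sizeof(Elf64_Nhdr) <= len) {
		Elf64_Nhdr nh;
		memcpy(&nh, buf + pos, sizeof nh);
		size_t name_off = pos + sizeof nh;
		size_t desc_off = name_off + note_align(nh.n_namesz);
		if (desc_off + nh.n_descsz > len)
			break;	/* truncated note */
		if (nh.n_type == NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
		    memcmp(buf + name_off, "GNU", 4) == 0) {
			*id_len = nh.n_descsz;
			return buf + desc_off;
		}
		pos = desc_off + note_align(nh.n_descsz);
	}
	return NULL;
}

/* Hypothetical check: does the note's build-id equal expected_hex? */
static int
build_id_matches(const unsigned char *buf, size_t len, const char *expected_hex)
{
	size_t id_len = 0, i;
	const unsigned char *id = find_build_id(buf, len, &id_len);
	if (!id || strlen(expected_hex) != 2 * id_len)
		return 0;
	for (i = 0; i < id_len; i++) {
		char hex[3];
		snprintf(hex, sizeof hex, "%02x", id[i]);
		if (memcmp(hex, expected_hex + 2 * i, 2) != 0)
			return 0;
	}
	return 1;
}
```

A check like this would still leave the problem described above: the file could be modified after verification without its inode changing.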

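For the SDT semaphore point, the gating an application does can be pictured with the toy model below. Every name in it (`myapp_request_semaphore`, `tracer_attach`, and so on) is hypothetical; the idea is that the marker only fires while a small counter in the process is nonzero, so a tracer has to increment that counter in each target process's memory. That per-process step is exactly the hook inode-based uprobes doesn't give us:

```c
#include <assert.h>

/* Hypothetical per-process semaphore guarding one SDT marker. */
static unsigned short myapp_request_semaphore;

static int marker_hits;

/* Stand-in for the marker site that uprobes would patch with a breakpoint. */
static void myapp_request_marker(int id) { (void)id; marker_hits++; }

static void handle_request(int id)
{
	/* The application skips the marker (and any costly argument setup)
	 * unless some tracer has bumped the semaphore. */
	if (myapp_request_semaphore)
		myapp_request_marker(id);
}

/* What a tracer must do in every traced process: increment on attach,
 * decrement on detach. */
static void tracer_attach(void) { myapp_request_semaphore++; }
static void tracer_detach(void) { myapp_request_semaphore--; }
```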

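For the probe IP point, recovering the breakpoint's virtual address means finding the inode's mapping in the current process and undoing the file-offset-to-address translation, vaddr = vm_start + offset - vm_pgoff * PAGE_SIZE. Below is a userspace sketch with a mock VMA; `struct sketch_vma` and `sketch_offset_to_vaddr` are hypothetical, it assumes 4 KiB pages, and real code would also have to locate the right VMA in current->mm under the appropriate lock:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB pages */

/* Minimal stand-in for the vm_area_struct fields that matter here. */
struct sketch_vma {
	uint64_t vm_start;	/* first virtual address of the mapping */
	uint64_t vm_end;	/* one past the last virtual address */
	uint64_t vm_pgoff;	/* file offset of vm_start, in pages */
};

/* Translate a file offset into a virtual address inside one mapping.
 * Returns 0 if this mapping doesn't cover the offset. */
static uint64_t
sketch_offset_to_vaddr(const struct sketch_vma *vma, uint64_t offset)
{
	uint64_t map_off = vma->vm_pgoff << SKETCH_PAGE_SHIFT;
	uint64_t vaddr;
	if (offset < map_off)
		return 0;
	vaddr = vma->vm_start + (offset - map_off);
	return vaddr < vma->vm_end ? vaddr : 0;
}
```
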
I think that's it.  So if you happen to build a kernel with the new
uprobes, please enjoy systemtap support too. :)

Josh
commit f8e4aa0bc62d79bfc7a2dcb1508215a675d9f83d
Author: Josh Stone <jistone@redhat.com>
Date:   Thu May 19 19:44:16 2011 -0700

    Add initial support for inode-based uprobes
    
    This adds support for placing regular userspace probes using the new
    inode+offset API being developed for the upstream kernel.  This includes
    probing functions, statements, and SDT markers, but return probes aren't
    yet supported in the new API.  A lot of the finer details of systemtap's
    userspace runtime still need work too, but this is a functional start.
    
    * runtime/uprobes-inode.c: New, basic registration code to look up
      filename inodes and connect uprobes using the new API.
    * tapsets.cxx (kernel_supports_inode_uprobes): New, guess whether this
      is an inode-uprobes kernel based on CONFIG values.
      (dwarf_builder::build): Disallow userspace return probes.
      (uprobe_derived_probe::join_group): Only trigger task_finder and the
      manual uprobes model for the old style of uprobes.
      (uprobe_builder::build): Disallow absolute-address userspace probes.
      (uprobe_derived_probe_group::emit*): Split into inode/utrace variants.
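
The kernel_supports_inode_uprobes heuristic only asks whether two options are built in. Outside of stap, the same test against a kernel .config file could be sketched as follows; `config_is_y` and `supports_inode_uprobes` are hypothetical helpers, whereas stap itself looks the values up in `s.kernel_config`. Presumably CONFIG_UPROBES alone isn't enough because the utrace-based path also tests CONFIG_UPROBES, which is why CONFIG_ARCH_SUPPORTS_UPROBES serves as the distinguishing option:

```c
#include <assert.h>
#include <string.h>

/* Return 1 if the .config text has a line that is exactly "NAME=y". */
static int
config_is_y(const char *config, const char *name)
{
	size_t nlen = strlen(name);
	const char *line = config;
	while (line && *line) {
		const char *end = strchr(line, '\n');
		size_t llen = end ? (size_t)(end - line) : strlen(line);
		if (llen == nlen + 2 && strncmp(line, name, nlen) == 0 &&
		    line[nlen] == '=' && line[nlen + 1] == 'y')
			return 1;
		line = end ? end + 1 : NULL;
	}
	return 0;
}

/* Same heuristic as the patch: both options must be built in ("=y"). */
static int
supports_inode_uprobes(const char *config)
{
	return config_is_y(config, "CONFIG_ARCH_SUPPORTS_UPROBES") &&
	       config_is_y(config, "CONFIG_UPROBES");
}
```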

diff --git a/runtime/uprobes-inode.c b/runtime/uprobes-inode.c
new file mode 100644
index 0000000..b04ca6d
--- /dev/null
+++ b/runtime/uprobes-inode.c
@@ -0,0 +1,119 @@
+/* -*- linux-c -*-
+ * Common functions for using inode-based uprobes
+ * Copyright (C) 2011 Red Hat Inc.
+ *
+ * This file is part of systemtap, and is free software.  You can
+ * redistribute it and/or modify it under the terms of the GNU General
+ * Public License (GPL); either version 2, or (at your option) any
+ * later version.
+ */
+
+#ifndef _UPROBES_INODE_C_
+#define _UPROBES_INODE_C_
+
+#include <linux/fs.h>
+#include <linux/namei.h>
+#include <linux/uprobes.h>
+
+struct stp_inode_uprobe_target {
+	const char * const filename;
+	struct inode *inode;
+};
+
+struct stp_inode_uprobe_consumer {
+	struct uprobe_consumer consumer;
+	struct stp_inode_uprobe_target * const target;
+	loff_t offset;
+	/* XXX sdt_sem_offset support? */
+
+	struct stap_probe * const probe;
+};
+
+
+static void
+stp_inode_uprobes_put(struct stp_inode_uprobe_target *targets,
+		      size_t ntargets)
+{
+	size_t i;
+	for (i = 0; i < ntargets; ++i) {
+		struct stp_inode_uprobe_target *ut = &targets[i];
+		iput(ut->inode);
+		ut->inode = NULL;
+	}
+}
+
+static int
+stp_inode_uprobes_get(struct stp_inode_uprobe_target *targets,
+		      size_t ntargets)
+{
+	int ret = 0;
+	size_t i;
+	for (i = 0; i < ntargets; ++i) {
+		struct path path;
+		struct stp_inode_uprobe_target *ut = &targets[i];
+		ret = kern_path(ut->filename, LOOKUP_FOLLOW, &path);
+		if (!ret) {
+			ut->inode = igrab(path.dentry->d_inode);
+			path_put(&path); /* drop kern_path()'s reference */
+			ret = ut->inode ? 0 : -EINVAL;
+		}
+		if (ret)
+			break;
+	}
+	if (ret)
+		stp_inode_uprobes_put(targets, i);
+	return ret;
+}
+
+static void
+stp_inode_uprobes_unreg(struct stp_inode_uprobe_consumer *consumers,
+			size_t nconsumers)
+{
+	size_t i;
+	for (i = 0; i < nconsumers; ++i) {
+		struct stp_inode_uprobe_consumer *uc = &consumers[i];
+		unregister_uprobe(uc->target->inode, uc->offset,
+				  &uc->consumer);
+	}
+}
+
+static int
+stp_inode_uprobes_reg(struct stp_inode_uprobe_consumer *consumers,
+		      size_t nconsumers)
+{
+	int ret = 0;
+	size_t i;
+	for (i = 0; i < nconsumers; ++i) {
+		struct stp_inode_uprobe_consumer *uc = &consumers[i];
+		ret = register_uprobe(uc->target->inode, uc->offset,
+				      &uc->consumer);
+		if (ret)
+			break;
+	}
+	if (ret)
+		stp_inode_uprobes_unreg(consumers, i);
+	return ret;
+}
+
+static int
+stp_inode_uprobes_init(struct stp_inode_uprobe_target *targets, size_t ntargets,
+		       struct stp_inode_uprobe_consumer *consumers, size_t nconsumers)
+{
+	int ret = stp_inode_uprobes_get(targets, ntargets);
+	if (!ret) {
+		ret = stp_inode_uprobes_reg(consumers, nconsumers);
+		if (ret)
+			stp_inode_uprobes_put(targets, ntargets);
+	}
+	return ret;
+}
+
+static void
+stp_inode_uprobes_exit(struct stp_inode_uprobe_target *targets, size_t ntargets,
+		       struct stp_inode_uprobe_consumer *consumers, size_t nconsumers)
+{
+	stp_inode_uprobes_unreg(consumers, nconsumers);
+	stp_inode_uprobes_put(targets, ntargets);
+}
+
+#endif /* _UPROBES_INODE_C_ */
diff --git a/tapsets.cxx b/tapsets.cxx
index 8afe02e..25170dc 100644
--- a/tapsets.cxx
+++ b/tapsets.cxx
@@ -3795,6 +3795,16 @@ dwarf_derived_probe::join_group (systemtap_session& s)
 }
 
 
+static bool
+kernel_supports_inode_uprobes(systemtap_session& s)
+{
+  // CONFIG_ARCH_SUPPORTS_UPROBES is new with the built-in inode-uprobes, so
+  // it's a reasonable indicator of the new API.  Otherwise we'll need an autoconf test...
+  return (s.kernel_config["CONFIG_ARCH_SUPPORTS_UPROBES"] == "y"
+          && s.kernel_config["CONFIG_UPROBES"] == "y");
+}
+
+
 dwarf_derived_probe::dwarf_derived_probe(const string& funcname,
                                          const string& filename,
                                          int line,
@@ -3835,6 +3845,12 @@ dwarf_derived_probe::dwarf_derived_probe(const string& funcname,
       // ET_DYN ones do (addr += run-time mmap base address).  We tell these apart
       // by the incoming section value (".absolute" vs. ".dynamic").
       // XXX Assert invariants here too?
+
+      // inode-uprobes needs an offset rather than an absolute VM address.
+      if (kernel_supports_inode_uprobes(q.dw.sess) &&
+          section == ".absolute" && addr == dwfl_addr &&
+          addr >= q.dw.module_start && addr < q.dw.module_end)
+        this->addr = addr - q.dw.module_start;
     }
   else
     {
@@ -6182,7 +6198,13 @@ dwarf_builder::build(systemtap_session & sess,
       else
 	module_name = user_path; // canonicalize it
 
-      if (sess.kernel_config["CONFIG_UTRACE"] != string("y"))
+      if (kernel_supports_inode_uprobes(sess))
+        {
+          if (has_null_param(parameters, TOK_RETURN))
+            throw semantic_error
+              (_("process return probes not available with inode-based uprobes"));
+        }
+      else if (sess.kernel_config["CONFIG_UTRACE"] != string("y"))
         throw semantic_error (_("process probes not available without kernel CONFIG_UTRACE"));
 
       // user-space target; we use one dwflpp instance per module name
@@ -6635,6 +6657,16 @@ private:
     return p->module + "|" + p->section + "|" + lex_cast(p->pid);
   }
 
+  // Using our own utrace-based uprobes
+  void emit_module_utrace_decls (systemtap_session& s);
+  void emit_module_utrace_init (systemtap_session& s);
+  void emit_module_utrace_exit (systemtap_session& s);
+
+  // Using the upstream inode-based uprobes
+  void emit_module_inode_decls (systemtap_session& s);
+  void emit_module_inode_init (systemtap_session& s);
+  void emit_module_inode_exit (systemtap_session& s);
+
 public:
   void emit_module_decls (systemtap_session& s);
   void emit_module_init (systemtap_session& s);
@@ -6648,11 +6680,15 @@ uprobe_derived_probe::join_group (systemtap_session& s)
   if (! s.uprobe_derived_probes)
     s.uprobe_derived_probes = new uprobe_derived_probe_group ();
   s.uprobe_derived_probes->enroll (this);
-  enable_task_finder(s);
 
-  // Ask buildrun.cxx to build extra module if needed, and
-  // signal staprun to load that module
-  s.need_uprobes = true;
+  if (!kernel_supports_inode_uprobes(s))
+    {
+      enable_task_finder(s);
+
+      // Ask buildrun.cxx to build extra module if needed, and
+      // signal staprun to load that module
+      s.need_uprobes = true;
+    }
 }
 
 
@@ -6684,7 +6720,7 @@ uprobe_derived_probe::emit_unprivileged_assertion (translator_output* o)
 struct uprobe_builder: public derived_probe_builder
 {
   uprobe_builder() {}
-  virtual void build(systemtap_session &,
+  virtual void build(systemtap_session & sess,
 		     probe * base,
 		     probe_point * location,
 		     literal_map_t const & parameters,
@@ -6692,6 +6728,9 @@ struct uprobe_builder: public derived_probe_builder
   {
     int64_t process, address;
 
+    if (kernel_supports_inode_uprobes(sess))
+      throw semantic_error (_("absolute process probes not available with inode-based uprobes"));
+
     bool b1 = get_param (parameters, TOK_PROCESS, process);
     (void) b1;
     bool b2 = get_param (parameters, TOK_STATEMENT, address);
@@ -6705,10 +6744,10 @@ struct uprobe_builder: public derived_probe_builder
 
 
 void
-uprobe_derived_probe_group::emit_module_decls (systemtap_session& s)
+uprobe_derived_probe_group::emit_module_utrace_decls (systemtap_session& s)
 {
   if (probes.empty()) return;
-  s.op->newline() << "/* ---- user probes ---- */";
+  s.op->newline() << "/* ---- utrace uprobes ---- */";
   // If uprobes isn't in the kernel, pull it in from the runtime.
 
   s.op->newline() << "#if defined(CONFIG_UPROBES) || defined(CONFIG_UPROBES_MODULE)";
@@ -6892,11 +6931,11 @@ uprobe_derived_probe_group::emit_module_decls (systemtap_session& s)
 
 
 void
-uprobe_derived_probe_group::emit_module_init (systemtap_session& s)
+uprobe_derived_probe_group::emit_module_utrace_init (systemtap_session& s)
 {
   if (probes.empty()) return;
 
-  s.op->newline() << "/* ---- user probes ---- */";
+  s.op->newline() << "/* ---- utrace uprobes ---- */";
 
   s.op->newline() << "for (j=0; j<MAXUPROBES; j++) {";
   s.op->newline(1) << "struct stap_uprobe *sup = & stap_uprobes[j];";
@@ -6925,10 +6964,10 @@ uprobe_derived_probe_group::emit_module_init (systemtap_session& s)
 
 
 void
-uprobe_derived_probe_group::emit_module_exit (systemtap_session& s)
+uprobe_derived_probe_group::emit_module_utrace_exit (systemtap_session& s)
 {
   if (probes.empty()) return;
-  s.op->newline() << "/* ---- user probes ---- */";
+  s.op->newline() << "/* ---- utrace uprobes ---- */";
 
   // NB: there is no stap_unregister_task_finder_target call;
   // important stuff like utrace cleanups are done by
@@ -6998,6 +7037,126 @@ uprobe_derived_probe_group::emit_module_exit (systemtap_session& s)
   s.op->newline() << "mutex_destroy (& stap_uprobes_lock);";
 }
 
+
+void
+uprobe_derived_probe_group::emit_module_inode_decls (systemtap_session& s)
+{
+  if (probes.empty()) return;
+  s.op->newline() << "/* ---- inode uprobes ---- */";
+  s.op->newline() << "#include \"uprobes-inode.c\"";
+
+  // Write the probe handler.
+  s.op->newline() << "static int enter_inode_uprobe "
+                  << "(struct uprobe_consumer *inst, struct pt_regs *regs) {";
+  s.op->newline(1) << "struct stp_inode_uprobe_consumer *sup = "
+                   << "container_of(inst, struct stp_inode_uprobe_consumer, consumer);";
+  common_probe_entryfn_prologue (s.op, "STAP_SESSION_RUNNING", "sup->probe");
+  s.op->newline() << "c->regs = regs;";
+  s.op->newline() << "c->regflags |= _STP_REGS_USER_FLAG;";
+  // XXX: Can't set SET_REG_IP; we don't actually know the relocated address.
+  // ...  In some error cases, uprobes itself calls uprobes_get_bkpt_addr().
+  s.op->newline() << "(*sup->probe->ph) (c);";
+  common_probe_entryfn_epilogue (s.op);
+  s.op->newline() << "return 0;";
+  s.op->newline(-1) << "}";
+  s.op->assert_0_indent();
+
+  // Index of all the modules for which we need inodes.
+  map<string, unsigned> module_index;
+  unsigned module_index_ctr = 0;
+
+  // Discover and declare targets for each unique path.
+  s.op->newline() << "static struct stp_inode_uprobe_target "
+                  << "stap_inode_uprobe_targets[] = {";
+  s.op->indent(1);
+  for (unsigned i=0; i<probes.size(); i++)
+    {
+      uprobe_derived_probe *p = probes[i];
+      if (module_index.find (p->module) == module_index.end())
+        {
+          module_index[p->module] = module_index_ctr++;
+          s.op->newline() << "{ .filename=" << lex_cast_qstring(p->module) << " },";
+        }
+    }
+  s.op->newline(-1) << "};";
+  s.op->assert_0_indent();
+
+  // Declare the actual probes.
+  s.op->newline() << "static struct stp_inode_uprobe_consumer "
+                  << "stap_inode_uprobe_consumers[] = {";
+  s.op->indent(1);
+  for (unsigned i=0; i<probes.size(); i++)
+    {
+      uprobe_derived_probe *p = probes[i];
+      unsigned index = module_index[p->module];
+      s.op->newline() << "{"
+                      << " .consumer={ .handler=enter_inode_uprobe },"
+                      << " .target=&stap_inode_uprobe_targets[" << index << "],"
+                      << " .offset=(loff_t)0x" << hex << p->addr << dec << "ULL,"
+                      << " .probe=" << common_probe_init (p) << ","
+                      << "},";
+    }
+  s.op->newline(-1) << "};";
+  s.op->assert_0_indent();
+}
+
+
+void
+uprobe_derived_probe_group::emit_module_inode_init (systemtap_session& s)
+{
+  if (probes.empty()) return;
+  s.op->newline() << "/* ---- inode uprobes ---- */";
+  s.op->newline() << "rc = stp_inode_uprobes_init ("
+                  << "stap_inode_uprobe_targets, "
+                  << "ARRAY_SIZE(stap_inode_uprobe_targets), "
+                  << "stap_inode_uprobe_consumers, "
+                  << "ARRAY_SIZE(stap_inode_uprobe_consumers));";
+}
+
+
+void
+uprobe_derived_probe_group::emit_module_inode_exit (systemtap_session& s)
+{
+  if (probes.empty()) return;
+  s.op->newline() << "/* ---- inode uprobes ---- */";
+  s.op->newline() << "stp_inode_uprobes_exit ("
+                  << "stap_inode_uprobe_targets, "
+                  << "ARRAY_SIZE(stap_inode_uprobe_targets), "
+                  << "stap_inode_uprobe_consumers, "
+                  << "ARRAY_SIZE(stap_inode_uprobe_consumers));";
+}
+
+
+void
+uprobe_derived_probe_group::emit_module_decls (systemtap_session& s)
+{
+  if (kernel_supports_inode_uprobes (s))
+    emit_module_inode_decls (s);
+  else
+    emit_module_utrace_decls (s);
+}
+
+
+void
+uprobe_derived_probe_group::emit_module_init (systemtap_session& s)
+{
+  if (kernel_supports_inode_uprobes (s))
+    emit_module_inode_init (s);
+  else
+    emit_module_utrace_init (s);
+}
+
+
+void
+uprobe_derived_probe_group::emit_module_exit (systemtap_session& s)
+{
+  if (kernel_supports_inode_uprobes (s))
+    emit_module_inode_exit (s);
+  else
+    emit_module_utrace_exit (s);
+}
+
+
 // ------------------------------------------------------------------------
 // Kprobe derived probes
 // ------------------------------------------------------------------------
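
Both stp_inode_uprobes_get() and stp_inode_uprobes_reg() in the patch follow an all-or-nothing pattern: stop at the first failure and unwind only the `i` entries that already succeeded. Reduced to userspace, with `sketch_register` and friends as hypothetical stand-ins for register_uprobe()/unregister_uprobe(), the pattern looks like this:

```c
#include <assert.h>
#include <stddef.h>

struct sketch_consumer {
	int bad;		/* simulate a registration failure */
	int registered;		/* set while registered */
};

/* Hypothetical stand-in for register_uprobe(). */
static int sketch_register(struct sketch_consumer *c)
{
	if (c->bad)
		return -22;	/* -EINVAL */
	c->registered = 1;
	return 0;
}

/* Hypothetical stand-in for unregister_uprobe(). */
static void sketch_unregister(struct sketch_consumer *c)
{
	c->registered = 0;
}

static void sketch_unreg_all(struct sketch_consumer *cs, size_t n)
{
	size_t i;
	for (i = 0; i < n; ++i)
		sketch_unregister(&cs[i]);
}

/* On the first failure, unwind the i entries already registered. */
static int sketch_reg_all(struct sketch_consumer *cs, size_t n)
{
	int ret = 0;
	size_t i;
	for (i = 0; i < n; ++i) {
		ret = sketch_register(&cs[i]);
		if (ret)
			break;
	}
	if (ret)
		sketch_unreg_all(cs, i);
	return ret;
}
```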

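emit_module_inode_decls() emits one stp_inode_uprobe_target per unique module path, so all probes in the same file share a single inode lookup, and each consumer refers to its target by index. The same dedup in plain C, as a hypothetical sketch that uses a linear search where the translator uses a std::map:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_TARGETS 16

/* Return the index of path in targets[], appending it if unseen, so that
 * indices are assigned in order of first appearance. */
static unsigned
sketch_target_index(const char *targets[], unsigned *ntargets, const char *path)
{
	unsigned i;
	for (i = 0; i < *ntargets; ++i)
		if (strcmp(targets[i], path) == 0)
			return i;
	assert(*ntargets < SKETCH_MAX_TARGETS);
	targets[*ntargets] = path;
	return (*ntargets)++;
}
```

A linear scan is fine for a handful of modules; the std::map in the translator gives the same first-appearance indices without the quadratic cost.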