This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Re: resuming after stop at syscall_entry

From: Roland McGrath <roland at redhat dot com>
To: David Smith <dsmith at redhat dot com>
Cc: utrace-devel at redhat dot com, Systemtap List <systemtap at sources dot redhat dot com>
Date: Fri, 24 Apr 2009 17:58:37 -0700 (PDT)
Subject: Re: resuming after stop at syscall_entry
References: <20090418042722.5B584FC35F@magilla.sf.frob.com> <49EF81AA.3060306@redhat.com>

> This processing makes sense I think.  It is a bit complicated of course,
> but not unnecessarily so.

Glad to hear it!

> > A tracing-only engine that just wants to see the syscall that is going
> > to be done can just do:
> > 
> > 	if (utrace_resume_action(action) == UTRACE_STOP)
> > 		return UTRACE_REPORT;
> > 
> > at the top of report_syscall_entry, so it just doesn't think about it
> > until it thinks the call will now go through.
> 
> Systemtap currently doesn't support changing syscall arguments, if it
> does, obviously a few things would need to change.
> 
> But, I think systemtap would probably fall here - only see the syscall
> that is actually going to be done.  So systemtap could possibly get
> multiple callbacks for the same syscall, but only pay attention to the
> last one, correct?

Correct.  The advice quoted above is what its callbacks would do to ignore
the callbacks before the last one.
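
Spelled out as a whole callback, that advice might look like the sketch
below.  It is a user-space model so that it compiles on its own, not
kernel code: the enum, the mask and utrace_resume_action() are stand-ins
with placeholder values, the callback signature is simplified (the real
one also gets the engine and the registers), and "traced" is a
hypothetical counter standing in for the actual tracing work.

```c
/* Stand-ins for the kernel's utrace definitions.  The numeric values are
 * placeholders, not necessarily those in the real <linux/utrace.h>. */
enum utrace_resume_action { UTRACE_STOP, UTRACE_REPORT, UTRACE_RESUME };
#define UTRACE_RESUME_MASK 0x0f

static enum utrace_resume_action utrace_resume_action(unsigned int action)
{
	return (enum utrace_resume_action)(action & UTRACE_RESUME_MASK);
}

static int traced;	/* hypothetical: how many times the tracing work ran */

/* Hypothetical, simplified report_syscall_entry callback. */
static unsigned int sketch_report_syscall_entry(unsigned int action)
{
	/* Some engine has asked for a stop, so the call is not final yet.
	 * Ask for another report and do nothing this time around. */
	if (utrace_resume_action(action) == UTRACE_STOP)
		return UTRACE_REPORT;

	traced++;	/* no stop pending: do the entry tracing here */
	return UTRACE_RESUME;
}
```

A callback that arrives while a stop is pending leaves "traced" alone;
only a callback with no stop pending does the work.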

Note that you can only be sure you're seeing the "actually going to be
done" state if yours is the "first" engine attached.  (Thus, by the new
special case calling order, yours will be the last report_syscall_entry
callback to run.)  This is just the general "engine priority" thing, not
anything new.

In cases like ptrace and kmview (Renzo's thing), even if these engines are
first (i.e. called after yours), you will still be seeing the "final" state
because they did their changes asynchronously before resuming.  But some
other engine might do its changes directly in its own callback instead
(whether it used UTRACE_STOP and got a repeat callback, or just on the
first time through without stopping), so those changes would happen only
after your "last" callback.
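
The effect of the calling order can be seen in a toy model like the one
below.  It is only an illustration under stated assumptions: toy_regs,
the two callbacks and the dispatcher are all hypothetical names, and the
dispatcher simply assumes that entry callbacks run most-recently-attached
first, as described above.  The "rewriter" engine changes an argument
directly in its callback; the "tracer" engine just records what it sees.

```c
#include <stddef.h>

struct toy_regs { long arg0; };		/* hypothetical register set */

typedef void (*toy_entry_cb)(struct toy_regs *regs);

static long seen_by_tracer;		/* what the tracing engine recorded */

/* Tracing-only engine: records the argument it is shown. */
static void tracer_cb(struct toy_regs *regs)
{
	seen_by_tracer = regs->arg0;
}

/* Engine that changes the argument directly in its own callback. */
static void rewriter_cb(struct toy_regs *regs)
{
	regs->arg0 = 42;
}

/* engines[] is in attach order.  Entry callbacks are made in reverse,
 * so the engine attached first gets the last callback. */
static void toy_report_syscall_entry(toy_entry_cb *engines, size_t n,
				     struct toy_regs *regs)
{
	while (n-- > 0)
		engines[n](regs);
}
```

With the tracer attached first it records the rewritten value; with the
tracer attached last it records the value from before the rewrite, which
is not what the system call will actually use.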

In the same vein, "earlier" engines (i.e. here called after yours) might
use UTRACE_STOP after your first callback had every reason to believe it
was the "last" one (i.e. the UTRACE_STOP check quoted above did not hit).
In that case, you will get a repeat call (with the UTRACE_SYSCALL_RESUMED
flag set).  On that call, you need to cope with the fact that you already
did your entry tracing work before (but now things may have changed).

If the theory is that you want to respect your place in the engine order,
whatever that is (i.e., if your tracing just reported a lie, it was the lie
you were supposed to believe), then "coping" just means ignoring the
repeat.  (This is no different in kind from an "earlier" engine/later
callback changing the registers after your callback and never stopping.)

For that you need to keep track of whether you already handled it or not.
(Depending on your relative order and the actions of the other engines, you
might get either UTRACE_STOP or UTRACE_SYSCALL_RESUMED either before or
after "you handled it".  So you can't use those alone.)  You can do this in
two ways.  One is to use your own per-thread state (engine->data, etc.).
The other is to disable the SYSCALL_ENTRY event when you've handled it, so
you won't get more callbacks.  Then you can re-enable the event in your
report_syscall_exit callback (or report_quiesce/report_signal, or whatever
is most convenient to be sure you'll run before it goes back to user mode).
That is, use utrace_set_events() from the callbacks.
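
Both ways can be sketched in a small user-space model.  Everything here
is an assumption made for illustration: struct sketch_thread stands in
for whatever engine->data would point to, the "events" field models the
engine's event mask (in the kernel those updates would be
utrace_set_events() calls), EV_SYSCALL_ENTRY stands in for
UTRACE_EVENT(SYSCALL_ENTRY), the enum values are placeholders, and
resetting the flag in the exit callback is just one possible choice.

```c
#include <stdbool.h>

enum utrace_resume_action { UTRACE_STOP, UTRACE_REPORT, UTRACE_RESUME };
#define UTRACE_RESUME_MASK	0x0f	/* placeholder value */
#define EV_SYSCALL_ENTRY	0x1UL	/* models UTRACE_EVENT(SYSCALL_ENTRY) */

/* Hypothetical per-thread record. */
struct sketch_thread {
	bool entry_handled;	/* way 1: our own "already traced" flag */
	unsigned long events;	/* way 2: models the engine's event mask */
	int traced;		/* how many times the tracing work ran */
};

/* Way 1: remember in per-thread state that this entry was handled. */
static unsigned int flag_syscall_entry(unsigned int action,
				       struct sketch_thread *t)
{
	if (t->entry_handled)
		return UTRACE_RESUME;	/* repeat callback: ignore it */
	if ((action & UTRACE_RESUME_MASK) == UTRACE_STOP)
		return UTRACE_REPORT;	/* not final yet: wait for a repeat */
	t->traced++;
	t->entry_handled = true;
	return UTRACE_RESUME;
}

static unsigned int flag_syscall_exit(struct sketch_thread *t)
{
	t->entry_handled = false;	/* the next entry is a new syscall */
	return UTRACE_RESUME;
}

/* Way 2: switch the entry event off once handled, back on at exit. */
static unsigned int mask_syscall_entry(unsigned int action,
				       struct sketch_thread *t)
{
	if ((action & UTRACE_RESUME_MASK) == UTRACE_STOP)
		return UTRACE_REPORT;
	t->traced++;
	t->events &= ~EV_SYSCALL_ENTRY;	/* no more entry callbacks */
	return UTRACE_RESUME;
}

static unsigned int mask_syscall_exit(struct sketch_thread *t)
{
	t->events |= EV_SYSCALL_ENTRY;	/* re-arm for the next syscall */
	return UTRACE_RESUME;
}

/* Toy dispatcher: delivers an entry report only if the event is enabled. */
static void deliver_entry(unsigned int action, struct sketch_thread *t)
{
	if (t->events & EV_SYSCALL_ENTRY)
		(void)mask_syscall_entry(action, t);
}
```

Neither variant looks at UTRACE_SYSCALL_RESUMED to decide whether the
work was already done; each relies only on its own record of having
handled the entry.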

> This is understandable, but does hurt my head a *little* bit.  I think
> if you put the above full text somewhere and provided some examples this
> would make sense to people.

The utrace-syscall-resumed branch puts this in the kerneldoc text for
struct utrace_engine_ops (where callback return values and common arguments
are described):

  * When %UTRACE_STOP is used in @report_syscall_entry, then @task
+ * stops before attempting the system call.  In this case, another
+ * @report_syscall_entry callback follows after @task resumes; in a
+ * second or later callback, %UTRACE_SYSCALL_RESUMED is set in the
+ * @action argument to indicate a repeat callback still waiting to
+ * attempt the same system call invocation.  This repeat callback
+ * gives each engine an opportunity to reexamine registers another
+ * engine might have changed while @task was held in %UTRACE_STOP.
+ *
+ * In other cases, the resume action does not take effect until @task
+ * is ready to check for signals and return to user mode.  If there
+ * are more callbacks to be made, the last round of calls determines
+ * the final action.  A @report_quiesce callback with @event zero, or
+ * a @report_signal callback, will always be the last one made before
+ * @task resumes.  Only %UTRACE_STOP is "sticky"--if @engine returned
+ * %UTRACE_STOP then @task stays stopped unless @engine returns
+ * different from a following callback.

I don't know where the longer explanation and/or examples belong.
Perhaps in a new section in utrace.tmpl?  We could start with putting
together some text on the wiki.  Another idea is to add a few example
modules in samples/utrace/.  Those can illustrate things with good
comments, and could also be built verbatim so that several modules (or
several instances of one) can be loaded in different orders to
demonstrate what happens.

It would be nice to have folks like you and Renzo work up this text
and/or examples.  What's needed is stuff that makes sense to you guys
as users of the API, rather than what makes sense to me who has
thought too much already about all this stuff.


Thanks,
Roland
