This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Re: proposed instruction trace support in SystemTap

From: Maynard Johnson <maynardj at us dot ibm dot com>
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: dcnltc at us dot ibm dot com, systemtap at sourceware dot org
Date: Fri, 06 Jul 2007 16:43:04 -0500
Subject: Re: proposed instruction trace support in SystemTap
References: <4689826A.9040902@us.ibm.com> <y0mabuar6ap.fsf@ton.toronto.redhat.com>
Reply-to: maynardj at us dot ibm dot com

Frank, I work with Dave, and I told him I would cover for him on this issue while he is out of the office. I'm very familiar with the existing ITrace tool (from the open-source "Performance Inspector" project) that our team contributes to, but much less so with SystemTap. Still, here are my two cents' worth.

Frank Ch. Eigler wrote:

Dave Nomura <dcnltc@us.ibm.com> writes:

[...]

Thanks for continuing with this idea.

SINGLE_STEP/BRANCH TRAP HANDLER
[...]
probe branch_step label("branch handler 1")
{
       <do whatever you want for each branch instruction>
       itrace_output();        // write itrace binary output
}

where "label" is a language extension to attach a name [...]

Particularly, to turn the probe on and off by explicit function calls.
This is an area we discussed at the face-to-face meeting in Ottawa
last week, in relation to user-space probes.  The same concept could
apply to other probe types.

Regarding semantics, this is tricky business.  Turning off active
probes is relatively simple, because even if the underlying probe API
doesn't support instantaneous (atomic) disarming, we can simulate it
until the API catches up (by adding an "am I supposed to be disarmed?"
conditional to the handler).  Turning them *on* is different - we
can't help but possibly miss a couple of events as the API catches up.

Maybe this is acceptable, maybe not.  Some syntax may help tell us the
judgement of the script programmer.

I don't believe users would even be aware of this gap.
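The simulated disarming described above (an "am I supposed to be disarmed?" check at the top of the handler) can be modelled roughly as follows. This is only a toy sketch in Python, not SystemTap runtime code: the GuardedProbe class, its fire() entry point, and the event strings are all hypothetical names made up for illustration.

```python
import threading


class GuardedProbe:
    """Toy model of a probe whose handler checks a software 'armed' flag.

    All names here are hypothetical; this is not SystemTap runtime code.
    """

    def __init__(self, handler):
        self._handler = handler
        self._armed = threading.Event()  # cleared means "disarmed"
        self.hits = 0      # events delivered to the handler
        self.skipped = 0   # events that fired while logically disarmed

    def arm(self):
        self._armed.set()

    def disarm(self):
        # Takes effect for the very next event, even if the underlying
        # probe mechanism is still installed and still firing.
        self._armed.clear()

    def fire(self, event):
        """Called by the (hypothetical) underlying probe API for each event."""
        if not self._armed.is_set():
            self.skipped += 1
            return
        self.hits += 1
        self._handler(event)


seen = []
probe = GuardedProbe(seen.append)
probe.arm()
probe.fire("branch@0x1000")
probe.disarm()
probe.fire("branch@0x1004")  # still raised by the API, but ignored by the guard
```

Note that the guard only covers the "off" direction. Events that occur before the underlying mechanism has actually been armed never reach fire() at all, which is the gap discussed above.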

Regarding syntax, we have more options than an opaque string and
explicit function calls to turn things on and off.  We could have a
guard expression like dtrace's /.../ - though we would probably just
spell it thusly:

probe PROBEPOINT if (expr) { }

where expr could be something as simple as (probe_1_enabled_p), which
had better be a global variable.

Yes, I think this construct could be very useful.


The compiler would analyze expr for dataflow, arrange to evaluate this
condition whenever appropriate (after another probe writes any of its
inputs), and arrange to promptly activate or deactivate the
appropriate probes.  Since "promptly" may take some time, script
programmers plopping a conditional like this in are implying consent
to a few events being missed.
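As a rough model of that scheme, the sketch below re-evaluates a probe's condition only when one of the globals it reads is written. It is a toy in Python under stated assumptions: the ScriptGlobals class and its methods are hypothetical, and the input set that the translator would derive by dataflow analysis is simply passed in by hand.

```python
class ScriptGlobals:
    """Toy model: writes to globals trigger re-evaluation of probe conditions.

    Hypothetical names throughout; a real translator would derive the
    input sets itself rather than have them passed in.
    """

    def __init__(self):
        self._values = {}
        self._conditions = []   # (probe name, input names, predicate)
        self.enabled = {}       # probe name -> current on/off decision
        self.evaluations = 0    # how many times any condition was evaluated

    def _evaluate(self, probe_name, predicate):
        self.evaluations += 1
        self.enabled[probe_name] = bool(predicate(self._values))

    def add_condition(self, probe_name, inputs, predicate):
        self._conditions.append((probe_name, frozenset(inputs), predicate))
        self._evaluate(probe_name, predicate)   # initial state

    def write(self, name, value):
        self._values[name] = value
        # Only conditions that read this global need another look.
        for probe_name, inputs, predicate in self._conditions:
            if name in inputs:
                self._evaluate(probe_name, predicate)


g = ScriptGlobals()
g.add_condition(
    "branch_step",
    ["probe_1_enabled_p"],
    lambda v: v.get("probe_1_enabled_p", 0) != 0,
)
g.write("probe_1_enabled_p", 1)   # condition input changed: re-evaluate
g.write("unrelated_counter", 5)   # not an input: no re-evaluation
```

In this model the enabled flag flips immediately. In a real implementation the actual arming or disarming would lag behind the decision, which is where the missed events come from.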

The itrace_output() is a function that produces the raw trace data
that could then be post-processed for consumption by various
performance analysis tools, but the user could do something as simple
as printing out the PC value.

Is the "raw trace data" a well-defined thing?  Why would this sort of
hard-coded data set be desirable, as opposed to letting the programmer
write something explicit like:
   printf("%2b%8b%4b", cpu(), get_ticks(), pc())
(Of course this can be hidden in a function of his own, or in an
inspectable tapset.)

In fact, the raw trace data is well-defined by the existing ITrace tool I mentioned above. Of course, this definition is negotiable. The idea behind this is to provide enough information in the raw trace data so that, for example, a tool can analyze this data and help the performance analyst identify the causes of pipeline stalls.
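For comparison, records written by the explicit printf("%2b%8b%4b", ...) form quoted above could be unpacked by a post-processing tool along these lines. This is a minimal sketch and does not describe the ITrace record format: it assumes tightly packed fields of 2, 8, and 4 bytes in little-endian order (the byte order would really be that of the traced machine), and the sample values are made up.

```python
import struct

# Assumed layout, matching "%2b%8b%4b": 2-byte cpu, 8-byte ticks, 4-byte pc.
# "<" means little-endian with no padding; use ">" for a big-endian target.
RECORD = struct.Struct("<HQI")


def decode_records(buf):
    """Split a raw buffer into (cpu, ticks, pc) tuples."""
    if len(buf) % RECORD.size:
        raise ValueError("buffer is not a whole number of records")
    return [
        RECORD.unpack_from(buf, offset)
        for offset in range(0, len(buf), RECORD.size)
    ]


# Made-up sample data standing in for a captured trace buffer.
sample = (
    RECORD.pack(1, 123456789, 0x10000400)
    + RECORD.pack(0, 123456999, 0x10000404)
)
records = decode_records(sample)
for cpu, ticks, pc in records:
    print(f"cpu={cpu} ticks={ticks} pc={pc:#x}")
```

A 4-byte pc field would truncate addresses on a 64-bit target, so a real script would likely want %8b for the PC there.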

It might be nice if there was some way to name the relay streams so
that they aren't intermingled.  Maybe something analogous to the
stream parameter to fprintf.

Something similar was mentioned as desirable in the OLS2007 talk by
Bligh / Desnoyers on Google's ktrace & lttng.  There, the context was
an occasional need to have separate buffers for high-volume and
low-volume messages, so that buffer overflows did not penalize the
smaller messages too much.  Let's think about this some more.

Certainly this could be a benefit, although not a necessity for a first pass implementation.
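The motivation for separate buffers can be shown with a small model. The sketch below is a Python toy, not relay-channel code: the Stream class, the stream names, and the capacities are hypothetical. Each named stream has its own bounded buffer, so an overflow in the high-volume stream costs the low-volume stream nothing.

```python
from collections import deque


class Stream:
    """Bounded buffer that drops new messages once full (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()
        self.dropped = 0

    def write(self, msg):
        if len(self.buf) >= self.capacity:
            self.dropped += 1   # overflow: the new message is lost
            return False
        self.buf.append(msg)
        return True


# One buffer per named stream instead of a single shared one.
streams = {"trace": Stream(capacity=4), "status": Stream(capacity=4)}

for i in range(100):                       # high-volume instruction trace
    streams["trace"].write(f"pc record {i}")
streams["status"].write("itrace started")  # occasional low-volume message

print(streams["trace"].dropped, streams["status"].dropped)
```

With a single shared buffer of the same total size, the status message would have arrived after the buffer was already full and been dropped along with the excess trace records.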

The SystemTap translator would generate calls to target dependent
code to implement single instruction or branch trapping.  This is
done a variety of ways on different architectures, but generally
involves setting a bit in a system register to enable single
instruction/branch trapping.

Is this sort of thing done/doable in kernel space also, or just on user-space threads?

The existing tool is capable of single-step tracing the kernel, with some exceptions.

Is there an existing kernel API for management of
these registers/fields?

Unfortunately, not that I'm aware of.

[...] - instruction tracing enabled for a parent process id will
enable tracing for all of its children (threads). [...]

This is a sensible behavior, though so is a per-thread alternative.
Since the tracing flags are per-thread control registers anyway,
I suspect we'll have to build the former on top of the latter.

[...] INITIALIZATION/CLEANUP
Initialization/cleanup of the instruction tracing feature could be
done by insertion of a call to an itrace initialization/cleanup
routine in the user's begin/end probes.

probe begin
       itrace_init(<some params>)
probe end
       itrace_cleanup()

Neither of these should be necessary.  The existence of
instruction-trace type probes should imply automated setup/cleanup.

- FChE

Thanks very much for your comments.

-Maynard

Follow-Ups:
- Re: proposed instruction trace support in SystemTap
  - From: Frank Ch. Eigler

References:
- proposed instruction trace support in SystemTap
  - From: Dave Nomura
- Re: proposed instruction trace support in SystemTap
  - From: Frank Ch. Eigler
