This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: Proposed systemtap access to perfmon hardware


Frank Ch. Eigler wrote:
wcohen wrote:


To try to get a feel on how the performance monitoring hardware
support would work in SystemTap I wrote some simple examples.


Nice work.  To flesh out the operational model (and please correct me
if I'm wrong): the way this stuff would all work is:

- The systemtap translator would be linked with libpfm from perfmon2.
  (libpfm license is friendly.)

The libpfm library is under an MIT license, so it should be compatible with systemtap's licensing.


- This library would be used at translation time to map perfmon.* probe
  point specifications to PMC register descriptions (pfmlib_output_param_t).
  (This will require telling the system the exact target cpu type for
  cross-instrumentation.)

Yes, this complicates cross-kernel instrumentation (building the instrumentation on one system and running it on another), since a different processor architecture could be in use on each. Some performance monitoring systems, such as PAPI, have mappings for generic event names, which might help in some cases. However, there are differences in computer architecture that simply do not translate to the generic models.


- These descriptions would be emitted into the C code, for actual
  installation during module initialization.  For our first cut, since
  there appears to exist no kernel-side management API at the moment,
  the C code would directly manipulate the PMC registers.  (This means
  no coexistence for oprofile or other concurrent perfctr probing.
  C'est la vie.)

I would prefer to reuse other software to access the performance monitoring hardware rather than generate yet another piece of software that drives it. We want 64-bit values, but a number of the counters are much smaller than that (32-bit). On the Pentium 4, access to the performance counters is complicated, and I would prefer not to reinvent that code. This mechanism would only work with a global setup; things like per-thread sampling would be unsupported. We also need to translate between the event name and the event number; the tables in OProfile and perfmon are getting pretty large to hold all that information and to catch any inability to map events to a register.


One advantage of generating the C code directly is that it would work with the existing RHEL4 kernel.

- The "sample" type perfmon probes would map to the same kind of
  dispatch/callback as the current "timer.profile": the probe handler
  should have valid pt_regs available.

Yes, the pt_regs will be available to the sample type probe.


- The free-running type perfmon probes, probably named
  "perfctr.SPEC.setup" or ".start" or ".begin" would map to a one-time
  initialization that passes a token (PMC counter number?)  to the
  handler.  Other probe handlers can then query/manipulate the
  free-running counter using that number via the start/stop/query
  functions.
Is that sufficiently detailed to begin an implementation?

Pretty close. The one thing that isn't answered is the division of labor for the sampling probes: one-time setup vs. the sample handler. I want to have some handle set in a global variable for the probe, but I do not want to execute that setup every time a sample is collected. For the free-running probes it is pretty clear how to handle things.


[...] print ("ipc is %d.%d \n", ipc/factor, ipc % factor);


(An aside: we should have a more compact notation for this.  We won't
support floating point numbers, but integers can be commonly scaled
like this.  Maybe printf("%.Nf", value), where N implies a
power-of-ten scaling factor, and printf("%*f", value, scale) for
general factors.)

Yes, a scaling mechanism would be nice in some cases. The chances of IPC being around a value of one were pretty high, so I put in the scaling to give a better picture of what is going on.


-Will

