This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: [perfmon] Re: [PATCH 9/16] 2.6.17-rc6 perfmon2 patch for review: kernel-level API support (kapi)


Stephane Eranian wrote:
Christoph,

On Fri, Jun 16, 2006 at 04:45:19PM +0100, Christoph Hellwig wrote:

On Fri, Jun 16, 2006 at 11:41:32AM -0400, Frank Ch. Eigler wrote:

Whether one uses systemtap, raw kprobes, or some specialized
tracing/stats-collecting patch surely forthcoming, kernel-level APIs
would be needed to perform fine-grained kernel-scope measurements
using these counters.

You do not need to be in the kernel to measure kernel-level
execution. Monitoring is statistical by nature; this is not about capturing
execution traces. All PMU models have the capability to filter on privilege
level, so you can distinguish user from kernel.

To measure certain functions of the kernel, some PMU models provide a
way to restrict monitoring to a range of contiguous code addresses, e.g.
Itanium 2.
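
For illustration only, the privilege-level filter can be expressed from
user level; the sketch below uses the perf_event_open interface of later
kernels rather than the perfmon2 API under discussion here, and simply
counts CPU cycles spent at kernel privilege level on behalf of the
calling process:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

int main(void)
{
	struct perf_event_attr attr;
	long long count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;
	attr.exclude_user = 1;	/* count only at kernel privilege level */
	attr.exclude_hv = 1;

	/* measure the calling process (pid 0) on any CPU (-1) */
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	ioctl(fd, PERF_EVENT_IOC_RESET, 0);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	/* ... run some work that enters the kernel ... */
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

	read(fd, &count, sizeof(count));
	printf("kernel-mode cycles: %lld\n", count);
	close(fd);
	return 0;
}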

The filtering on privilege level is too coarse. For example, one may want to
start event counting on entry into a kernel function and stop when exiting the
function. The Itanium hardware is not ideal for this application: the functions
called from the starting function may not be contiguous with it in the address
space. Other kinds of predication based on state information, e.g. restricting
counting to a particular process or thread, could also be very useful.
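
To make the entry/exit bracketing concrete, here is a minimal kretprobe
sketch. It uses get_cycles() as a stand-in for a real PMU event read (which
is exactly what a kernel-level API would have to provide), probes do_fork
purely as an example (the symbol name varies across kernel versions), and
assumes the entry_handler/per-instance-data support of later kretprobe
implementations; it is a sketch, not a proposal:

#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/timex.h>

struct fn_data {
	cycles_t entry_cycles;
};

/* record the cycle counter when the probed function is entered */
static int fn_entry(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	struct fn_data *d = (struct fn_data *)ri->data;

	d->entry_cycles = get_cycles();
	return 0;
}

/* report the delta when the probed function returns */
static int fn_return(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	struct fn_data *d = (struct fn_data *)ri->data;

	printk(KERN_INFO "do_fork: %llu cycles\n",
	       (unsigned long long)(get_cycles() - d->entry_cycles));
	return 0;
}

static struct kretprobe fn_probe = {
	.kp.symbol_name	= "do_fork",	/* example probe point */
	.entry_handler	= fn_entry,
	.handler	= fn_return,
	.data_size	= sizeof(struct fn_data),
	.maxactive	= 16,
};

static int __init fn_probe_init(void)
{
	return register_kretprobe(&fn_probe);
}

static void __exit fn_probe_exit(void)
{
	unregister_kretprobe(&fn_probe);
}

module_init(fn_probe_init);
module_exit(fn_probe_exit);
MODULE_LICENSE("GPL");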


The case of systemtap is different. I think they would like to start/stop
monitoring on certain systemtap events, e.g., a function is called or a
threshold is met. Start and stop would be triggered from a systemtap
callback, which is implemented by a kernel module, if I understand
the architecture. In that scenario, the monitoring session would have
to be created and controlled from the kernel. One could envision an
architecture where monitoring would be controlled from user level, with
systemtap making upcalls, but I do not think this is possible given
that the instrumentation points can be very low level.
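
A rough sketch of that architecture follows: a probe handler, much like the
ones systemtap generates inside its kernel module, turning monitoring on and
off around a kernel event. pmk_session_start()/pmk_session_stop() are
hypothetical stand-ins for whatever kernel-level perfmon calls would exist
(no-op stubs here so the module builds), and vfs_read is an arbitrary probe
point:

#include <linux/module.h>
#include <linux/kprobes.h>

static void pmk_session_start(void) { /* would unfreeze the counters */ }
static void pmk_session_stop(void)  { /* would freeze the counters   */ }

/* entry: the event that should turn monitoring on */
static int start_cb(struct kprobe *p, struct pt_regs *regs)
{
	pmk_session_start();
	return 0;
}

/* return: the event that should turn monitoring off */
static int stop_cb(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	pmk_session_stop();
	return 0;
}

static struct kprobe start_probe = {
	.symbol_name	= "vfs_read",
	.pre_handler	= start_cb,
};

static struct kretprobe stop_probe = {
	.kp.symbol_name	= "vfs_read",
	.handler	= stop_cb,
	.maxactive	= 16,
};

static int __init sketch_init(void)
{
	int ret = register_kprobe(&start_probe);

	if (ret)
		return ret;
	ret = register_kretprobe(&stop_probe);
	if (ret)
		unregister_kprobe(&start_probe);
	return ret;
}

static void __exit sketch_exit(void)
{
	unregister_kretprobe(&stop_probe);
	unregister_kprobe(&start_probe);
}

module_init(sketch_init);
module_exit(sketch_exit);
MODULE_LICENSE("GPL");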


Another use for a kernel-level monitoring API that I know about comes from
people who want to explore how to use performance monitoring
(and profiles) to guide the scheduler. A thread profile can reveal cache
hit rates, stalls, bus bandwidth utilization, whether the thread uses
floating-point, and so on. This could be useful to find the best placement
for threads and to avoid co-scheduling threads that trash each other's
micro-architectural state or saturate the memory bus.
In this scenario, one could envision a kernel thread controlling monitoring
and processing profiles for the scheduler. But, to concur with you, Christoph,
I think this could be achieved from user level, and the valuable information
could be passed to the scheduler via a specific system call, for instance.

One probably could configure the performance monitoring hardware from user
space. However, for micro-measurements in the kernel, it seems that the PMU
reads would still have to happen in kernel space.


-Will

