This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Re: [pcp] suitability of PCP for event tracing

From: nathans at aconex dot com
To: "Frank Ch. Eigler" <fche at redhat dot com>
Cc: systemtap at sources dot redhat dot com, pcp at oss dot sgi dot com
Date: Sat, 28 Aug 2010 14:23:50 +1000 (EST)
Subject: Re: [pcp] suitability of PCP for event tracing

Hi Frank & systemtap folks,

(long time systemtap user here, love your work - thanks!)

----- "Frank Ch. Eigler" <fche@redhat.com> wrote:

> 
> We're investigating to what extent the PCP suite may be suitable for
> more general low-level event tracing.  Just from docs / source gazing
> (so please excuse my terminology errors), a few challenges would seem
> to be:
> 
> * poll-based data gathering
> 
>   It seems as though PMDAs are used exclusively in 'polling' mode,
>   meaning that underlying system statistics are periodically queried
>   and summary results stored.  In our context, it would be useful if
>   PMDAs could push event data into the stream as they occur - perhaps
>   hundreds of times a second.

Yes, this is a bit of a square-peg-round-hole situation.  It would be
worth backing up a bit to understand what the aim is here: is there
particular functionality you're after?  If you're interested only in
storing trace data historically, we'd probably go down a different
route than if you want both live and historical (much more tricky).

Having said that, I can imagine protocol extensions to support a live
push mechanism - it would be a fascinating research project and could
provide some fairly unique capabilities.

We would need an additional pmcd/client exchange to register initial
interest in receiving tracing information, which would have to be able
to identify from which pmda that information will originate.  We'd also
need an extension to the pmcd/pmda protocol to allow these out-of-band
pmda-driven events to be pushed to pmcd for it to multiplex out to any/
all interested client tools.
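
To make the shape of that exchange concrete, here is a rough Python
sketch.  It is illustrative only: the Multiplexer class and the
register_interest/push_event names are made up for this example and do
not exist in PCP, and a real extension would live in the pmcd/pmda wire
protocol rather than in in-process callbacks.

```python
class Multiplexer:
    """Toy stand-in for pmcd: fans PMDA-pushed events out to clients.

    Hypothetical names throughout; this is not PCP API.
    """

    def __init__(self):
        # pmda name -> list of client callbacks interested in its events
        self._subscribers = {}

    def register_interest(self, pmda, callback):
        """Client side: ask to receive events originating from one pmda."""
        self._subscribers.setdefault(pmda, []).append(callback)

    def push_event(self, pmda, event):
        """PMDA side: push one out-of-band event; returns deliveries made."""
        delivered = 0
        for callback in self._subscribers.get(pmda, ()):
            callback(event)
            delivered += 1
        return delivered


mux = Multiplexer()
chart, logger = [], []  # two pretend client tools
mux.register_interest("systemtap", chart.append)
mux.register_interest("systemtap", logger.append)

mux.push_event("systemtap", {"ts": 1.0, "name": "vfs_read"})
mux.push_event("other", {"ts": 2.0, "name": "ignored"})  # nobody subscribed
```

Both pretend clients see the first event and neither sees the second,
which is the "multiplex out to interested clients only" behaviour
described above.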

Mark, Ken or one of the other PCP guys might be able to envisage a way
to overlay this on the existing protocol, but I think it would take a
protocol rev (v3) to accomplish this.  Which is a fair bit of work, but
would be quite awesome IMHO.  There have definitely been occasions on
which I've wanted to see trace data alongside sampled data in a chart
(something like Figure 1 & 2 in http://lwn.net/Articles/299483/) - to
solve this in a generic way (arbitrary sample-based metrics from PCP,
arbitrary trace-data from systemtap) would be quite powerful.

> * relatively static pmns
> 
>   It would be desirable if PMNS metrics were parametrizable with
>   strings/numbers, so that a PMDA engine could use it to synthesize
>   metrics on demand from a large space.  (Example: have a
>   "kernel-probe" PMNS namespace, parametrized by function name, which
>   returns statistics of that function's execution.  There are too many
>   kernel functions, and they vary from host to host enough, so that
>   enumerating them as a static PMNS table would be impractical.)

This one's easier - there are PMDAs that do this already.  In particular,
the MMV agent's metrics and the Linux kernel cgroup metrics are generated
on the fly, with the namespace managed by the PMDA rather than by the
static files that were traditionally used in PCP (src/pmdas/mmv in the
tree is a complete example, src/pmdas/sample/src has a few simple
examples too).

The other dimension that can be used there is "instances", which can be
dynamic as well, even for static metric definitions - so in your example
you might have a metric which is "number of times function entered" and
the set of instances might be each function name.
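
As a rough illustration of the "instances" idea, the Python sketch below
models a single static metric whose instance set grows as new function
names are seen.  The CallCountMetric class and its method names are
hypothetical; a real PMDA would do this through the PCP instance domain
interfaces, which are not shown here.

```python
class CallCountMetric:
    """One static metric ("number of times function entered") with a
    dynamic set of instances, one per function name.

    Hypothetical model only; not PCP API.
    """

    def __init__(self):
        self._instance_ids = {}  # function name -> instance id
        self._counts = {}        # instance id -> entry count

    def record_entry(self, function_name):
        # A new function name gets the next free instance id on first sight.
        inst = self._instance_ids.setdefault(
            function_name, len(self._instance_ids))
        self._counts[inst] = self._counts.get(inst, 0) + 1

    def fetch(self):
        """Return the current value for every instance, keyed by name."""
        return {name: self._counts[inst]
                for name, inst in self._instance_ids.items()}


metric = CallCountMetric()
for name in ["vfs_read", "vfs_write", "vfs_read"]:
    metric.record_entry(name)
```

The metric definition never changes; only the set of instances does, so
no per-function namespace entries are needed.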

> 
> * scalar payloads
> 
>   It seems as though each metric value provided by PMDAs is
>   necessarily a scalar value, as opposed to some structured type.  For
>   event tracing, it would be useful to have tuples.  Front-ends could
>   choose the interesting fields to render.  (Example: tracing NFS
>   calls, complete with decoded payloads.)

It's not widely known or used, but there is an "aggregate" metric type
which is basically a blob (with associated length).  You could certainly
take advantage of that (obviously, it requires tools to know what's in
the blob and agree on its format with the PMDA in order to make sense of
it).  An example there would be the sample.sysinfo metric in
src/pmdas/sample.
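
To show what "agreeing on the blob format" could look like, here is a
small Python sketch that packs a tuple into a fixed-layout byte string
and unpacks it again.  The three-field layout (timestamp, client id,
opcode) is invented for this example; it is not the format of
sample.sysinfo or of any existing PCP metric.

```python
import struct

# Assumed layout, shared by producer and consumer: network byte order,
# double timestamp, unsigned 32-bit client id, unsigned 16-bit opcode.
EVENT_LAYOUT = struct.Struct("!dIH")


def encode_event(timestamp, client_id, opcode):
    """Producer side (the PMDA): pack one event tuple into a blob."""
    return EVENT_LAYOUT.pack(timestamp, client_id, opcode)


def decode_event(blob):
    """Consumer side (the tool): unpack a blob into its fields."""
    return EVENT_LAYOUT.unpack(blob)


blob = encode_event(1.5, 42, 7)
fields = decode_event(blob)
```

A front-end that only cares about one field can decode the blob and
ignore the rest, but both ends have to be updated together if the layout
changes.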

> * filtering
> 
>   It would be desirable for the apps fetching metric values to
>   communicate a filtering predicate associated with them, perhaps as
>   per pmie rules.  This is to allow the data server daemon to reduce
>   the amount of data sent to the gui frontends.  Perhaps also it could
>   use them to inform PMDAs as a form of subscription, and in turn they
>   could reduce the amount of data flow.

This is also feasible - there is a pmStore(3) component to the protocol
which allows clients to communicate with the PMDAs.  You could have a
metric which expresses the filtering expression, perhaps as a string,
which client tools could store into and then the PMDA would enact some
different tracing policy.
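
A minimal Python sketch of that idea follows.  It only models the
behaviour: the TracePMDA class, the "trace.control.filter" metric name
and the prefix-match filter are all assumptions made for this example,
and the store() method merely stands in for what a PMDA would do when a
client calls pmStore(3).

```python
class TracePMDA:
    """Toy PMDA whose tracing policy is driven by a stored filter string.

    Hypothetical model only; not PCP API.
    """

    FILTER_METRIC = "trace.control.filter"  # made-up metric name

    def __init__(self):
        self.filter_prefix = ""  # empty prefix: pass every event

    def store(self, metric, value):
        """Stand-in for handling a client store into the filter metric."""
        if metric != self.FILTER_METRIC:
            raise KeyError(metric)
        self.filter_prefix = value

    def emit(self, function_names):
        """Return only the events that match the current filter."""
        return [name for name in function_names
                if name.startswith(self.filter_prefix)]


pmda = TracePMDA()
events = ["vfs_read", "nfs_getattr", "nfs_lookup"]
unfiltered = pmda.emit(events)

pmda.store("trace.control.filter", "nfs_")
filtered = pmda.emit(events)
```

Filtering at the PMDA, rather than in the client, is what cuts the data
flow at the source.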

> * no web-based frontends
> 
>   In our usage, it would be desirable to have some mini pcp-gui that
>   is based on web technologies rather than QT.

I know Mark's spoken of tackling this area in the past ... but it's not
really an area of interest for me (my needs are covered already).  Could
be done, just needs someone with the itch to scratch at it.

> 
> To what extent could/should PCP be used/extended to cover this space?
> 

Definitely doable.  If you want everything "live", the tricky work would
all be in coding the PMDA and the client/pmcd/pmda protocol extensions.

If you are more interested in doing retrospective analysis, many tracing
tools (like blktrace for i/o, wireshark/argus/... for net tracing, xperf
on win32, etc - although not systemtap afaik?) have a mechanism for
storing trace data on disk.  There's a scriptable API for taking such
data and producing PCP logs from it - so that might be another avenue of
interest to you guys, perhaps.  It would be a good place to start with
prototyping too: get sample- and trace- data together in a PCP archive,
then play it back with PCP tools to see what's needed on the client side
to best explore that data.

cheers.

-- 
Nathan
