
Re: [pcp] suitability of PCP for event tracing


Hi, Ken -


> >(2) protocol extensions for live-push on pmda and pmcd-client interfaces
> >     This clearly larger effort is only worth undertaking with the
> >     community's sympathy and assistance.  It might have some
> >     interesting integration possibilities with the other tools,
> >     especially pmie (the inference engine).
> 
> I'd like to go back to a comment Nathan made at the start of this 
> thread, namely to try and get a clear idea of the problem we're trying 
> to solve here and the typical use cases.  [...]

I guess the basic idea is to allow a single client tool to draw and
analyze both the gross performance metrics and the underlying events
that explain those metrics.


> Some of the suggestions to date include ...
> 
> + being able to push data from pmcd asynchronously to clients, as 
> opposed to the time-based pulling from the clients that we support today

Yes:

> [later:] Depending on the set of goals we agree on, there may even
> be a place to consider maintaining the poll-based mechanism, but the
> export data is a variable length buffer of all event records (each
> aggregated and self-identifying as below) seen since the last
> poll-based sample. [...]

As Max says, this would seem to require keeping some client state and
buffers in pmcd and/or pmda, to avoid missing events between
consecutive calls.

Instead of that, I'm starting to sketch out a hybrid scheme that, on
the pmapi side, is represented like this.  (Please excuse the
inclusion of actual code.  It makes things more concrete and easier to
discuss.)


------------------------------------------------------------------------
/*
 * Callback function from pmWatch(), supplying zero or more pmResult rows
 * accumulated during this pmWatch() interval.  The first argument gives
 * the number of pmResults in the second argument.  The third argument is
 * a generic data pointer passed through from pmWatch().
 *
 * The function should not call pmFreeResult() on the incoming values.
 * The function may return 0 to indicate its desire to continue watching,
 * or a non-zero value to abort the watch.  This value will be returned
 * from pmWatch.
 */
typedef int (*pmWatchCallBack)(int resCount, const pmResult ** results,
                               void * data);

/*
 * Fetch metrics periodically, as if pmFetch() was called at the given
 * poll interval (if any).  First few parameters are as for pmFetch().
 * Each pmFetch() result is supplied via the given callback function.
 * The callback function can consume the data, and return a value
 * to dictate whether the polling loop is to continue or stop.
 *
 * In addition, if a PMDA pushes discrete metric updates during this
 * watch period, the callback function will be invoked more frequently.
 * (Other metric slots will have a NULL pmResult->vset[].)
 *
 * If a poll interval is given, the callback function is called
 * approximately once per interval (possibly with a zero resCount) to
 * give the application a chance to quit the loop.
 */
extern int pmWatch(int numpmid, pmID *pmidlist,
                   pmWatchCallBack fn, void * data,
                   const struct timeval *pollInterval,
                   const struct timeval *timeoutInterval);
------------------------------------------------------------------------

So a pmapi client would make a single long-duration pmWatch call to
libpcp.  libpcp calls back into the application periodically (to poll
normal metric values) or whenever discrete events arrive.  Eventually
the app says "enough" by returning the appropriate rc.
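
To make that flow concrete, here is roughly how an application might
use it (error checking omitted; pmWatch itself is of course only the
sketch above, and the second metric name is invented):

------------------------------------------------------------------------
#include <stdio.h>
#include <pcp/pmapi.h>

/* Invoked by libpcp for each batch of rows, whether they came from the
 * periodic poll or from a pmda push. */
static int
watch_cb(int resCount, const pmResult **results, void *data)
{
    int i;
    for (i = 0; i < resCount; i++)
        printf("row with %d metric slots at %ld.%06ld\n",
               results[i]->numpmid,
               (long)results[i]->timestamp.tv_sec,
               (long)results[i]->timestamp.tv_usec);
    return 0;            /* 0 = keep watching; non-zero ends pmWatch() */
}

int
main(void)
{
    char *names[] = { "kernel.all.load", "sample.event.records" };
    pmID pmids[2];
    struct timeval poll = { 5, 0 };       /* also wake up every 5 seconds */

    pmNewContext(PM_CONTEXT_HOST, "localhost");
    pmLookupName(2, names, pmids);
    /* blocks until the callback asks to stop (or an error occurs) */
    return pmWatch(2, pmids, watch_cb, NULL, &poll, NULL);
}
------------------------------------------------------------------------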

At the pmda.h or PDU side, I don't have a corresponding sketch yet.  I
wonder if we could permit multithreading just for the corresponding
parts of the API:

    pmcd->pmda     (*pmdaInterface.version.five.watch)(..., callbackFn, cbKey, ...);
                   # pmda spawns a new thread, sets it up
                   => key (thread-id)

    pmda thread2   (*callbackFn)(n, "event data pmResult" [array], cbKey, ...)

    pmcd->pmda     (*pmdaInterface.version.five.unwatch)(key);
                   # pmda kills thread2
                   => void

to register an interest in metrics with the PMDA, have a new thread
call back into PMCD only to supply new data via a dedicated function,
and then eventually unregister.  This may require only relatively
small parts of libpcp/libpcp_pmda to be made thread-safe.
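
Purely for illustration (none of these entry points exist in pmda.h,
and all the names are made up), the pmda side might be shaped
something like this:

------------------------------------------------------------------------
#include <stdlib.h>
#include <pthread.h>
#include <pcp/pmapi.h>

/* hypothetical callback supplied by pmcd; mirrors pmWatchCallBack */
typedef int (*pmdaWatchCallBack)(int resCount, const pmResult **results,
                                 void *cbKey);

struct watch_state {
    pthread_t         tid;
    volatile int      stop;      /* set by unwatch() */
    pmdaWatchCallBack fn;        /* how we push data back toward pmcd */
    void             *cbKey;
};

static void *
watch_thread(void *arg)
{
    struct watch_state *ws = arg;
    while (!ws->stop) {
        /* block on the pmda's own event source (e.g. a systemtap
           session), build one or more pmResults, then push them: */
        /* ws->fn(n, results, ws->cbKey); */
    }
    return NULL;
}

/* would hang off (*pmdaInterface.version.five.watch) */
static void *
my_watch(pmdaWatchCallBack fn, void *cbKey)
{
    struct watch_state *ws = calloc(1, sizeof(*ws));
    ws->fn = fn;
    ws->cbKey = cbKey;
    pthread_create(&ws->tid, NULL, watch_thread, ws);
    return ws;                   /* the "key" pmcd later passes to unwatch */
}

/* would hang off (*pmdaInterface.version.five.unwatch) */
static void
my_unwatch(void *key)
{
    struct watch_state *ws = key;
    ws->stop = 1;
    pthread_join(ws->tid, NULL);
    free(ws);
}
------------------------------------------------------------------------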


> + data filtering predicates pushed from a client to pmcd and then on to
> a pmda to enable or restrict the types of events or conditions on event
> parameters that would be evaluated before asynchronously sending
> matching events to the client

Right.  This would represent a pure performance optimization if there
were only a single concurrent client.  With more than one, a filtering
algebra would be needed.  I don't have a sketch for this yet.
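
Just to make the multi-client problem concrete though, the per-client
state a pmda would end up tracking might look vaguely like this (types
and predicate syntax entirely invented):

------------------------------------------------------------------------
#include <pcp/pmapi.h>

/* Events would be generated if ANY live client's predicate matches,
 * and delivered only to those clients whose own predicate matched. */
struct event_filter {
    pmID        pmid;          /* which event metric */
    const char *predicate;     /* e.g. "pid == 1234 && latency > 10" */
};

struct client_watch {
    int                  client;     /* which pmcd client connection */
    int                  nfilters;
    struct event_filter *filters;    /* this client's predicates */
};
------------------------------------------------------------------------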


> + handling event records with associated event parameters as an extended
> data type

Right.  Hiding JSON or some such in a string is probably OK, unless we
want to reify filtering and inferencing over those event parameters.
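
For instance, an event record smuggled through as an ordinary string
value might look like this (schema invented; the point is only that it
travels as PM_TYPE_STRING today, opaque to filtering/inferencing):

------------------------------------------------------------------------
static const char *example_event =
    "{ \"event\": \"sched_switch\","
    "  \"timestamp\": 1271346823.031422,"
    "  \"params\": { \"prev_pid\": 4711, \"next_pid\": 1, \"cpu\": 2 } }";
------------------------------------------------------------------------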


> + additional per-client state data being held in pmcd to allow rate
> aggregation (and similar temporal averaging) to be done at pmcd, rather
> than the client [note I have a long-standing objection [...]

I guess it depends on what we could be saving by having pmcd perform
such conversions instead of clients.  Client-side CPU and storage
seem cheaper than network traffic if the data reduction is moderate,
but if it's high, it's probably the other way around.  (In the
systemtap model, we encourage users to filter events aggressively at
the source, which turns the data firehose into a dribble.  To exploit
this fully in the pcp-intermediated world though, we'd have to pass
filtering parameters through.)


> + better support for web-based monitoring tools (although Javascript
> evolution may make this less pressing than it was 5 years ago)

Right; at this point it seems like a fatter JavaScript app should be
able to do this job without pmcd's help; the web app just needs to
access the pmapi (through a proxy if necessary).


> + better support for analysis that spans the timeline between the
> current and the recent past

This sounds like useful but future work.  Until it is done, we could
have clients perform archive-vs-live data merging on their own, or
else have the users start clients early enough to absorb the "recent
past" data as live.


> Returning to Frank's point, I'm not sure pmie would be able to consume
> asynchronous events ... [...]

That's OK, it should at worst ignore such events.  At best, in the
future, it could gain some more general temporal/reactive-database
type facilities to do something meaningful.


- FChE

