This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: systemtap/pcp integration pmda 0.1

From: fche at redhat dot com (Frank Ch. Eigler)
To: David Smith <dsmith at redhat dot com>
Cc: Systemtap List <systemtap at sourceware dot org>, pcp <pcp at oss dot sgi dot com>
Date: Mon, 22 Sep 2014 18:04:57 -0400
Subject: Re: systemtap/pcp integration pmda 0.1
Authentication-results: sourceware.org; auth=none
References: <54133D71 dot 6040208 at redhat dot com>

dsmith wrote:

> Here's version 0.1 (up from 0.01!) of my systemtap/pcp integration work

Thanks a lot!

> that uses systemtap (https://sourceware.org/systemtap/) to export JSON
> data and a pcp (http://www.performancecopilot.org/) python pmda that
> reads and processes the JSON data. [...]

(Sorry I missed this when it went by -- please call the next
version 3.14 or something else very different from 0.1! :-)

> # pminfo -df stap_json
>
> stap_json.json.dummy2
>     Data Type: string  InDom: PM_INDOM_NULL 0xffffffff
>     Semantics: instant  Units: none
>     value "dummy2"
>
> stap_json.json.dummy_array.dummy2
>     Data Type: string  InDom: 130.1 0x20800001
>     Semantics: instant  Units: none
>     inst [0 or "1"] value "def"
>     inst [1 or "0"] value "abc"
>     inst [2 or "2"] value "ghi"
> [...]

Looking good!

> // ===========================
> // ==== Tapset file start ====
> // ===========================
>

Looks long but generally good.  (But see below re. suggestions about
the schema/data representations.)

> global net_devices
> global read_count
>
> probe json_data
> {
>   @json_output_data_start
>   @json_output_string_value("xstring", "testing, 1, 2, 3")
>   @json_output_numeric_value("read_count", read_count)
>   read_count++
>
>   foreach (dev in net_devices) {
>     if (@count(skb_queue_t[dev])) {
>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count",
> 				       @sum(skb_queue_t[dev]))
>       @json_output_array_numeric_value("net_xmit_data", dev, "xmit_latency",
> 				       @count(skb_queue_t[dev]))
> [...]
>   @json_output_data_end
> }
>
> // Set up the metrics
> probe begin
> {
> [...]
>   json_add_string_metric("xstring", "Test string")
>   json_add_numeric_metric("read_count", "Times values read")
> [...]
> }

Have you considered merging together these two bits of code, so that a
single stap probe alias that generates json data is also used to
populate metadata globals, so a subsequent (!) schema json query would
be possible?  Something like

probe json_data {
   @json_output_string_value("xstring", "testing 1, 2, 3", "Test String")
   @json_output_array_numeric_value("net_xmit_data", dev, "xmit_count", 3,
                         "sum of latency for xmit device", $UNITS/SCALE)
}

so that the metadata is attached at the end of the data-supplying calls?

> #!/usr/bin/python
> import json
> import jsonschema
> import collections
> [...]

This PMDA code looks quite manageable; in particular the cumulative
nature of imdom changes looks right on.  Please don't shoot me as I'll
suggest a somewhat different schema/data encoding below; I'm pretty
sure they're not too hard to express in the python code.  The main
reason for proposing the changes is so that this pmda has a fighting
chance at consuming json data from sources other than just the above
systemtap tapset.

> {
>   "generation": 1,
>   "data": {

(IMHO we shouldn't mandate such wrappers.)

>     "xstring": "testing, 1, 2, 3",
>     "read_count": 9,
>     "net_xmit_data": [
>       {
>         "__id": "eth0",
>         "xmit_count": 7699136,
>         "xmit_latency": 1109
>       },

(We already mentioned in passing how the "__id" string might be
desirable to be schema-configured.)

Anyway, onto the schema.  

I see how you chose the json-schema.org to piggyback-ride on.  One
thing we should keep in mind though that json-schema does a slightly
different thing than what we need.  It's more like an XML DTD, and
just describes what's a "grammatically correct" document.  We do not
really need this exact kind of checking, but it's not a big hindrance
either - it's not grossly wordy.  (The "additionalProperties=false"
might be an example of unhelpful wordiness though.)

What we really need is the interpretation, for purposes of extracting
data and relaying to PCP.  And for that, we can be a little more
aggressive in the sense of adding our own schema elements, rather than
riding on top of json-schema.org patterns.  For example:

> {
>   "type": "object",
>   "title": "root",
> [...]
>   "properties": {
> [...]
>     "data": {
> [...]
>       "properties": {
>         "xstring": {
>           "type": "string",
>           "description": "Test string",
>           "additionalProperties": false
>         },
> [...]

The prototype PMDA turns this into metric "json.xstring" of pcp
PM_TYPE_STRING.  The heuristic's probably fine, but if we want
more generality, we could as well do something like this to
describe a scalar:

{ "foo": { "xstring":
     { "pcp-name":"foo.bar.xstring",
       "pcp-type":"string",          // esp. if pmda offers to cast
       "pcp-units":"MBytes/sec",     // need an inverter for pmUnitsStr(3)
       "pcp-semantics":"instant",    // need an inverter for PM_SEM_*
       "pcp-shorthelp":"short help", 
       "pcp-longhelp":"long help" 
     }
} }

(Adding the json-schema fields is optional & orthogonal.)

One benefit of a formal "pcp-name" field here is that the mapping from
the JSON nesting structure need not match the pcp namespace exactly.
It would let the json object name components be free of constraints
like not containing dots (since we would not propagate them to pcp).

>         "net_xmit_data": {
>           "type": "array",
>           "description": "Network transmit data indexed by ethernet device",
>           "additionalProperties": false,
>           "items": {
>             "type": "object",
>             "additionalProperties": false,
>             "properties": {
>               "__id": {
>                 "type": "string",
>                 "additionalProperties": false
>               },
>               "xmit_count": {
>                 "type": "integer",
>                 "description": "number of packets for xmit device",
>                 "additionalProperties": false
>               },
> [...]

Here's an alternative formulation kind of along the previous one:

{ "bar": { "networks":
     { "pcp-imdom-discriminator":"__id",  // parametrizing this
       "type":"array",   // json-schema style identification of array-ness
       "items": {
          "xmit": {
            "pcp-name":"bar.xmitfoo",
            "pcp-type":"float",           // (stap can print fp with some effort)
            "pcp-units":"Bytes/hour",
            "pcp-semantics":"instant",
            "pcp-shorthelp":"short help", 
            "pcp-longhelp":"long help" 
            }
      } }
} }

Again, the json-schema parts are mostly orthogonal (just kiting the
array-ness description).

So what would something like this give us?  At the pure stap-pmda
level, not that much extra over what the 0.1 prototype has.  But
beyond stap, we may well be able to write some schema for more general
json files, trivially e.g. ones that lack the "data" as top-level
wrapper.

Before I go and write a bigger example schema of some other JSON data
we have lying around, do you see what I'm getting at?  Do you agree
that this style is also implementable in the python pmda?

- FChE

Follow-Ups:
- Re: systemtap/pcp integration pmda 0.1
  - From: David Smith

References:
- systemtap/pcp integration pmda 0.1
  - From: David Smith

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]