This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
[Bug runtime/11308] aggregate operations for @variance, @skew, @kurtosis
- From: "mcermak at redhat dot com" <sourceware-bugzilla at sourceware dot org>
- To: systemtap at sourceware dot org
- Date: Fri, 03 Jun 2016 11:41:28 +0000
- Subject: [Bug runtime/11308] aggregate operations for @variance, @skew, @kurtosis
- Auto-submitted: auto-generated
- References: <bug-11308-6586 at http dot sourceware dot org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=11308
--- Comment #1 from Martin Cermak <mcermak at redhat dot com> ---
Created attachment 9311
--> https://sourceware.org/bugzilla/attachment.cgi?id=9311&action=edit
proposed patch
The variance of N data points is V = S / (N - 1), where S is the sum of squares
of the deviations from the mean. Here is an attempt to implement the @variance()
operator using Knuth's algorithm [1]:
=======
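def online_variance(data):
    n = 0        # number of data points seen so far
    mean = 0.0   # running mean
    M2 = 0.0     # running sum of squared deviations from the mean (S)
    for x in data:
        n += 1
        delta = x - mean
        mean += delta/n
        M2 += delta*(x - mean)
    # The sample variance is undefined for fewer than two data points.
    if n < 2:
        return float('nan')
    else:
        return M2 / (n - 1)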
=======
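As a quick cross-check, the one-pass update above can be compared with the
two-pass definition V = S / (N - 1). The sketch below is only an illustration
in floating point: two_pass_variance() and the sample data are made up for this
check and are not part of the patch.

```python
import math


def online_variance(data):
    """One-pass sample variance, same update rule as the listing above."""
    n = 0
    mean = 0.0
    M2 = 0.0
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        M2 += delta * (x - mean)
    if n < 2:
        return float('nan')
    return M2 / (n - 1)


def two_pass_variance(data):
    """Hypothetical helper: V = S / (N - 1) computed straight from the definition."""
    n = len(data)
    if n < 2:
        return float('nan')
    mean = sum(data) / n
    s = sum((x - mean) ** 2 for x in data)  # S: sum of squared deviations
    return s / (n - 1)


# Made-up sample data; both methods should agree up to rounding.
samples = [2, 4, 4, 4, 5, 5, 7, 9]
assert math.isclose(online_variance(samples), two_pass_variance(samples))
```

The one-pass form matters here because a probe handler sees the values one at a
time and cannot go back over them to subtract the final mean.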
This patch builds on the current systemtap implementation of the aggregation
operators, which first pre-aggregates the data on each CPU (__stp_stat_add()).
When the aggregate is actually read, e.g. via @sum (or @variance), the per-CPU
results are aggregated again, this time across all CPUs (_stp_stat_get()), and
output. This approach keeps the use of shared resources low at collection
time. So, in this patch, per-CPU variances are collected first and then
aggregated across all CPUs to give the resulting @variance. N is assumed to
satisfy N >> 1, so the resulting @variance() is computed as a simple mean of
the per-CPU variances. Integer arithmetic is used. With this patch, we get
something relatively small for data points clustered closely around the mean,
and something relatively large for data points spread widely around the mean.
So it passes a rough sanity test:
=======
# stap -e 'global a probe oneshot { for(i=0; i<1000; i++) { a<<<42 } } probe end { printdln(", ", @count(a), @max(a), @variance(a)) }'
1000, 42, 1
# stap -e 'global a probe oneshot { for(i=0; i<1000; i++) { a<<<42 } for(i=0; i<20; i++) { a<<<99 } } probe end { printdln(", ", @count(a), @max(a), @variance(a)) }'
1020, 99, 65
# stap -e 'global a probe oneshot { for(i=0; i<1000; i++) { a<<<i } } probe end { printdln(" ", @count(a), @max(a), @variance(a)) }'
1000 999 332833
#
=======
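The two-stage aggregation described above (per-CPU collection, then a simple
mean of the per-CPU variances at read time) can be sketched roughly as follows.
This is not the patch's code: PerCpuStat, combined_variance() and the
round-robin distribution of values over four simulated CPUs are assumptions
made for illustration, and floating point is used here where the patch uses
integer arithmetic.

```python
class PerCpuStat:
    """Hypothetical per-CPU accumulator (not the systemtap runtime structure)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        # Online update done locally, so collection touches no shared state.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        if self.count < 2:
            return float('nan')
        return self.m2 / (self.count - 1)


def combined_variance(stats):
    """Simple mean of the per-CPU variances (assumes N >> 1 on each CPU)."""
    usable = [s.variance() for s in stats if s.count > 1]
    if not usable:
        return float('nan')
    return sum(usable) / len(usable)


# Assumption: the values 0..999 arrive round-robin on four simulated CPUs.
cpus = [PerCpuStat() for _ in range(4)]
for i in range(1000):
    cpus[i % 4].add(i)

result = combined_variance(cpus)
print(result)
```

Note that a mean of per-CPU variances leaves out any spread between the
per-CPU means, so it tracks the overall variance only when every CPU sees a
similar distribution of values.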
-------------------------------------------
[1]
https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm