This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


[Bug runtime/20820] another "soft lockup" BUG on RHEL7 ppc64

From: "mcermak at redhat dot com" <sourceware-bugzilla at sourceware dot org>
To: systemtap at sourceware dot org
Date: Thu, 24 Nov 2016 16:06:01 +0000
Subject: [Bug runtime/20820] another "soft lockup" BUG on RHEL7 ppc64
Auto-submitted: auto-generated
References: <bug-20820-6586@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=20820

--- Comment #4 from Martin Cermak <mcermak at redhat dot com> ---
I'll be inexact but terse:  The testcase checks that the aggregation operator
'<<<' works faster for stats that use only computationally simple operators,
such as @count, than for stats that use computationally complex operators, such
as @variance.  For a more verbose description, please refer to [1].


Currently we support 6 stat operators: @count, @sum, @min, @max, @avg, and
@variance.  The optimization in question relies on GCC optimizing the
following inline function based on the values of its stat_op_* parameters:

=======
$ grep -A 33 __stp_stat_add runtime/stat-common.c
static inline void __stp_stat_add(Hist st, stat_data *sd, int64_t val,
                                  int stat_op_count, int stat_op_sum, int stat_op_min,
                                  int stat_op_max, int stat_op_variance)
{
        int n;
        int delta = 0;

        sd->shift = st->bit_shift;
        sd->stat_ops = st->stat_ops;
        if (sd->count == 0) {
                sd->count = 1;
                sd->sum = sd->min = sd->max = val;
                sd->avg_s = val << sd->shift;
                sd->_M2 = 0;
        } else {
                if(stat_op_count)
                        sd->count++;
                if(stat_op_sum)
                        sd->sum += val;
                if (stat_op_min && (val > sd->max))
                        sd->max = val;
                if (stat_op_max && (val < sd->min))
                        sd->min = val;
                /*
                 * Below, we use Welford's online algorithm for computing variance.
                 * https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
                 */
                if (stat_op_variance) {
                    delta = (val << sd->shift) - sd->avg_s;
                    sd->avg_s += _stp_div64(NULL, delta, sd->count);
                    sd->_M2 += delta * ((val << sd->shift) - sd->avg_s);
                    sd->variance_s = (sd->count < 2) ? -1 : _stp_div64(NULL, sd->_M2, (sd->count - 1));
                }
        }
$ 
=======

For example, if @variance isn't used with a given stat, stat_op_variance is
set to 0, and GCC is expected to optimize the corresponding computations out.
Looking at the above code snippet, it's easy to see that optimizing out the
@variance computations has a much more significant effect than optimizing out
the other stat op computations.  Of course, the effect of such optimizations
also depends on the architecture and compiler.
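Written out, the variance branch of the snippet is Welford's update in fixed point.  Here s is sd->shift, x_n is val, n is sd->count after the increment, the scaled mean is sd->avg_s, M_2 is sd->_M2, the scaled variance is sd->variance_s, and "/" stands for the runtime's 64-bit integer division (_stp_div64):

```latex
\begin{aligned}
\delta_n &= 2^{s} x_n - \bar{x}^{(s)}_{n-1} \\
\bar{x}^{(s)}_n &= \bar{x}^{(s)}_{n-1} + \delta_n / n \\
M_{2,n} &= M_{2,n-1} + \delta_n \left( 2^{s} x_n - \bar{x}^{(s)}_n \right) \\
\sigma^{2,(s)}_n &= M_{2,n} / (n - 1), \qquad n \ge 2
\end{aligned}
```

Since the scaled mean is roughly 2^s times the true mean, both M_2 and the scaled variance carry a factor of roughly 2^(2s).  Per added value, this path costs two 64-bit divisions and a multiplication (plus shifts and subtractions), against a single increment, addition or comparison for each of @count, @sum, @min and @max, which is why removing it has by far the largest effect.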

The testcase tries to detect all of these optimizations and confirm they are
present.  Detecting the optimizations for @count, @sum, @min, @max and @avg is
relatively tricky, because their effect is hard to distinguish from noise.  The
test results are of low quality, and the test generates lots of load.  On the
other hand, detecting and verifying the @variance optimization is relatively
simple, so testing it makes good sense.

I've just been running the testcase in its original form, and it gives all
expected passes on most of the supported RHEL 6 and 7 arches.  But sometimes
'kernel:NMI watchdog: BUG: soft lockup' errors happen.  And this was using the
testsuite's serial mode, which certainly gives better results than the parallel
mode.

So, I propose to drop altogether the first subtest (optim_stats1.stp), which
covers the @count, @sum, @min and @max optimizations, since it's "not so much
fun for a lot of money", but to keep the second subtest (optim_stats2.stp) for
the @variance optimization.  Also, the high iteration counts in
optim_stats2.stp can be lowered (the values were copied from optim_stats1.stp,
but appear to be unnecessarily high).  The following seems to help:

=======
$ git diff
diff --git a/testsuite/systemtap.base/optim_stats.exp b/testsuite/systemtap.base/optim_stats.exp
index e46de40..1955853 100644
--- a/testsuite/systemtap.base/optim_stats.exp
+++ b/testsuite/systemtap.base/optim_stats.exp
@@ -8,7 +8,7 @@ if {![installtest_p]} {
     return
 }

-for {set i 1} {$i <= 2} {incr i} {
+for {set i 2} {$i <= 2} {incr i} {
     foreach runtime [get_runtime_list] {
        if {$runtime != ""} {
            spawn stap --runtime=$runtime -g --suppress-time-limits $srcdir/$subdir/$test$i.stp
diff --git a/testsuite/systemtap.base/optim_stats2.stp b/testsuite/systemtap.base/optim_stats2.stp
index 53bbc69..65fe06d 100644
--- a/testsuite/systemtap.base/optim_stats2.stp
+++ b/testsuite/systemtap.base/optim_stats2.stp
@@ -2,9 +2,9 @@
  * Analogy to optim_stats1.stp, but for pmaps.  See optim_stats1.stp for comments.
  */

-@define RANDCNT %( 200000 %)
+@define RANDCNT %( 2000 %)
 @define RANDMAX %( 1000 %)
-@define ITERS %( 1500 %)
+@define ITERS %( 15 %)

 @define feed(agg, tagg)
 %(
$ 
=======
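To put the reduction in numbers: assuming the work per run scales with RANDCNT × ITERS (that is, each of the ITERS iterations feeds all RANDCNT random values into the aggregates; an assumption, since the feeding loop isn't shown here), the patch cuts the number of '<<<' operations per aggregate by four orders of magnitude:

```latex
\frac{200000 \times 1500}{2000 \times 15} = \frac{3 \times 10^{8}}{3 \times 10^{4}} = 10^{4}
```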


Thoughts?


------------------------
[1] https://sourceware.org/git/gitweb.cgi?p=systemtap.git;a=blob;f=testsuite/systemtap.base/optim_stats1.stp;h=2144b7bb210ee8f0c620487ac63fffba14e0d1bf;hb=HEAD


References:
- [Bug runtime/20820] New: another "soft lockup" BUG on RHEL7 ppc64
  - From: dsmith at redhat dot com
