
racy tests


On 07/11/2015 02:58 PM, Doug Evans wrote:
> On Fri, Jul 10, 2015 at 9:33 AM, Pedro Alves <palves@redhat.com> wrote:
>> There's no single "base" run, actually.  The baseline is dynamically
>> adjusted at each build; it's a moving baseline, and it's per test
>> (each individual PASS/FAIL, not per .exp file).  As soon as a test
>> PASSes, it's
>> added to the baseline.  That means that if some test is racy, it'll
>> sometimes FAIL, and then a few builds later it'll PASS, at which point
>> the PASS is recorded in the baseline, and then a few builds later
>> the test FAILs again, and so the buildbot email report mentions
>> the regression against the baseline.  In sum, if a test goes
>> FAIL -> PASS -> FAIL -> PASS on and on over builds, you'll constantly
>> get reports of regressions against the baseline for that racy test.
> 
> Time for another plug to change how we manage racy tests?

I'm all for something more structured.

> E.g., if a test fails, run it again a few times.

Agreed.

> I can think of various things to do after that.
> E.g., if any of the additional runs of the test record a PASS, then
> flag the test as RACY, remember this state, and rerun the same test
> multiple times in the next run.  If all N runs then pass (or all N
> runs fail), switch its state back to plain PASS (or FAIL).
> That's not perfect; it's hard to be perfect with racy tests.  One can
> build on that, but there's a pragmatic tradeoff here between being too
> complex and not doing anything at all.
> I think we should do something.  The above keeps the baseline
> machine-generated and does minimal work to manage racy tests.  A lot
> of racy tests get exposed during these additional runs for me because
> I don't run the reruns in parallel; the system is then under less
> load, and it's system load that triggers a lot of the raciness.
> 
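
Just to make the bookkeeping concrete, a rough, untested sketch of
that classification step (run_one_exp and MAX_RERUNS are made-up
names here, not existing testsuite bits):

  # Rerun a failing .exp a few times and classify the outcome.
  # Assumes run_one_exp runs a single .exp and returns nonzero on FAIL.
  classify_test ()
  {
    exp=$1
    if run_one_exp "$exp"; then
      echo PASS; return
    fi
    for i in $(seq 1 "$MAX_RERUNS"); do
      if run_one_exp "$exp"; then
        # Failed once, then passed on a rerun: that's a racy test.
        echo RACY; return
      fi
    done
    # Failed on every attempt: treat it as a real FAIL.
    echo FAIL
  }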

One thing that I'd like is for this to be part of the testsuite
itself, rather than separate machinery the buildbot uses.  That way,
everyone benefits from it, and we all maintain/evolve it.
I think this is important, because people are often confused when
they do a test run before a patch, apply the patch, run the tests
again, and see new FAILs that their patch can't explain.

E.g., we could have the testsuite machinery itself run the tests
multiple times, iff they failed.  Maybe all tests would be eligible
for this, or maybe we'd only apply this to those which are explicitly
marked racy somehow, but that's a separate policy question from the
framework that actually re-runs tests.  In a parallel test run, we run
each .exp under its own separate runtest invocation, driven from
the testsuite's Makefile; we could wrap each of those invocations and
check whether it failed, and if so, rerun that .exp a few times.
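
For instance, the wrapper could be a small shell script the Makefile
invokes per .exp (an untested sketch; max_runs is arbitrary, and this
relies on runtest exiting nonzero when there are unexpected FAILs):

  #!/bin/sh
  # Run one .exp via runtest, rerunning it a couple of times before
  # believing a FAIL.  All arguments are passed through to runtest.
  max_runs=3
  n=1
  while ! runtest "$@"; do
    if [ $n -ge $max_runs ]; then
      exit 1   # still failing after the reruns: report the FAIL
    fi
    n=$((n + 1))
    echo "rerunning $* (attempt $n of $max_runs)"
  done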

That may mean that only parallel mode supports this, but I'd myself
be fine with that, because we can always do

  make check -j1 FORCE_PARALLEL="1"

or some convenience for that, to get the benefits.

Maybe it's possible to restart the same .exp test in a
sequential run too, from gdb_finish, say; I haven't thought much
about that.

> The ultimate goal is of course to remove racy tests, but first we need
> to be more systematic in identifying them, which is one of the goals
> of this process.

Agreed.

Thanks,
Pedro Alves

