This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Improve analysis of racy testcases


On Thursday, February 25 2016, Pedro Alves wrote:

> On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
>> This patch is a proposal to introduce some mechanisms to identify racy
>> testcases present in our testsuite.  As can be seen in previous
>> discussions, racy tests are really bothersome and cause our BuildBot to
>> pollute the gdb-testers mailing list with hundreds of false-positives
>> messages every month.  Hopefully, by identifying these racy tests in
>> advance (and automatically) will contribute to the reduction of noise
>> traffic to gdb-testers, maybe to the point where we will be able to send
>> the failure messages directly to the authors of the commits.
>> 
>> I spent some time trying to decide the best way to tackle this problem,
>> and decided that there is no silver bullet.  Racy tests are tricky and
>> it is difficult to catch them, so the best solution I could find (for
>> now?) is to run our testsuite a number of times in a row, and then
>> compare the results (i.e., the gdb.sum files generated during each run).
>> The more times you run the tests, the more racy tests you are likely to
>> detect (at the expense of waiting longer and longer).  You can also run
>> the tests in parallel, which makes things faster (and contribute to
>> catching more racy tests, because your machine will have less resources
>> for each test and some of them are likely to fail when this happens).  I
>> did some tests in my machine (8-core i7, 16GB RAM), and running the
>> whole GDB testsuite 5 times using -j6 took 23 minutes.  Not bad.
>> 
>> In order to run the racy test machinery, you need to specify the
>> RACY_ITER environment variable.  You will assign a number to this
>> variable, which represents the number of times you want to run the
>> tests.  So, for example, if you want to run the whole testsuite 3 times
>> in parallel (using 2 cores), you will do:
>> 
>>   make check RACY_ITER=3 -j2
>> 
>> It is also possible to use the TESTS variable and specify which tests
>> you want to run:
>> 
>>   make check TEST='gdb.base/default.exp' RACY_ITER=3 -j2
>
> I did this, and noticed:
>
> $ ls testsuite/racy_outputs/
> gdb1.log  gdb1.sum  gdb2.log  gdb2.sum  gdb3.log  gdb3.sum
>
> I think it'd make more sense, and it'd be easier to copy
> test run results around, diff result dirs etc., if these
> were put in subdirs instead:
>
>  testsuite/racy_outputs/1/gdb.sum
>  testsuite/racy_outputs/1/gdb.sum
>  testsuite/race_outputs/1/gdb.log
>  testsuite/race_outputs/2/gdb.sum
>  testsuite/race_outputs/2/gdb.log
>
> etc.  Any reason not to?

Not really.  I think it's a good idea, so I modified the Makefile to
implement it.

>> And so on.  The output files will be put at the directory
>> gdb/testsuite/racy_outputs/.
>> 
>> After make invokes the necessary rules to run the tests, it finally runs
>> a Python script that will analyze the resulting gdb.sum files.  This
>> Python script will read each file, and construct a series of sets based
>> on the results of the tests (one set for FAIL's, one for PASS'es, one
>> for KFAIL's, etc.).  It will then do some set operations and come up
>> with a list of unique, sorted testcases that are racy.  The algorithm
>> behind this is:
>> 
>>   for state in PASS, FAIL, XFAIL, XPASS...; do
>>     if a test's state in every sumfile is $state; then
>>       it is not racy
>>     else
>>       it is racy
>> 
>> IOW, a test must have the same state in every sumfile.
>
> Is it easy to start with preexisting dirs with gdb.sum/gdb.log
> files, and run this script on them?

Yes, you just need to provide two or more .sum files to the script.

> E.g., say I've already run the testsuite three times, and now
> I want to check for racy tests, using the test results I already
> have:
>
>  testruns/2015-02-21/1/gdb.sum
>  testruns/2015-02-21/2/gdb.sum
>  testruns/2015-02-21/3/gdb.sum
>
> I think we should document that.

Good idea.  I will write more about how to use the script if you already
have the .sum files.  I assume that gdb/testsuite/README is a good place
to put this information.

>> After processing everything, the script prints the racy tests it could
>> identify on stdout.  I am redirecting this to a file named racy.sum.
>
> I tried:
>  make check TESTS='gdb.base/default.exp' RACY_ITER=3
>
> and was surprised to see nothing output at the end.  Then I found that
> the racy.sum file is completely empty.  How about we instead make sure
> that something is already printed?  Maybe follow dejagnu's output style,
> and print a summary at the end?

Alright, sounds doable.

> I've now run "make check -j8 RACY_ITER=3" and got this:
>
> $ cat testsuite/racy.sum
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 1: attach (pass 2), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 4: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: nonthreaded: attempt 6: attach (pass 2), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 1: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 1), pending signal catch
> gdb.threads/attach-into-signal.exp: threaded: attempt 3: attach (pass 2), pending signal catch
> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach
> gdb.threads/attach-many-short-lived-threads.exp: iter 5: attach (EPERM)
> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach
> gdb.threads/attach-many-short-lived-threads.exp: iter 9: attach (EPERM)
> gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=0: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: inferior 1 exited (memory error) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=off: cond_bp_target=1: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: inferior 1 exited (prompt) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=0: no threads left
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (timeout) (PRMS: gdb/18749)
> gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: no threads left
> gdb.threads/watchpoint-fork.exp: child: multithreaded: finish
> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint A after the second fork
> gdb.threads/watchpoint-fork.exp: child: multithreaded: watchpoint B after the second fork
>
> The gdb.threads/process-dies-while-handling-bp.exp ones are actually:
>
>  -PASS: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited
>  +KFAIL: gdb.threads/process-dies-while-handling-bp.exp: non_stop=on: cond_bp_target=1: inferior 1 exited (prompt) (PRMS: gdb/18749)
>
> Test sum diffing should probably strip away tail ()s, and ignore PASS->KFAIL->PASS.

I thought about stripping tail ()s away before comparing the names, but
the problem is that maybe we'll miss a test that actually writes
something meaningful inside the parentheses.  There isn't a strong
convention advising testcase writers to not do that.  What do you think?

As for the PASS->KFAIL->PASS, that's a good idea indeed.  I'll implement
that.

>> 
>> Something else that I wasn't sure how to deal with was non-unique
>> messages in our testsuite.  I decided to do the same thing I do in our
>> BuildBot: include a unique identifier in the end of message, like:
>> 
>>   gdb.base/xyz.exp: non-unique message
>>   gdb.base/xyz.exp: non-unique message <<2>>
>>   
>> This means that you will have to be careful about them when you use the
>> racy.sum file.
>> 
>> I ran the script several times here, and it did a good job catching some
>> well-known racy tests.  Overall, I am satisfied with this approach and I
>> think it will be helpful to have it upstream'ed. 
>
> I think so too.  We don't have to fix everything before its put in either.
> And having this doesn't preclude exploring other approaches either.
>
> The only thing I do wish we should do, is use the fruits of this
> to somehow mark racy tests in the testsuite itself, instead of only
> making the buildbot ignore them, so that local development benefits
> as well.

I totally agree, and also spent some time thinking about this problem,
but I don't see an easy solution for that.  Racy testcases vary wildly
between targets, GDB vs. gdbserver, CFLAGS, etc.  We would have to
maintain several lists of racy tests, and even this wouldn't be 100%
trustworthy because the racy tests also vary depending on the machine
load...

I'm eager to see what you/others think about this specific problem.

>> I also intend to
>> extend our BuildBot and create new, specialized builders that will be
>> responsible for detecting the racy tests every X number of days, but
>> that will only be done when this patch is accepted.
>
> That'll be great.
>
>> Comments are welcome, as usual.  Thanks.
>
> On 02/17/2016 01:26 AM, Sergio Durigan Junior wrote:
>> diff --git a/gdb/testsuite/Makefile.in b/gdb/testsuite/Makefile.in
>> index 38c3052..4bf6191 100644
>> --- a/gdb/testsuite/Makefile.in
>> +++ b/gdb/testsuite/Makefile.in
>> @@ -52,6 +52,8 @@ RUNTESTFLAGS =
>>
>>  FORCE_PARALLEL =
>>
>> +DEFAULT_RACY_ITER = 3
>
> Are users supposed to be able to trigger the default?  I didn't
> see it documented.

Only if they invoke the rule by hand without specifying RACY_ITER, like:

  make check-racy-parallel -j4

In this case, DEFAULT_RACY_ITER would be used.  And yes, I did not
document it, but I should.  Thanks for mentioning.

>> +Racy testcases
>> +**************
>> +
>> +Unfortunately, GDB has a number of testcases that can randomly pass or
>> +fail.  We call them "racy testcases", and they can be bothersome when
>> +one is comparing different testsuite runs.  In order to help
>
> I'd suggest future-proofing this, by rephrasing it as if we were at a
> stage where the testsuite is stable, and new racy tests occasionally are
> added to the testsuite.

Gotcha.

>> diff --git a/gdb/testsuite/analyze-racy-logs.py b/gdb/testsuite/analyze-racy-logs.py
>> new file mode 100755
>> index 0000000..4f05185
>> --- /dev/null
>> +++ b/gdb/testsuite/analyze-racy-logs.py
>> @@ -0,0 +1,131 @@
>> +#!/usr/bin/env python
>> +
>> +# Copyright (C) 2016 Free Software Foundation, Inc.
>> +#
>> +# This file is part of GDB.
>> +#
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +
>> +# This program is used to analyze the test results (i.e., *.sum files)
>> +# generated by GDB's testsuite, and print the testcases that are found
>> +# to be racy.
>> +#
>> +# Racy testcases are considered as being testcases which can
>> +# intermitently FAIL (or PASS) when run two or more times
>
> intermittently

Ops, thanks.

> I think you should put this in as is, and we can keep working on
> it on top.

I'll send a v2 in a few minutes with your suggestions addressed, and
then I'll push it if you're OK with it.

Thanks a lot!

-- 
Sergio
GPG key ID: 237A 54B1 0287 28BF 00EF  31F4 D0EB 7628 65FC 5E36
Please send encrypted e-mail if possible
http://sergiodj.net/


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]