This is the mail archive of the mailing list for the Mauve project.
Re: Mauve wishlist
Anthony Balkissoon wrote:
> On Fri, 2006-03-17 at 11:32 -0500, Thomas Fitzsimmons wrote:
>> Anthony Balkissoon has expressed interest in improving Mauve, so we'd
>> like to know what would be the best things to work on.
>
> Another suggestion that Tom Fitzsimmons had was to change the way we
> count the number of tests. Counting each invocation of the test()
> method rather than each call to harness.check() has two benefits:

I think that would be a backward step (I like the detail that Mauve
provides, especially when testing on subsets while developing on GNU
Classpath).
On the other hand, you can achieve this result without losing the
current detail - for example, see my recent JUnit patch (not committed
yet) - it effectively gives a pass/fail per test() call when you run via
JUnit, without losing the ability to run in the usual Mauve way
(counting check() results).
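The idea above - keeping the per-check() detail while also reporting a single pass/fail per test() call - can be sketched roughly as follows. MiniHarness and its method names are hypothetical illustrations of the two counting schemes, not the actual Mauve harness or the JUnit patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every check() result is recorded individually
// (the usual Mauve counting), while testPassed() collapses them into
// one pass/fail per test() invocation (the JUnit-style view).
public class MiniHarness {
    private final List<Boolean> checkResults = new ArrayList<>();

    // Record one check result, as harness.check(boolean) does in Mauve.
    public void check(boolean result) {
        checkResults.add(result);
    }

    // Fine-grained view: how many individual checks failed.
    public int failedChecks() {
        int failed = 0;
        for (boolean r : checkResults) {
            if (!r) failed++;
        }
        return failed;
    }

    // Coarse view: the whole test() call passes only if every check passed.
    public boolean testPassed() {
        return failedChecks() == 0;
    }

    public static void main(String[] args) {
        MiniHarness h = new MiniHarness();
        h.check(true);
        h.check(false);
        h.check(false);
        System.out.println(h.failedChecks() + " failed checks, test passed: "
                + h.testPassed());
    }
}
```

Both numbers come from the same recorded results, which is why the two reporting styles can coexist without losing detail.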
> 1) constant number of tests, regardless of exceptions being thrown or
> which if-else branch is taken

Mauve does have a design flaw in that it can be tricky to automatically
assign a unique identifier to each check(), and this makes it hard to
compare two Mauve runs (say a test of the latest Classpath CVS vs the
last release, or Classpath vs JDK 1.5 - both of which would be useful
comparisons).
We can work around that by ensuring that all the tests run linearly (no
if-else branches - I've written a large number of tests this way and not
found it to be a limitation, but I don't know what lurks in the depths
of the older Mauve tests).
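One way to picture the identification problem is the scheme sketched below - a check id built from the current checkpoint name plus a running index. The class and the exact id format are illustrative, not Mauve's actual implementation; the point is that if an if-else branch skips a check, every later index in that checkpoint shifts, so the same logical check gets a different id on different runs:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of one possible check-identification scheme:
// id = "<checkpoint>:<index>", with the index restarting at each
// checkpoint. Stable only if tests run linearly (no skipped checks).
public class CheckIds {
    private String checkpoint = "";
    private int index = 0;
    private final List<String> ids = new ArrayList<>();

    public void checkPoint(String name) {
        checkpoint = name;
        index = 0;
    }

    // The sketch only tracks identity; the boolean result is recorded
    // elsewhere in a real harness.
    public String check(boolean result) {
        String id = checkpoint + ":" + index++;
        ids.add(id);
        return id;
    }

    public List<String> ids() { return ids; }

    public static void main(String[] args) {
        CheckIds run = new CheckIds();
        run.checkPoint("getIcon");
        run.check(true);   // id getIcon:0
        run.check(false);  // id getIcon:1
        // If a branch had skipped the first check, the second one would
        // have been assigned getIcon:0 instead, breaking run-to-run
        // comparison.
        System.out.println(run.ids());
    }
}
```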
There is still the problem that an exception being thrown during a test
means some checks don't get run, but a new Mauve comparison report (not
yet developed, although I've done a little experimenting with it) could
highlight that.
> 2) more realistic number of tests, to accurately reflect the extent of
> our testing

I think the absolute number is meaningless however you count the tests,
so I don't see this as an advantage. Test coverage reports are what we
need to get some insight into the extent of our testing.
> For point 1) this will help us see if we are making progress. Right now
> a Mauve run might say we have 113 fails out of 13200 tests and then a
> later run could say 200 fails out of 34000 tests. Is this an
> improvement? Hard to say.

I have done a little bit of work on a comparison report to show the
differences between two runs of the same set of Mauve tests, classifying
them as follows:

Type 1 (Normal): passes on run A and run B;
Type 2 (Regression): passes on run A, fails on run B;
Type 3 (Improvement): fails on run A, passes on run B;
Type 4 (Bad): fails on run A and run B.

In a comparison of JDK 1.5 vs Classpath, a Type 4 result hints that the
check itself is buggy. This is a work in progress, and I don't have any
code to show anyone yet, but it is an approach that I think can be made
to work.

To make it work, each check has to be uniquely identified - I did this
using the checkpoint and the check index within a test(), so it is
important that if-else branches in the tests cannot result in checks
being skipped. This is the case for most of the javax.swing.* tests,
but I can't speak for some of the older Mauve tests.
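The four-way classification itself is simple to sketch. The class, enum, and data layout below are hypothetical illustrations of the idea, not the in-progress report's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the run-comparison classification: each
// uniquely identified check that appears in both runs falls into one
// of the four types described in the mail.
public class RunComparison {

    public enum Type { NORMAL, REGRESSION, IMPROVEMENT, BAD }

    // Classify one check given its pass/fail status on runs A and B.
    public static Type classify(boolean passedA, boolean passedB) {
        if (passedA && passedB)  return Type.NORMAL;       // Type 1
        if (passedA)             return Type.REGRESSION;   // Type 2
        if (passedB)             return Type.IMPROVEMENT;  // Type 3
        return Type.BAD;                                   // Type 4
    }

    public static void main(String[] args) {
        // Each check id maps to { passed on run A, passed on run B }.
        Map<String, boolean[]> checks = new LinkedHashMap<>();
        checks.put("getIcon:0", new boolean[] { true, true });
        checks.put("getIcon:1", new boolean[] { true, false });
        checks.put("setIcon:0", new boolean[] { false, false });
        for (Map.Entry<String, boolean[]> e : checks.entrySet()) {
            boolean[] r = e.getValue();
            System.out.println(e.getKey() + " -> " + classify(r[0], r[1]));
        }
    }
}
```

In a JDK-vs-Classpath comparison, a BAD result (fails on both) would point suspicion at the check itself, as the mail notes.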
> But if we count each call to test() as a test, and also detect hanging
> tests, then we should have a constant number of tests in each run and
> will be able to say if changes made have a positive impact on Mauve
> test results.

You'll lose the ability to distinguish between an existing failure where
(say) 1 out of 72 checks fails, and after some clever patch 43 out of 72
checks fail, but the new system reports both as 1 test failure.
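The information loss described here is easy to see in a toy sketch. The numbers come from the example in the mail; the code itself is illustrative, not a real harness:

```java
// Illustrative sketch: under per-test() counting, a test fails if any
// of its checks fail, so 1 failing check and 43 failing checks out of
// 72 both collapse to the same single "1 failed test".
public class CountingLoss {

    public static int failedTests(int failedChecks) {
        return failedChecks > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int before = 1;   // 1 of 72 checks failing before the patch
        int after = 43;   // 43 of 72 checks failing after the patch
        System.out.println("check-level failures: " + before + " vs " + after);
        System.out.println("test-level failures:  " + failedTests(before)
                + " vs " + failedTests(after));
    }
}
```

At the check level the regression is obvious (1 vs 43); at the test level both runs report the same single failure.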