This is the mail archive of the mailing list for the Mauve project.
Re: Mauve wishlist
Anthony Balkissoon wrote:
> On Fri, 2006-03-17 at 11:32 -0500, Thomas Fitzsimmons wrote:
>> Anthony Balkissoon has expressed interest in improving Mauve, so we'd
>> like to know what would be the best things to work on.
>
> Another suggestion that Tom Fitzsimmons had was to change the way we
> count the number of tests. Counting each invocation of the test()
> method rather than each call to harness.check() has two benefits:

I think that would be a backward step (I like the detail that Mauve
provides, especially when testing on subsets while developing on GNU
Classpath).
On the other hand, you can achieve this result without losing the
current detail - for example, see my recent JUnit patch (not committed
yet) - it effectively gives a pass/fail per test() call when you run via
JUnit, without losing the ability to run in the usual Mauve way
(counting check() results).
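The idea above - keeping the per-check() detail while also reporting a single pass/fail per test() call - can be sketched roughly as follows. MiniHarness and its method names are hypothetical illustrations of the two counting schemes, not the actual Mauve harness or the JUnit patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every check() result is recorded individually
// (the usual Mauve counting), while testPassed() collapses them into
// one pass/fail per test() invocation (the JUnit-style view).
public class MiniHarness {
    private final List<Boolean> checkResults = new ArrayList<>();

    // Record one check result, as harness.check(boolean) does in Mauve.
    public void check(boolean result) {
        checkResults.add(result);
    }

    // Fine-grained view: how many individual checks failed.
    public int failedChecks() {
        int failed = 0;
        for (boolean r : checkResults) {
            if (!r) failed++;
        }
        return failed;
    }

    // Coarse view: the whole test() call passes only if every check passed.
    public boolean testPassed() {
        return failedChecks() == 0;
    }

    public static void main(String[] args) {
        MiniHarness h = new MiniHarness();
        h.check(true);
        h.check(false);
        h.check(false);
        System.out.println(h.failedChecks() + " failed checks, test passed: "
                + h.testPassed());
    }
}
```

Both numbers come from the same recorded results, which is why the two reporting styles can coexist without losing detail.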
> 1) constant number of tests, regardless of exceptions being thrown or
> which if-else branch is taken

Mauve does have a design flaw in that it can be tricky to automatically
assign a unique identifier to each check(), and this makes it hard to
compare two Mauve runs (say a test of the latest Classpath CVS vs the
last release, or Classpath vs JDK 1.5 - both of which would be useful
comparisons).
We can work around that by ensuring that all the tests run linearly (no
if-else branches - I've written a large number of tests this way and not
found it to be a limitation, but I don't know what lurks in the depths
of the older Mauve tests).
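One way to picture the identification problem is the scheme sketched below - a check id built from the current checkpoint name plus a running index. The class and the exact id format are illustrative, not Mauve's actual implementation; the point is that if an if-else branch skips a check, every later index in that checkpoint shifts, so the same logical check gets a different id on different runs:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of one possible check-identification scheme:
// id = "<checkpoint>:<index>", with the index restarting at each
// checkpoint. Stable only if tests run linearly (no skipped checks).
public class CheckIds {
    private String checkpoint = "";
    private int index = 0;
    private final List<String> ids = new ArrayList<>();

    public void checkPoint(String name) {
        checkpoint = name;
        index = 0;
    }

    // The sketch only tracks identity; the boolean result is recorded
    // elsewhere in a real harness.
    public String check(boolean result) {
        String id = checkpoint + ":" + index++;
        ids.add(id);
        return id;
    }

    public List<String> ids() { return ids; }

    public static void main(String[] args) {
        CheckIds run = new CheckIds();
        run.checkPoint("getIcon");
        run.check(true);   // id getIcon:0
        run.check(false);  // id getIcon:1
        // If a branch had skipped the first check, the second one would
        // have been assigned getIcon:0 instead, breaking run-to-run
        // comparison.
        System.out.println(run.ids());
    }
}
```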
There is still the problem that an exception being thrown during a test
means some checks don't get run, but a new Mauve comparison report (not
yet developed, although I've done a little experimenting with it) could
highlight that.
> 2) more realistic number of tests, to accurately reflect the extent of
> our testing

I think the absolute number is meaningless however you count the tests,
so I don't see this as an advantage. Test coverage reports are what we
need to get some insight into the extent of our testing.
> For point 1) this will help us see if we are making progress. Right now
> a Mauve run might say we have 113 fails out of 13200 tests and then a
> later run could say 200 fails out of 34000 tests. Is this an
> improvement? Hard to say.

I have done a little bit of work on a comparison report to show the
differences between two runs of the same set of Mauve tests, classifying
them as follows:

Type 1 (Normal): passes on run A and run B;
Type 2 (Regression): passes on run A, fails on run B;
Type 3 (Improvement): fails on run A, passes on run B;
Type 4 (Bad): fails on run A and run B.

In a comparison of JDK 1.5 vs Classpath, a Type 4 result hints that the
check itself is buggy. This is a work in progress, and I don't have any
code to show anyone yet, but it is an approach that I think can be made
to work.

To make it work, each check has to be uniquely identified - I did this
using the checkpoint and the check index within a test(), so it is
important that if-else branches in the tests cannot result in checks
being skipped. This is the case for most of the javax.swing.* tests,
but I can't speak for some of the older Mauve tests.
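The four-way classification itself is simple to sketch. The class, enum, and data layout below are hypothetical illustrations of the idea, not the in-progress report's actual code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of the run-comparison classification: each
// uniquely identified check that appears in both runs falls into one
// of the four types described in the mail.
public class RunComparison {

    public enum Type { NORMAL, REGRESSION, IMPROVEMENT, BAD }

    // Classify one check given its pass/fail status on runs A and B.
    public static Type classify(boolean passedA, boolean passedB) {
        if (passedA && passedB)  return Type.NORMAL;       // Type 1
        if (passedA)             return Type.REGRESSION;   // Type 2
        if (passedB)             return Type.IMPROVEMENT;  // Type 3
        return Type.BAD;                                   // Type 4
    }

    public static void main(String[] args) {
        // Each check id maps to { passed on run A, passed on run B }.
        Map<String, boolean[]> checks = new LinkedHashMap<>();
        checks.put("getIcon:0", new boolean[] { true, true });
        checks.put("getIcon:1", new boolean[] { true, false });
        checks.put("setIcon:0", new boolean[] { false, false });
        for (Map.Entry<String, boolean[]> e : checks.entrySet()) {
            boolean[] r = e.getValue();
            System.out.println(e.getKey() + " -> " + classify(r[0], r[1]));
        }
    }
}
```

In a JDK-vs-Classpath comparison, a BAD result (fails on both) would point suspicion at the check itself, as the mail notes.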
> But if we count each call to test() as a test, and also detect hanging
> tests, then we should have a constant number of tests in each run and
> will be able to say if changes made have a positive impact on Mauve
> test results.

You'll lose the ability to distinguish between an existing failure where
(say) 1 out of 72 checks fails, and after some clever patch 43 out of 72
checks fail, but the new system reports both as 1 test failure.
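The information loss described here is easy to see in a toy sketch. The numbers come from the example in the mail; the code itself is illustrative, not a real harness:

```java
// Illustrative sketch: under per-test() counting, a test fails if any
// of its checks fail, so 1 failing check and 43 failing checks out of
// 72 both collapse to the same single "1 failed test".
public class CountingLoss {

    public static int failedTests(int failedChecks) {
        return failedChecks > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        int before = 1;   // 1 of 72 checks failing before the patch
        int after = 43;   // 43 of 72 checks failing after the patch
        System.out.println("check-level failures: " + before + " vs " + after);
        System.out.println("test-level failures:  " + failedTests(before)
                + " vs " + failedTests(after));
    }
}
```

At the check level the regression is obvious (1 vs 43); at the test level both runs report the same single failure.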