Rust Mutation Testing
Not least since last JavaLand I am a fan of mutation testing. However, there is no working crate to do it in Rust yet (There is mutant, but it’s both outdated and far from finished). So because I may cobble the time together to implement it, I’m going to blog the journey to order my thoughts. This post will give a broad overview on the mechanics of mutation testing and the problems we’ll need to overcome to get it in Rust.
First, let’s motivate the whole thing: Why would I want to mutate my code? The question this attempts to answer is: “Am I testing the right thing?” Note that this is different to the question “Am I testing enough?”; we already have line coverage and branch coverage to answer the latter. The problem with those is that they can mislead; just because a line or branch is covered does not mean it is really tested, as a cursory look into mutation testing will show.
With mutation testing, I start with running my tests as usual, and then I mutate the code and run the tests again. If at least one fails, the mutant is “killed”. The mutation is rolled back and another one tried. The resulting metric is how many mutants survived divided by how many mutations were tried. Usually, mutation testing tools show the surviving mutations.
The idea is that if you can put errors in your code without making your tests fail, your tests didn’t sufficiently guard against those errors. Now what are typical mutations:
||and vice versa
- Deleting statements
- Duplicating statements
- Inserting other statements (e.g.
- Replacing boolean subexpressions with
- Replacing comparisons, e.g.
- Replacing variables with others from same scope of compatible types
As you can imagine, this isn’t exactly easy to do in Rust, especially if we want to do it fast enough to be useful. We probably do not want the full compiler within our tool (or even everything down to MIR), because that would mean forking rustc. So going with the abstract syntax tree, how far can we go?
Turns out there’s not too much we can do. The boolean operator switcheroo is
A-OK, also inserting
returns at random points for unit-returning functions
can be a fun way to disrupt control flow. Perhaps we could add default returns
for other return types via some kind of whitelist. The boolean subexpression
replacement also looks good (we can identify those as expressions in
while loops or as subexpressions thereof). Also comparison
operators seem pretty safe. We may be able to do that whitelist dance for other
operations like addition/subtraction (we’ll need to make sure the types have
std::ops::* implementations). If we’re very adventurous, we
might try to clone as much of rust’s typeck as needed to do trait lookup.
My idea is that for every method under test, we can add another argument that
will just be the mutation count.
0 is the original code and every other value
refers to a mutation (up to
LAST_MUTATION, whatever that is). We also store
some metadata about the mutations somewhere so we can refer to them in the
The code is then not mutated directly, but conditioned on the mutation
parameter. So for example, we would change some expressions to depend on
(our “mutation counter”):
|Original Expression||Mutated Expression|
mu::*.. functions would change the operations to
something else, depending on
mc. This means we only compile the mutated code
once and run it once per mutation. Our test runner knows how many mutations
exist and can run them all, possibly in parallel, keeping track of which tests
failed. We will want to know two things:
- Which mutations survived – those will be reported directly
- Which tests killed what mutations. This will be useful to identify tests that can be removed because all the mutations they cover are already covered by other tests. This is the classical minimum set cover problem, which is NP-hard, but good solutions can be found in polynomial time nonetheless (using Linear Programming Relaxation). I am not aware of any tool that does this at the moment, but it would be really neat to have.
Note that the mutation numbers must be different for all methods under test – we only ever introduce one error at a time.
Now my first step will be to look into the current test crate to see how the tests are actually run. Also I’m looking into syntex to see if I can find the tests.