Arraigning a Statement, vol. 2
Last time we defined a minimum viable implementation for mutagen statement
removal: Remove only function call statements whose results are not returned
from the surrounding block and whose AST do not contain any Assign
expressions.
Those predicates are easy enough to check (the former by leaving out the last statement unless it has a semicolon, the latter by a custom visitor) and can at least remove a small number of statements. But can we do better? We can.
Before we start, a bit about terminology: We call statements like let mut x
bindings and expressions like a = 5
assignments. If the let
binding has
no initialization clause (as in let b;
), we speak of deferred
initialization, which is a great way to deal with some programs, but brings
some complexity to those who wish to do statement elimination mutations.
First, about that Assign
expressions: If the block contains a let
binding
for the same identifier before the assignment, we can still remove the
statement, because the assignment won’t be visible outside the block. So we
need to extend our visitor to store a set of bound identifiers and only perform
the mutation if each assigned identifier is already bound. This has to be
chained across blocks, too.
So let’s look at a few examples we need to correctly handle. The following snippets show only a block of Rust code without context, since that is what the analysis also sees:
{
let mut a = 5 // a is bound within this block from here on
{
a = 4; // this is OK, because we have the binding from outside
let mut b = (3, 6);
b.1 = 2; // this is also OK.
}
}
In this example, we only assign to identifiers bound within the block. So the set of assigned outer identifiers is empty.
{
b = 4;
let mut b = 3; // this binding does not absolve the assignment above
}
Here we assign to b
before creating a new binding with the same name. Rust
allows shadowing, so this is potentially valid code (if there is a let b
before the block). The set of assigned outer identifiers is {b
}, therefore we
must not remove the assignment, because that might leave b
uninitialized if
it wasn’t initialized by the time the block started.
The following example is basically the same thing, but with block scope limiting the reach of a binding:
{
let mut b = 6;
}
b = 7; // this assigns `b` outside the inner block's scope
Given that the number of identifiers for most expressions will have one or two
digits, a Vec
that will be truncated to the old length within a block should
offer acceptable performance. If this gets too slow, an immutable set can help.
(Aside: I once did something very similar for clippy)
Now there’s another niggle that we need to keep in mind during our analysis: We need to keep control flow intact, lest changing the code paths lead to uncompilable code. For example, removing the following might break compilation:
a(return b);
So we have to bail if we encounter any of break
, return
or continue
.
Without a control flow graph and some careful analysis, we cannot be sure if
removing those keeps the build intact.
Next thing, are there other statements we can remove? One obvious variant would
be statements containing AssignOps
whose right-hand-side argument follow the
same above analysis (we can be sure that the left-hand-side is already
assigned, because otherwise the AssignOp
would be operating on an undefined
value). Similarly, we could remove let _ =
bindings. We could remove arrays
and tuples of exprs (though those are unusual as statements) if all contents
follow our above rule. We could even remove a match if no values escape it as
per our rule.
Stay tuned for further entries in the series.