Arraigning a Statement, vol. 2

14 March 2019

Last time we defined a minimum viable implementation for mutagen statement removal: Remove only function call statements whose results are not returned from the surrounding block and whose AST do not contain any Assign expressions.

Those predicates are easy enough to check (the former by leaving out the last statement unless it has a semicolon, the latter by a custom visitor) and can at least remove a small number of statements. But can we do better? We can.

Before we start, a bit about terminology: We call statements like let mut x bindings and expressions like a = 5 assignments. If the let binding has no initialization clause (as in let b;), we speak of deferred initialization, which is a great way to deal with some programs, but brings some complexity to those who wish to do statement elimination mutations.

First, about that Assign expressions: If the block contains a let binding for the same identifier before the assignment, we can still remove the statement, because the assignment won’t be visible outside the block. So we need to extend our visitor to store a set of bound identifiers and only perform the mutation if each assigned identifier is already bound. This has to be chained across blocks, too.

So let’s look at a few examples we need to correctly handle. The following snippets show only a block of Rust code without context, since that is what the analysis also sees:

{
    let mut a = 5 // a is bound within this block from here on
    {
        a = 4; // this is OK, because we have the binding from outside
        let mut b = (3, 6);
        b.1 = 2; // this  is also OK.
    }
}

In this example, we only assign to identifiers bound within the block. So the set of assigned outer identifiers is empty.

{
    b = 4;
    let mut b = 3; // this binding does not absolve the assignment above
}

Here we assign to b before creating a new binding with the same name. Rust allows shadowing, so this is potentially valid code (if there is a let b before the block). The set of assigned outer identifiers is {b}, therefore we must not remove the assignment, because that might leave b uninitialized if it wasn’t initialized by the time the block started.

The following example is basically the same thing, but with block scope limiting the reach of a binding:

{
    let mut b = 6;
}
b = 7; // this assigns `b` outside the inner block's scope

Given that the number of identifiers for most expressions will have one or two digits, a Vec that will be truncated to the old length within a block should offer acceptable performance. If this gets too slow, an immutable set can help.

(Aside: I once did something very similar for clippy)

Now there’s another niggle that we need to keep in mind during our analysis: We need to keep control flow intact, lest changing the code paths lead to uncompilable code. For example, removing the following might break compilation:

a(return b);

So we have to bail if we encounter any of break, return or continue. Without a control flow graph and some careful analysis, we cannot be sure if removing those keeps the build intact.

Next thing, are there other statements we can remove? One obvious variant would be statements containing AssignOps whose right-hand-side argument follow the same above analysis (we can be sure that the left-hand-side is already assigned, because otherwise the AssignOp would be operating on an undefined value). Similarly, we could remove let _ = bindings. We could remove arrays and tuples of exprs (though those are unusual as statements) if all contents follow our above rule. We could even remove a match if no values escape it as per our rule.

Stay tuned for further entries in the series.