Pattern Boldness

13 March 2018

At the moment, mutagen only considers top-level idents in function arguments (e.g. foo(x: X, y: Y)), but function arguments are actually patterns, so we could have foo((x, y): (X, Y)) or bar(Bar { bla, bazz }: Bar). For now, this means we have no type information for the bindings in either of those examples.

(Note that this only applies to structs for now, as enums cannot be destructured infallibly.)
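To make the two argument shapes concrete, here is a small sketch of functions that destructure directly in the parameter list. The field types and function bodies are made up for illustration; only the pattern shapes come from the text above.

```rust
// Hypothetical struct: the field types are an assumption for this example.
struct Bar {
    bla: u32,
    bazz: u32,
}

// A tuple pattern in argument position: `x` and `y` are bound directly.
fn foo((x, y): (u32, u32)) -> u32 {
    x + y
}

// A struct pattern in argument position: the fields become local bindings.
fn bar(Bar { bla, bazz }: Bar) -> u32 {
    bla * bazz
}
```

Neither function has a top-level ident as its argument, which is exactly the case that is currently skipped.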

So what can we do? We can try to destructure the patterns, but then what should happen with the types (we need those to check whether args are interchangeable)? We need to add information on where in that type the pattern sits, that’s what!

Since we have tuples, tuple structs and “normal” structs as well as references, we need three ways of addressing:

/// a (partial) occurrence of a pattern position in the given type
#[derive(Clone, Copy, Eq, Hash, Ord, PartialEq, PartialOrd)]
enum TyOcc {
    Field(Ident),
    Index(usize),
    Deref
}

Now we can encode a pattern’s type+position as (&Ty, Vec<TyOcc>) and compare them. This means we can make a and c as well as b and d in fn bar(Foo(a, b): Foo, Foo(c, d): Foo) interchangeable, even if we don’t know what a Foo is. In this case, a and c would both get the type/occ tuple (&Foo, vec![Index(0)]), while b and d would both get (&Foo, vec![Index(1)]).
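The comparison can be sketched roughly as follows. This is not mutagen’s code: the Binding struct and the helper functions are hypothetical, types are compared as plain text instead of &Ty, and Field holds a string slice instead of the compiler’s Ident so that the example compiles on its own.

```rust
/// Stand-in for the `TyOcc` above; `Field` holds a `&'static str`
/// instead of an `Ident` (an assumption made to keep this self-contained).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)]
enum TyOcc {
    Field(&'static str),
    Index(usize),
    Deref,
}

/// Hypothetical record for one binding: its name, the written type of the
/// argument it came from (as text), and the path into that type.
#[derive(Debug)]
struct Binding {
    name: &'static str,
    ty: &'static str,
    occ: Vec<TyOcc>,
}

/// Two bindings count as interchangeable if both type and path agree.
fn interchangeable(a: &Binding, b: &Binding) -> bool {
    a.ty == b.ty && a.occ == b.occ
}

/// Bindings of `fn bar(Foo(a, b): Foo, Foo(c, d): Foo)`, written out by hand.
fn bar_bindings() -> Vec<Binding> {
    vec![
        Binding { name: "a", ty: "Foo", occ: vec![TyOcc::Index(0)] },
        Binding { name: "b", ty: "Foo", occ: vec![TyOcc::Index(1)] },
        Binding { name: "c", ty: "Foo", occ: vec![TyOcc::Index(0)] },
        Binding { name: "d", ty: "Foo", occ: vec![TyOcc::Index(1)] },
    ]
}

/// All pairs of binding names that may be swapped for each other.
fn interchangeable_pairs(bindings: &[Binding]) -> Vec<(&'static str, &'static str)> {
    let mut pairs = Vec::new();
    for (i, a) in bindings.iter().enumerate() {
        for b in &bindings[i + 1..] {
            if interchangeable(a, b) {
                pairs.push((a.name, b.name));
            }
        }
    }
    pairs
}
```

Calling interchangeable_pairs(&bar_bindings()) pairs a with c and b with d, without ever looking inside Foo.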

We could extend this scheme to look into the structs we see and store the types of their contents, which would allow us to know, for example, that Foo.0 is a Bar, which might be our return type. However, our users will most likely keep the scope of mutations small, so we don’t normally expect to encounter any type definitions anyway. At least with tuples and slices, we know the constituent types.

So we walk the patterns and types in lock-step and try our best to keep both in sync, adding TyOccs wherever we fail. Once we have any occurrence marked, we are no longer able to tell anything about the types in play, so we will have to bail there. To understand why this is needed, consider the following function declaration: fn bad(Doodad { badum: &mut badum, bar }: Doodad). If we try to strip the &mut from the type of badum, we fail, since all we have is Doodad (worse, we might succeed in some scenarios, ending up with wrong types). So we had best be conservative and give badum the tuple (&Doodad, vec![Field(badum), Deref]).
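Here is a minimal sketch of such a lock-step walk. It is not mutagen’s implementation: Ty and Pat are toy stand-ins for the compiler’s AST, Field holds a String instead of an Ident, and walk, descend and bindings are hypothetical names. The rule it follows is the one described above: descend into the type only while no occurrence has been recorded, otherwise keep the type as-is and extend the path.

```rust
/// Toy stand-in for a type as written in a signature (an assumption).
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Named(String),  // `Foo`: a name whose contents we cannot see
    Tuple(Vec<Ty>), // `(A, B)`
    Ref(Box<Ty>),   // `&T` or `&mut T`
}

/// Toy stand-in for an argument pattern (an assumption).
#[allow(dead_code)]
enum Pat {
    Bind(String),               // `x`
    Tuple(Vec<Pat>),            // `(p, q)`
    TupleStruct(Vec<Pat>),      // `Foo(p, q)`
    Struct(Vec<(String, Pat)>), // `Foo { field: p }`
    Ref(Box<Pat>),              // `&p` or `&mut p`
}

#[derive(Clone, Debug, PartialEq)]
enum TyOcc {
    Field(String),
    Index(usize),
    Deref,
}

/// Binding name, the last type we could still follow, and the path below it.
type Binding = (String, Ty, Vec<TyOcc>);

/// Take one step into `pat`. If the matching part of the type is known and
/// nothing has been recorded yet, follow the type; otherwise keep `ty`
/// unchanged and record `step` as an occurrence.
fn descend(pat: &Pat, ty: &Ty, known: Option<&Ty>, step: TyOcc,
           occ: &mut Vec<TyOcc>, out: &mut Vec<Binding>) {
    match known {
        Some(inner) if occ.is_empty() => walk(pat, inner, occ, out),
        _ => {
            occ.push(step);
            walk(pat, ty, occ, out);
            occ.pop();
        }
    }
}

fn walk(pat: &Pat, ty: &Ty, occ: &mut Vec<TyOcc>, out: &mut Vec<Binding>) {
    match pat {
        Pat::Bind(name) => out.push((name.clone(), ty.clone(), occ.clone())),
        Pat::Tuple(pats) => {
            for (i, p) in pats.iter().enumerate() {
                // Tuple types spell out their element types, so we can follow them.
                let known = match ty {
                    Ty::Tuple(tys) if tys.len() == pats.len() => Some(&tys[i]),
                    _ => None,
                };
                descend(p, ty, known, TyOcc::Index(i), occ, out);
            }
        }
        Pat::TupleStruct(pats) => {
            for (i, p) in pats.iter().enumerate() {
                // We do not know the struct's definition: always record the index.
                descend(p, ty, None, TyOcc::Index(i), occ, out);
            }
        }
        Pat::Struct(fields) => {
            for (name, p) in fields {
                descend(p, ty, None, TyOcc::Field(name.clone()), occ, out);
            }
        }
        Pat::Ref(inner) => {
            let known = match ty {
                Ty::Ref(t) => Some(&**t),
                _ => None,
            };
            descend(inner, ty, known, TyOcc::Deref, occ, out);
        }
    }
}

/// Collect all bindings of one argument pattern against its written type.
fn bindings(pat: &Pat, ty: &Ty) -> Vec<Binding> {
    let mut out = Vec::new();
    walk(pat, ty, &mut Vec::new(), &mut out);
    out
}
```

Fed the Doodad pattern with the type Doodad, this sketch reports badum with the path Field(badum), Deref and bar with the path Field(bar), both still typed as Doodad. A plain (x, y): (X, Y) argument keeps empty paths and gets the element types X and Y.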

Even with this simple scheme, we get a fair amount of code just to “walk the walk”. This is the price we who work with the syntax pay to allow you users to write beautiful low-level code. Of course, this would be much easier with languages like Java or Brainfuck (do I sense a strawman argument here?), but I set out to mutate Rust code, so I had better deal with the complexity. You’re welcome!