Minimum Safe Abstractions

01 August 2019

Recently, there’s been a lot of talk about unsafe in the Rust world, and how to deal with it. Let’s recap: Rust has a subset called “safe Rust”, with a few very neat guarantees, such as memory safety and freedom from data races. The superset that completes the language is called “unsafe Rust”, and it still has a number of cool safeguards, but it also has an escape hatch to allow bending a few of them in order to let us write safe abstractions on top in much the same language.

The keyword to open up this can of worms is unsafe, but this is not the perimeter you need to secure. An unsafe statement means that the programmer has taken it upon themselves to uphold the required invariants for memory safety and will write the code in a way so that users calling into safe functions cannot ever (whether in good or bad faith) undermine the safety. Otherwise the code must be deemed unsound.

For a small-ish example, in compact_arena, I use the type system to encode the invariant that indices given out by arena.add(_) are only ever usable with the same arena. This is done by binding an invariant lifetime to each index and the arena and requiring that same lifetime on indexing so that we can safely use unchecked indexing.

This shows a common pattern with unsafe: We are able to relax some requirements (such as “no unchecked indexing”) as long as certain invariants (“no invalid index”) can be ensured. In this case, we have three moving parts:

The mk_arena macro ensures that the invariant lifetime given to SmallArena::new is unique (note that this is a compile-time invariant)
The add function ensures that the resulting indices are valid for this arena
The index function uses those two facts to allow quick unchecked indexing

Actually, there is a fourth part: The absence of any function that invalidates an existing index. Adding such a function would make the code unsound.

Returning to the perimeter of unsafety, it is generally agreed that it is the module boundary. Privacy and coherence ensure that we can stop users from reaching into our module and modifying internals, so this is the only practical definition. In practice, there may be uses of unsafe that rely only on invariants the containing function can already ensure. We currently lack the tools to distinguish those cases, as invariants usually are stated in comments or at best in assert! statements.

(Aside: It may be possible to use data flow as an approximation, but I doubt it will give us a sound measure. For one, even in our example, some of the invariants are type-based and contain no runtime data)

This leaves us with a module (or possibly crate) that we need to review and declare sound, as opposed to needing to look at the code as a whole, as is required when coding in unsafe languages, such as C. This difference cannot be overstated. The complexity of ensuring safety increases at least with the square of the number of potentially interacting parts of the code. This is also why safety guides for those languages favor strong decoupling, to limit those interactions.

Still, when writing unsafe Rust code, we can limit the workload for auditing those unsafe parts by minimizing the scope of unsafety. As the title alludes, this means building the smallest possible module that is still able to provide a safe interface over the unsafety contained within. Note that this doesn’t need to be a usable interface – a parent module can handle that with safe code.

This leaves us with two questions: First, how do we identify the minimum safe abstraction, and second, given an existing module containing unsafe code, how do we get there?

Cross-Cutting Concerns

The oughties gave us a lot of hype around “Aspect-oriented programming”. The idea behind this concept is to untangle cross-cutting concerns. The classic example was trace logging: You’d find log.trace calls at the start and end of every method, though this has nothing to do with what the method actually does.

Ripping the trace “aspect” out of the code and putting it into its own “aspect” makes the rest of the code easier to read (I’ll note that there are problems with this approach, but let’s not look into this right now).

So, let’s treat each invariant we must uphold in our unsafe code as an aspect, could we structure the code so that we get a series of layers where each one upholds one invariant, and using the (somewhat) safe abstractions underneath?

In our example, SmallArena uses a Vec as backing store. This means that we don’t have to care about out-of-bounds access or growing our storage. Our only concern is to protect the Index operations.

On the other hand, the crate has two more arena types of different sizes that use arrays of MaybeUninit as a backend. And here I haven’t yet factored out the invariant that we only can add elements up to our capacity and dealing with MaybeUninit. For now, I feel that mixing those concerns is acceptable, but as the crate grows, I might want to revisit that decision.

I also ought to think about the possibility that I may offer additional functionality. Once I implement it, I will probably put all the unsafe stuff in a submodule (that likely has a lot of pub(crate) so I can reach in) and use this from the outer crate to present an ergonomic interface.