Minimum Safe Abstractions
Recently, there’s been a lot of talk about unsafe in the Rust world, and how
to deal with it. Let’s recap: Rust has a subset called “safe Rust”, with a few
very neat guarantees, such as memory safety and freedom from data races. The
superset that completes the language is called “unsafe Rust”, and it still has
a number of cool safeguards, but it also has an escape hatch to allow bending a
few of them in order to let us write safe abstractions on top in much the same
language.
The keyword to open up this can of worms is unsafe, but this is not the
perimeter you need to secure. An unsafe statement means that the programmer
has taken it upon themselves to uphold the required invariants for memory
safety and will write the code in a way so that users calling into safe
functions cannot ever (whether in good or bad faith) undermine the safety.
Otherwise the code must be deemed unsound.
For a small-ish example, in compact_arena, I use the type system to encode
the invariant that indices given out by arena.add(_) are only ever usable
with the same arena. This is done by binding an invariant lifetime to each
index and the arena and requiring that same lifetime on indexing so that we can
safely use unchecked indexing.
This shows a common pattern with unsafe: We are able to relax some
requirements (such as “no unchecked indexing”) as long as certain invariants
(“no invalid index”) can be ensured. In this case, we have three moving parts:
- The mk_arenamacro ensures that the invariant lifetime given toSmallArena::newis unique (note that this is a compile-time invariant)
- The addfunction ensures that the resulting indices are valid for this arena
- The indexfunction uses those two facts to allow quick unchecked indexing
Actually, there is a fourth part: The absence of any function that invalidates an existing index. Adding such a function would make the code unsound.
Returning to the perimeter of unsafety, it is generally agreed that it is the
module boundary. Privacy and coherence ensure that we can stop users from
reaching into our module and modifying internals, so this is the only practical
definition. In practice, there may be uses of unsafe that rely only on
invariants the containing function can already ensure. We currently lack the
tools to distinguish those cases, as invariants usually are stated in comments
or at best in assert! statements.
(Aside: It may be possible to use data flow as an approximation, but I doubt it will give us a sound measure. For one, even in our example, some of the invariants are type-based and contain no runtime data)
This leaves us with a module (or possibly crate) that we need to review and declare sound, as opposed to needing to look at the code as a whole, as is required when coding in unsafe languages, such as C. This difference cannot be overstated. The complexity of ensuring safety increases at least with the square of the number of potentially interacting parts of the code. This is also why safety guides for those languages favor strong decoupling, to limit those interactions.
Still, when writing unsafe Rust code, we can limit the workload for auditing those unsafe parts by minimizing the scope of unsafety. As the title alludes, this means building the smallest possible module that is still able to provide a safe interface over the unsafety contained within. Note that this doesn’t need to be a usable interface – a parent module can handle that with safe code.
This leaves us with two questions: First, how do we identify the minimum safe abstraction, and second, given an existing module containing unsafe code, how do we get there?
Cross-Cutting Concerns
The oughties gave us a lot of hype around “Aspect-oriented programming”. The
idea behind this concept is to untangle cross-cutting concerns. The classic
example was trace logging: You’d find log.trace calls at the start and end
of every method, though this has nothing to do with what the method actually
does.
Ripping the trace “aspect” out of the code and putting it into its own “aspect” makes the rest of the code easier to read (I’ll note that there are problems with this approach, but let’s not look into this right now).
So, let’s treat each invariant we must uphold in our unsafe code as an aspect, could we structure the code so that we get a series of layers where each one upholds one invariant, and using the (somewhat) safe abstractions underneath?
In our example, SmallArena uses a Vec as backing store. This means that
we don’t have to care about out-of-bounds access or growing our storage. Our
only concern is to protect the Index operations.
On the other hand, the crate has two more arena types of different sizes that
use arrays of MaybeUninit as a backend. And here I haven’t yet factored out
the invariant that we only can add elements up to our capacity and dealing
with MaybeUninit. For now, I feel that mixing those concerns is acceptable,
but as the crate grows, I might want to revisit that decision.
I also ought to think about the possibility that I may offer additional
functionality. Once I implement it, I will probably put all the unsafe stuff in
a submodule (that likely has a lot of pub(crate) so I can reach in) and use
this from the outer crate to present an ergonomic interface.