Llogiq on stuff

Easy Mode Rust

This post is based on my RustNationUK ‘24 talk with the same title. The talk video is on youtube, the slides are served from here.

Also, here’s the lyrics of the song I introduced the talk with (sung to the tune of Bob Dylan’s “The times, they are a-changin’”):

Come gather Rustaceans wherever you roam
and admit that our numbers have steadily grown.
The community’s awesomeness ain’t set in stone,
so if that to you is worth saving
then you better start teamin’ up instead of toilin’ alone
for the times, they are a-changin’.

Come bloggers and writers who tutorize with your pen
and teach those new folks, the chance won’t come again!
Where there once was one newbie, there soon will be ten
and your knowledge is what they are cravin’.
Know that what you share with them is what you will gain
for the times, they are a-changin’.

Researchers and coders, please heed the call,
Without your efforts Rust would be nothin’ at all
and unsafety would rise where it now meets its fall.
May C++ proponents be ravin’.
What divides them from us is but a rustup install
for the times, they are a-changin’.

Fellow moderators throughout the land,
don’t you dare censor what is not meant to offend
otherwise far too soon helpful people be banned
and what’s left will be angry folks ragin’.
Our first order of business is to help understand
that the times, they are a-changin’.

The line it is drawn, the type it is cast
What debug runs slow, release will run fast
as the present now will later be past
and our values be rapidly fadin’
unless we find new people who can make them last
for the times, they are a-changin’.

Rust has an only somewhat deserved reputation for being hard to learn. But that is mostly an unavoidable consequence of being a systems language that has to supply full control over the myriad of specifics of your code and runtime. But I’d argue that our method of teaching Rust is rather more at fault for this reputation. So as an antidote to this “the right way to do it” thinking, I offer this set of ideas on how to learn as little Rust as possible to become productive in Rust, so you can start and have success right away and learn the harder parts later when you’re comfortable with the basics.

In the talk I started with the “ground rules” for the exercise: I wanted to identify a small subset to learn that will allow people to successfully write Rust programs to solve the problems in front of them without being overwhelmed by all kinds of new concepts. I am happy to forgo on performance, brevity of the code or idiomatic code. In fact, some of the suggestions fly in the face of conventional guidelines on how good Rust code should look like. One of the questions after the talk was how to deal with new contributors or colleagues pushing “substandard” code to a project, and here my suggestion is to just merge it and clean it up after the fact. New users will feel unsure about their abilities, and nitpicking on details will put them off where we want to encourage them in their growth and learning, at least in the beginning.

Of course, the flipside of this is that I don’t suggest that every Rustacean learn only this subset and forever avoid all else. The idea here is to make you productive and successful quickly, and you can then build on that. A suggestion that also came up after my talk was to create a poster with a “research tree” (that is sometimes used in strategy games like e.g. Civilization to give people a path to progress without making it too linear). This is still on my list and I’ll open a repo for that soon, in the hope of finding people who’ll help me.

So without further ado, here are the things we want to avoid learning, and how to do that:

Syntax

Rust is not a small language. When starting out, for flow control it’s best to stick to basic things like if, for and while. If you need to distinguish e.g. enum variants, you can also use match, but keep it simple: Only match one thing, and avoid more complex things like guard clauses:

// Don't nest patterns in match arms
match err_opt_val {
  Some(Err(e)) => panic!("{e}"),
  _ => (),
}

// instead, nest `match` expressions
match err_opt_val {
  Some(err_val) => match err_val {
    Err(e) => panic!("{e}"),
    _ => (),
  },
  _ => (),
}

// Don't use guards
match w {
  Some(x) if x > 3 => { one(x) },
  Some(x) => { other(x) },
  None => (),
}

// instead, nest with `if`
// (that might require you to copy code)
match w {
  Some(x) => {
    if x > 3 {
      one(x)
    } else {
      other(x)
    }
  },
  None => (),
}

Avoid other constructs for now (such as if let or let-else). While they might make the code more readable, you can learn them later and have your IDE refactor your code quickly as you become privy to how they work. Within loops, avoid break and continue, especially with values. Rather introduce a new function that returns the value from within a loop.

As discussed in the introduction, this will take more code and thus exacerbate both brevity and readability, but the individual moving parts are far simpler.

The SBorrow checker

In my talk, I used a classic example that will often come up during search algorithms: Extending a collection of items with filtered and modified versions of the prior items.

for item in items.iter() {
  if predicate(item) {
    items.push(modify(item));
  }
}

The code here pretty much mirrors what you’d do in e.g. Python. It’s simple to read and understand, and there aren’t any needless moving parts. Unfortunately, it is also wrong, and the compiler won’t hesitate to point that out:

  Compiling unfortunate v0.0.1
error[E0502]: cannot borrow `items` as mutable because it is also borrowed as immutable
  --> src/main.rs:16:13
   |
13 |     for item in items.iter() {
   |                 ------------
   |                 |
   |                 immutable borrow occurs here
   |                 immutable borrow later used here
...
16 |             items.push(new_item);
   |             ^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

For more information about this error, try `rustc --explain E0502`.

Luckily, in almost all of the cases, we can split the immutable from the mutable borrows. In this particular case, we can simply iterate over a clone of our item list:

//               vvvvvvvv
for item in items.clone().iter() {
  if predicate(item) {
    items.push(modify(item));
  }
}

I used to be very wary of cloning in the past, considering it an antipattern, but unless that code is on the hot path, it literally won’t show up on your application’s profile. so going to the effort of avoiding that clone is premature optimization. However, if you have measured and note that the clone is in fact showing up either on your memory or CPU profile, you can switch to indexing instead:

for i in 0..items.len() {
  if predicate(&items[i]) {
    let new_item = modify(&items[i]);
    items.push(new_item);
  }
}

Note that this approach is more brittle than the clone based one: While in this case the loop itself only uses integers and thus doesn’t borrow anything, we might still inadvertently introduce overlapping borrows into the loop body. For example, if we replaced the if with if let, items would be borrowed from for the duration of the then-clause, thus causing the exact error we were trying to avoid in the first place. Also note that we put modify(..) into a local to avoid having it within the push, which might also possibly trip up the borrow checker.

Again, we’re not generally aiming for performance, so I would prefer the clone-based variant as much as possible.

Macros

Macros come up early in Rust. Literally the first Rust program everyone compiles (or even writes) is:

fn main() {
  println!("Hello, World!");
}

You’ll have a hard time writing Rust without calling macros, so I would suggest you treat them like you’d treat functions, with the caveat that they can have a rather variable syntax, but they’ll usually document that. So as long as you roughly know how to call the macro and what it does, feel free to call them as you like.

Writing macros is something we’ll want to avoid. Rust has a number of macro types (declarative macros, declarative macros 2.0, derive macros, annotation macros and procedural bang-macros), but we’re not going to look into writing any of those. All of those macro variants solve a single problem: Code duplication.

Now the obvious simple solution to avoiding macros is: Duplicate your code.

While that sounds very simple, in fact the advice can be split into a hierarchy of solutions that depend on the problem at hand:

  1. up to 5 times, less than 10 lines of code, not expected to change: In this case I’d just copy & paste the code and edit it to fit your requirements. Of course, you still have the risk of introducing errors in one of the instances of the copied code, but with the resulting code being reasonably compact, you’ll have a good chance to catch those quickly.
  2. more than that, still not expected to change within a certain time: Who is better than creating multiple almost-same instances of the same thing than you? Your computer of course! So write some Rust code that builds Rust code by building strings (using format!(..) or println!(..)), call it once and copy the output into your code. Voilà!
  3. expected to be up to date with the rest of the code? In that case, put your code generation into a unit test that reads and splits out the current version of the code, generates the possibly updated version, compares both and if they differ writes the updated version of the code, then panic with a message to tell whoever ran the test they need to commit the changes. It is helpful to add start and end marker comments to the generated code to make splitting it out easier and to document the fact that code is generated.

In code, instead of doing:

macro_rules! make_foo {
  { $($a:ident),* } => { $(let $a = { "foo" };)* };
}
make_foo!(a, b);

Either 1. copy & paste instead:

let a = { "foo" };
let b = { "foo" };

Or 2. write code to generate code:

fn format_code(names: &[&str]) -> String {
  let mut result = String::new();
  for name in names {
    result += format!("\nlet {name} = \"foo\";");
  }
  result
}

Additionally, 3. use a test to keep code updated:

#[test]
fn update_code() {
  let (prefix, actual, suffix) = read_code();
  let expected = format_code(&["a", "b"]);
  if expected == actual { return; }
  write_code(prefix, expected, suffix);
  panic!("updated generated code, please commit");
}

Alexey Kladov explains the latter technique better than I could in his blog post about Self-modifying code.

Generics

Generics lets us re-use code in various situations by being able to swap out types. However, they can also be a great source of complexity, so we’ll of course want to avoid them. As Go before version 1.2 has shown, you can get quite far without them, so unless it’s for the element type of collections, we’ll want to avoid using them.

So instead of writing

struct Foo<A, B> { .. }
fn foo<A, B>(a: A, b: B) { .. }

we’d monomorphize by hand (that is build a copy of the code for each set of concrete types we need), so for each A/B combination, we’d write:

// given `struct X; struct Y`
struct FooXY { .. }
fn foo_x_y(a: X, b: Y) { .. }

Of course, you might end up with a lot of copies, so use the code generation from above to deal with that.

Lifetimes

Lifetime annotations are those arcane tick+letter things that sometimes even stump intermediate Rust programmers, so you won’t be surprised to find them on this list. They look like this:

struct Borrowed<'a>(&'a u32);
fn borrowing<'a, 'b>(a: &'a str, b: &'b str) -> &'a str { .. }

Of course, we don’t want to burden ourselves with those sigil-laden monstrosities for now. To get rid of those, we have to avoid borrowing in function signatures. So instead of taking a reference, take an owned instance. And yes, this will incur more cloning yet. If you need to share an object (e.g. because you want to mutate it), wrap it in an Arc:

struct Arced(Arc<u32>);
fn cloned(a: String, b: String) -> String { .. }
fn arced(a: Arc<String>, b: Arc<String>) -> String { .. }

Arc is a smart pointer that lets you .clone() without cloning the wrapped value. Both Arcs will lead to the exact same value, and if you mutate one, the other will also have changed.

Traits

You can get a lot done without ever implementing a trait in Rust. However, there are some traits (especially in the standard library, but also in trait-heavy crates like serde) that you might need to get some stuff done. In many cases, you can use a #[derive(..)]-annotation, such as

#[derive(Copy, Clone, Default, Eq, PartialEq, Hash)]
struct MyVeryBadExampleIAmSoSorry {
  size: usize,
  makes_sense: bool,
}

In some cases, Rust lore would tell you to use trait based dispatch, but in most of those cases, an enum and a match or even a bag of if-clauses will do the trick. Remember, we’re not attempting to have our code win a beauty contest, just get the job done.

Finally, if you use a framework that requires you to manually implement a trait, write impl WhateverTrait for SomeType and use the insert missing members code action from your IDE if available.

Modules and Imports

This is something we cannot completely avoid. If we don’t use any imports, our code can only use what’s defined in the standard library prelude, and we won’t get very far with that. Also even if we did, we’d end up with a 10+k lines of code file, and no one wants to navigate that. On the other hand, when using modules, we should strive to not go overboard, lest we find ourselves in a maze of twisty little mod.rs files, all different (pardon the text adventure reference).

So we obviously need to import stuff we use, but how do we introduce mods? The key to keeping this simple is the observation that mods conflate code organization with the hierarchy of paths in our crate. So if I have a mod foo containing a bar, people using my code will have to either import or directly specify foo::bar. But there are two recipes we can follow to untangle those. Given an example lib.rs where our code has two functions:

pub fn a() {}
pub fn b() {}

Now in practice, those functions likely won’t be empty, and in most cases we’ll have more than two of them, but you want to read this blog post, not wade through screens of code, so let’s look at the first recipe we will use to move b to a new b.rs file without changing the path where b is visible from the outside. The recipe has three steps:

  1. Declare the b module in lib.rs and pub use b from it:
mod b;
pub use b::b;

pub fn a() {}
pub fn b() {}
  1. Create b.rs, non-publicly importing everything from above, so that fn b() won’t fail to compile because of missing paths from lib.rs:
use super::*;
  1. Move fn b() into b.rs:
use super::*;

// moved from `lib.rs`:
pub fn b() {}

Congratulations, you just split your code without anyone using it being the wiser.

The second recipe is to move the b path into the b module. In this case, we have to make the b module publicly visible and then remove the pub use from our lib.rs:

pub mod b; // added `pub` here
//pub use b::b; <-- no longer needed

pub fn a() {}

Voilà, your users won’t be able to call b() directly anymore unless they import it from b::b. Now modules are still a messy beast, but at least there are easy steps to take to deal with them.

Async

Rust async is arcane, powerful and still has a good number of rough edges. So unless you’re writing a web service that needs to serve more than fifty thousand concurrent users on a single machine, try to avoid it (remember, we’re not after performance here). However, some libraries you may want to use will require async code. In this case, pick an async runtime (most libraries will work with tokio, so that seems a safe choice) and

  • write your functions as you would write a normal function, prepending async before the fn
  • add .await after every function call, and then remove it again wherever the compiler complains
  • avoid potentially locking mechanisms such as Mutex, RwLock, and channels. If you absolutely must use one of them, your async runtime will provide replacements that won’t deadlock on you

You might still run into weird errors. Don’t say I didn’t warn you.

Data Structures

If I had a penny for each internet troll asking me to write a doubly-linked list in safe Rust (hint: I can, but I don’t need to, there’s one in the standard library), I’d be a very well-off man. So you won’t be surprised to read me suggesting you avoid writing your own data structures. In fact I’ll go one step further and allow you to put off learning what data structures are already provided, because you can get by with only two in the majority of cases: Sequence-like and Lookup-like.

Sequences

Whether you call them lists, or arrays, or sequences is of little import. This type of structure is usually there to be filled and later iterated over. For example, if we want to have three items:

  1. start
  2. more
  3. end

we can put them in a Vec:

vec!["start", "more", "end"]

Use Vecs everywhere you want to sequentially iterate items.

Lookups

Those may be called maps, or dictionaries, and allow you to associate a key with a value, to later retrieve the value given the key.

KEY VALUE
recursion please look at “recursion”

we will use HashMaps for this case:

HashMap::from([
  ("recursion", "please look at recursion"),
])

While hashmaps can be iterated, the main use case is to get a value given a key.

You will be surprised how many programs you can create by just sticking to those two structures.

Custom Iterators

Rust iterators are also very powerful, and being able to call their combinator functions (like filter, map etc.) you can create new iterators. There are only two points to be mindful of: First, the combinator functions won’t iterate anything, they will just wrap the given iterator into a new iterator type that will modify the iterator’s behavior, and second, the resulting types are usually very hard if not impossible to write out. So you should try avoiding returning such a custom iterator from a function, instead collecting into a Vec or HashMap (see above). If however your iterator is infinite (yes, that can happen), you obviously cannot collect it. In those rare cases, here’s the magic trick to make it work:

fn return_custom_iterator() -> Box<dyn Iterator<Item = MyItemType>> {
  // let's say we filter and map an effectively infinite range of integers
  let iter = (0_usize..).filter(predicate).map(modify);
  Box::new(iter) as Box<dyn Iterator<Item = MyItemType>>
}

So now you know what things you can put off learning while you’re being productive in Rust. Have fun!