Llogiq on stuff

Corner Cutting vs. Productivity

I recently got into a discussion with another very knowledgeable Rustacean, who (I paraphrase) claimed that Rust is about adding just enough roadblocks to keep you from cutting corners. This is a nice metaphor because it explains a lot: Rust may feel more cumbersome, because it won’t let you cut corners. On the other hand, once it compiles, many classes of errors will already have been taken care of, so your code will usually work as expected (or if you’re new to Rust, unexpectedly well).

This theme is pervasive throughout. Not only is the compiler strict, it is also often surprisingly helpful (and you can help make it even better, see issues marked with A-diagnostics!), the docs are so good (and with doctests also useful as tests) that they make you try to emulate their style (confession: I’ve never documented any code as well as my Rust code) and the Code of Conduct keeps you from cutting corners in personal interactions (sure it would be easier to be an ███████, but it will hinder more than help in the long run).

But I think this view implies that Rust is all roadblocks, which isn’t true. So my counter-argument here is that Rust is an experiment to see how productive it can make you without letting you cut any corners. I used to say “Writing Rust may be a little harder, but writing incorrect Rust is a lot harder”. But I (carelessly?) omitted the reference point. Harder than what? Today my conclusion is “harder than cutting corners in other languages”. But even that may underestimate Rust’s design.

Consider for example the ? operator, which together with the Result type creates an easy way to correctly handle all errors. Now let’s see the alternatives:

  • Go will simply return errors, and together with multiple return values this forms a poor man’s monadic error handling, unless for the fact that you can (and may out of lazyness or forgetfullness) cut corners and omit the error handling, with results ranging from benign to disastrous. The worst thing about this error handling style is that the failure mode (what happens if you forget to cross the t or dot the i) is that the code runs as if the error didn’t happen, possibly running into follow-up errors later that will be harder to debug because the original problem has been silently ignored
  • Java has unchecked exceptions, which are mostly as bad as Go’s returns but for the slightly better failure mode of an early exit until the next catch block. While unchecked exceptions were usually reserved for programmer error and should not be caught at all, this is often done regardless (and sometimes for good reason), so the failure mode very much depends on the catch blocks on the call stack. In the worst case the error will be silently discarded (though this is deemed very bad style), otherwise it will hopefully at least be logged. Java also has checked exceptions, which must be caught somewhere, and should be used for errors stemming from system or input (e.g. IOException). To some this is the worst of both worlds, requiring more ceremony with little benefit. On the other hand, it at least compels the programmer to somehow deal with the errors that may happen. The downside compared to Rust is that it is unclear where in the try-catch block the exception may happen
  • Python’s exceptions are similar to Java’s unchecked exceptions. In current versions you can only raise things deriving from BaseException, earlier versions even allowed raise "Ouch" – Ouch indeed
  • Lua allows you to call a function with an error handler, which is like a function-wide catch block
  • On the non-GC language side, C++ has exception, but they are notoriously hard to work with while avoiding memory leaks and almost-constructed objects. Many programmers will try to avoid them for this exact reason, and I always have a smidgen of doubt if my C++ code is really exception-safe.
  • C uses a mixture of return codes (which have the same downsides as with Go) and an archaic mechanism called setjmp/longjmp, which allows you to setup a handler and jump into it from arbitrary code. This can be used for basic error handling (and is for example used as the basis for Lua error handling)

In summary, the Rust way is both easier, naturally leads the programmer to do the right thing and conveys more information to boot.

A related example is Rust’s use of Option<_> to represent nullable values. Thus the type system keeps you from forgetting to deal with the None case while simultaneously giving you combinator functions (for example the powerful map_or) that make working with those types easy.

  • C, C++ and Java have null, lua has nil (which is the same thing under a different name). It has the same bad failure mode as ignoring an error. At least in Java, you will get a NullPointerException at the point of dereferencing the null (which may not be the same point as obtaining it). In C or C++ you’ll get a segmentation fault if you’re lucky, and nasal demons otherwise.
  • Java as of version 8 has an Optional<T> object, that unfortunately still can be null. There’s a “Project Valhalla” to allow value objects in Java which would fix this. Here’s to hoping we get it soon. On the other hand, a very good static nullness check for Java has been introduced with the FindBugs (now SpotBugs) static analyzer
  • Go, like lua has nil, but only for references. Since Go has value objects, those cannot be nil

Verdict: null (or its cousin nil) has the effect of making every reference value an Option. Using references for all objects (as Java does) exacerbates this problem considerably. By making the Option type explicit, Rust (and other modern languages) greatly reduces the cognitive load it entails.

Let’s take another example, splitting strings. In Rust, you get a Split object, which is an Iterator over borrowed immutable string slices. This both avoids allocation for the parts and lets you deal with the values one by one without needing a container. One can map str::to_owned to get Strings of the subslices. Easy and performant. The downside is that we need two string types, &str for borrowed string slices and String for owned strings. So what do other languages do?

  • Java’s String.split(..) returns an array, which necessitates an allocation and requires either a count of the substrings before building or some kind of reallocation. Worse, until Java 7u6, the substrings reused the internal char array of the original String, which is similar to what Rust does. Unfortunately, Java has no way of limiting the substring lifetime to the lifetime of the original string, so it had to expand the lifetime of the original string instead. Since 7u6, the substrings are instead copied from the original string, requiring a bit less than double the memory (some memory is also needed for allocating the returned array), but letting the GC reclaim the original string earlier even if the substrings are still alive
  • Python’s str.split method returns a list of substrings. So they need to allocate both the list and copies of the substrings
  • Lua’s minuscule standard library has no functions to split strings, but string.gmatch might be used to search for the inverse of the split pattern
  • C has a wildly unsafe, aptly named strtok function (“string tokenizing, remember this is a language from a time when long names were expensive). Apart from being a security nightmare it appears to work
  • C++ users, redditor CornedBee informs me have the option of either using C’s strtok, creating and reading “lines” from an istringstream (if the separator is a single character), writing their own or using some library, e.g. Boost.

Rust gives you a simple way to work with split strings without requiring allocation unless you really need it. The borrow checker ensures it is all safe, where in other languages you’d have to make copies left and right to be sure. Using the iterator’s methods is not much harder than working with the lists or arrays of substring, and lends itself to further optimizations. I’d like to note that other languages make it basically impossible to safely match Rust’s performance here. In this case, the information retained in the Split iterator allows the compiler to eke out more performance while keeping a familiar interface. Simply looping over the subslices will look mostly the same as in other languages, and you only need a modicum of additional complexity to get Strings (.map(str::to_owned)) or a collection (.collect(), though you may need to specify the type of collection via “turbofish”: .collect::<Vec<_>>()).

Speaking of loops, Rust has loop, while, while let and for loops. While loop may seem unnecessary, in fact all other loops are desugared into som loop + match combination. for loops work with the IntoIterator and Iterator traits to compile down into well-performing code.

  • C and C++ have a for (<initializer>;<test>;<increment>) … loop and a while (<test>) … loop (also a version with a test after the loop body, do {} while(<test>);). The for loop is infamously sometimes misused for strange iteration patterns which may make the code hard to understand. The block brackets {} are optional for all but the do-while loop, however this can lead people to misread code. C++11 also has a for(<binding> : <iterator>) loop similar to Rust’s
  • Java has both a C-style for and while loops, as well as a for(<binding> : <iterator>) loop (spoken for-each loop) introduced in 1.5 that works with arrays and Iterables. The latter is now the preferred form to iterate over collections of objects. Since Java 8, there are also Streams which almost work similar to Rust’s Iterators, except of course they don’t allow as much control, miss some functions and need to allocate to be safe
  • Python has a for loop that works with its own iterator protocol which is similar to Rust’s, however it uses a StopIteration exception to break the loop instead of an Option type. Python also has while loops. Since blocks are marked by indentation, it has less risk of misreading the code, however it makes refactoring more cumbersome, and you run the risk of losing track in larger swathes of code.
  • Lua also has a for loop working with an internal iteration protocol similar to Python’s, but returning a sentinel nil value to end the iteration.

Rust loops are as simple as high-level Python’s, but offer far better performance. In addition, loop { .. } arguably captures the intent of an unbounded loop better than while (true) { .. }.

My last example is destructuring pattern matching. Though it adds a lot of complexity to the language, the resulting simplicity in the code is well worth it. Take a simple enum with a match and you will both get the correct internals for each variant and be assured by the compiler that you handled every possible variant.

  • C and Java only have a switch statement that works on plain numeric values or enums (which may not contain additional data. You could use a tag enum + union idiom in C and would likely use a polymorphic type in Java if you need to store data. The former is crude and won’t give you any assurance that you used the right variant. The latter can get quite unwieldy and won’t give you the same assurances either. As for the switch statement, both the fact that it is a statement instead of an expression and that it allows fallthrough (you need to break from each case) exacerbates the ergonomics.
  • C++ has learned from this and now offers std::variant types, which are still not used widely and I hear they have their own downsides (they don’t have destructuring, for one)
  • Python and Lua simply don’t have such statements. You need a wall of ifs or a dict/table to do something like matching, and it still won’t give you safe access to the data. In Python, you could also use Java-like class polymorphism, but that doesn’t get you better ergonomics either.

Short story short, Rust’s enums + destructuring matches offer great productivity while doing away with footguns other languages may allow or even prescribe. Here the language pays with a bunch of complexity to make your code easier to both write and read (at least to the initiated).

I hope those examples have convinced you that disallowing cut corners doesn’t need to harm productivity even in the short run. In the long run, being diligent obviously wins.