Some Slight Improvements
Hi there! I’m back! Today I want to talk about two Rust PRs I recently wrote, #52942 and #52997. Both are relatively small changes to Rust’s internally used data structures that improve performance and readability. Both have basic benchmarks (the first one already had them, and I wrote them for the second one), although it’s rather hard to gauge whether they really impacted compile times, as perf.rust-lang.org lumps together all changes merged on a given day. But that’s not the point I want to make right now.
The Rust compiler, impressive though it is, has a lot of code, much of it not written in the best style, and some of it convoluted because of old restrictions that have long since been lifted. Changing this is actually quite easy. Follow along as I trace my workflow for both PRs:
SmallVec
Reading through ljedrz’ #52859 I found the code unidiomatic, and the author’s explanation that they ran into a borrow checker problem raised the suspicion I could improve matters further.
(Aside: I’m usually on top of the PR queue because I put together the list of the week’s merged PRs for This Week in Rust.)
I waited for the PR to be merged, as the author thankfully had included some benchmarks I wanted to reuse. Then I looked into what actually happened: SmallVec has two modes, Array and Heap. In the first, the data is stored inline; otherwise a Vec is used. The speedup in that PR came from the fact that once the Heap state had been reached, the Vec would simply be extended. However, this failed to improve matters if the SmallVec started in Array state, even if reserving the requisite space as reported by size_hint().0 would have switched it to Heap state. Clearly a better path was to reserve first (which pushes the SmallVec into Heap state if the current size plus the minimum size hint overflows the array) to optimize for this common path, because in practice SmallVecs are often extended with slices, which carry correct size information.
Next, I switched around the order of the matched variants, which appears to have assuaged the borrow checker, because I never got any error. Now the code reads better, and the benchmarks were very favorable (especially in the case where the extend pushes the SmallVec into Heap state).
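To make the reserve-first idea concrete, here is a minimal sketch of a two-mode vector with such an Extend implementation. This is not the real smallvec code used by rustc; ToyVec, Mode and the inline capacity of four are made up for illustration:

// Illustrative sketch only, not the smallvec implementation rustc uses.
enum Mode<T> {
    Array { buf: [Option<T>; 4], len: usize }, // data stored inline
    Heap(Vec<T>),                              // spilled onto the heap
}

struct ToyVec<T> {
    mode: Mode<T>,
}

impl<T> ToyVec<T> {
    fn new() -> Self {
        ToyVec { mode: Mode::Array { buf: [None, None, None, None], len: 0 } }
    }

    // If the current length plus `additional` no longer fits inline,
    // move everything into a Vec up front.
    fn reserve(&mut self, additional: usize) {
        if let Mode::Array { buf, len } = &mut self.mode {
            if *len + additional > buf.len() {
                let mut v = Vec::with_capacity(*len + additional);
                for slot in buf.iter_mut().take(*len) {
                    v.push(slot.take().expect("slot initialized up to len"));
                }
                self.mode = Mode::Heap(v);
            }
        }
    }

    fn push(&mut self, value: T) {
        self.reserve(1);
        match &mut self.mode {
            Mode::Array { buf, len } => {
                buf[*len] = Some(value);
                *len += 1;
            }
            Mode::Heap(v) => v.push(value),
        }
    }
}

impl<T> Extend<T> for ToyVec<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let iter = iter.into_iter();
        // Reserve the minimum number of items the iterator promises;
        // this may already switch us to Heap state before the first push.
        self.reserve(iter.size_hint().0);
        if let Mode::Heap(v) = &mut self.mode {
            v.extend(iter); // fast path: delegate straight to Vec::extend
        } else {
            for item in iter {
                self.push(item); // still inline; may spill mid-way if the hint was low
            }
        }
    }
}

fn main() {
    let mut v = ToyVec::new();
    // A slice iterator reports an exact size_hint, so the spill to Heap
    // happens exactly once, before any element is appended.
    v.extend([1, 2, 3, 4, 5].iter().cloned());
    assert!(matches!(v.mode, Mode::Heap(_)));
}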
TinyList
That got me thinking. What other parts of the code could use improvement? I looked in the rustc_data_structures crate and found tiny_list.rs, containing a simple singly-linked list.
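For reference, the layout of that list is roughly the following (reconstructed from the insert code shown below; the real tiny_list.rs may differ in small details such as derives and visibility):

pub struct TinyList<T: PartialEq> {
    head: Option<Element<T>>,
}

struct Element<T: PartialEq> {
    data: T,
    next: Option<Box<Element<T>>>,
}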
Both the insert and remove methods looked quite convoluted for a simple linked list. Seeing that there were no benchmarks, I added some for insert and remove (to ensure I wasn’t negatively impacting performance) and started refactoring the code.
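Such benchmarks are nightly-only #[bench] functions; a sketch in that spirit might look like the one below. It assumes the TinyList layout from above, a TinyList::new constructor and a remove(&mut self, data: &T) -> bool signature; the names and element counts are illustrative, not the exact ones from the PR.

#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

#[bench]
fn bench_insert_ten(b: &mut Bencher) {
    b.iter(|| {
        // Assumed constructor; the real API may differ slightly.
        let mut list = TinyList::new();
        for i in 0..10 {
            list.insert(black_box(i));
        }
        list
    })
}

#[bench]
fn bench_insert_remove_ten(b: &mut Bencher) {
    b.iter(|| {
        let mut list = TinyList::new();
        for i in 0..10 {
            list.insert(i);
        }
        for i in 0..10 {
            black_box(list.remove(&i));
        }
    })
}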
First, let’s look at insert. The original version:
pub fn insert(&mut self, data: T) {
    let current_head = mem::replace(&mut self.head, None);
    if let Some(current_head) = current_head {
        let current_head = Box::new(current_head);
        self.head = Some(Element {
            data,
            next: Some(current_head)
        });
    } else {
        self.head = Some(Element {
            data,
            next: None,
        })
    }
}
This looks complex, and there is a lot of duplication. Yet the only difference between the Some and None branches is that if current_head was Some(head), it has to be boxed, whereas None values stay None. Rust’s Option type already has a function for exactly this case, so I just took the mem::replace(..) and mapped Box::new over it to get our new next. The function becomes:
pub fn insert(&mut self, data: T) {
    self.head = Some(Element {
        data,
        next: mem::replace(&mut self.head, None).map(Box::new),
    });
}
Much easier to follow, although sadly it doesn’t change performance at all, because the compiler very likely saw through the old code already.
The remove case was a bit more involved. I must confess that I at first could not understand what the code did, but the data structure only allows for three cases, which were easy enough to trace through:
- The list is empty (self.head == None), so return false.
- The list has an element and the data is found, so replace self.head with *self.head.next (the * is to unbox it) and return true.
- The list has an element, which has different data: call into the element chain recursively to remove it if found.
Simply removing the existing code and coding up a match block with the three cases led to much easier code and a performance win. Apparently this time the compiler didn’t see through the second match and the extra unwrap() call. I missed the unboxing operation at first and got a sufficiently helpful type error. Most of my time was spent waiting for the benchmarks to run.
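For reference, a match-based remove along those lines can be sketched as follows, using the TinyList layout from above and a recursive remove_next helper on Element; this is a sketch only, and the code that actually landed may differ in its details.

impl<T: PartialEq> TinyList<T> {
    // Sketch of the three-case remove described above.
    pub fn remove(&mut self, data: &T) -> bool {
        let new_head = match self.head {
            // Case 2: the head holds the data; replace it with its unboxed successor.
            Some(ref mut head) if head.data == *data => head.next.take().map(|next| *next),
            // Case 3: the head holds different data; recurse into the chain behind it.
            Some(ref mut head) => return head.remove_next(data),
            // Case 1: the list is empty.
            None => return false,
        };
        self.head = new_head;
        true
    }
}

impl<T: PartialEq> Element<T> {
    // Walks the chain behind `self`, removing the first element whose data matches.
    fn remove_next(&mut self, data: &T) -> bool {
        let new_next = match self.next {
            Some(ref mut next) if next.data == *data => next.next.take(),
            Some(ref mut next) => return next.remove_next(data),
            None => return false,
        };
        self.next = new_next;
        true
    }
}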
Moral of the story
Similar to how the walkaways operate in Cory Doctorow’s eponymous book, the Rust compiler gets written by many people trying different things. Errors and inefficiencies will be routed around, and while the result may look messy at first, the whole thing converges to a damn fine state, if enough people care.
So if you are somewhere on Rust’s learning curve and want to chime in, there are a lot of things to be done. Finding code that can be improved isn’t too hard (maybe we could start an initiative to help here?), nor is actually carrying out the improvement. So don’t shy away and bring those PRs! Happy Rusting!