Some Slight Improvements
Hi there! I’m back! Today I want to talk about two Rust PRs I recently wrote, #52942 and #52997. Both are relatively small changes to Rust’s internally used data structures that improve performance and readability. Both have basic benchmarks (the first one already had them, and I wrote them for the second one), although it’s rather hard to gauge whether they really impacted compile times, as perf.rust-lang.org lumps together all changes merged on a given day. But that’s not the point I want to make right now.
The Rust compiler, impressive though it is, has a lot of code, much of it not written in the best style, and some of it convoluted because of old restrictions that have long since been lifted. Changing this is actually quite easy. Follow along as I trace my workflow for both PRs:
SmallVec
Reading through ljedrz’ #52859 I found the code unidiomatic, and the author’s explanation that they ran into a borrow checker problem raised the suspicion I could improve matters further.
(Aside: I’m usually on top of the PR queue because I put together the list of the week’s merged PRs for This Week in Rust.)
I waited for the PR to be merged, as the author thankfully had included some benchmarks I wanted to reuse. Then I looked into what actually happened: SmallVec has two modes, Array and Heap. In the first, the data is stored inline; otherwise a Vec is used. The speedup in that PR came from the fact that once the Heap state had been reached, the Vec would simply be extended. However, this failed to improve matters if the SmallVec started in Array state, even if reserving the requisite space as reported by size_hint().0 would have switched it to Heap state. Clearly a better path was to reserve first (which pushes the SmallVec into Heap state if the current size plus the minimum size hint overflows the array) to optimize for this common path, because in practice SmallVecs are often extended with slices, which carry correct size information.
Next, I switched around the order of the matched variants, which appears to have assuaged the borrow checker, because I never got any error. Now the code reads better, and the benchmarks were very favorable (especially in the case where the extend pushes the SmallVec into Heap state).
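To make the reserve-first idea concrete, here is a minimal sketch of a two-mode vector with such an Extend implementation. This is not the real smallvec code used by rustc; ToyVec, Mode and the inline capacity of four are made up for illustration:

// Illustrative sketch only, not the smallvec implementation rustc uses.
enum Mode<T> {
    Array { buf: [Option<T>; 4], len: usize }, // data stored inline
    Heap(Vec<T>),                              // spilled onto the heap
}

struct ToyVec<T> {
    mode: Mode<T>,
}

impl<T> ToyVec<T> {
    fn new() -> Self {
        ToyVec { mode: Mode::Array { buf: [None, None, None, None], len: 0 } }
    }

    // If the current length plus `additional` no longer fits inline,
    // move everything into a Vec up front.
    fn reserve(&mut self, additional: usize) {
        if let Mode::Array { buf, len } = &mut self.mode {
            if *len + additional > buf.len() {
                let mut v = Vec::with_capacity(*len + additional);
                for slot in buf.iter_mut().take(*len) {
                    v.push(slot.take().expect("slot initialized up to len"));
                }
                self.mode = Mode::Heap(v);
            }
        }
    }

    fn push(&mut self, value: T) {
        self.reserve(1);
        match &mut self.mode {
            Mode::Array { buf, len } => {
                buf[*len] = Some(value);
                *len += 1;
            }
            Mode::Heap(v) => v.push(value),
        }
    }
}

impl<T> Extend<T> for ToyVec<T> {
    fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
        let iter = iter.into_iter();
        // Reserve the minimum number of items the iterator promises;
        // this may already switch us to Heap state before the first push.
        self.reserve(iter.size_hint().0);
        if let Mode::Heap(v) = &mut self.mode {
            v.extend(iter); // fast path: delegate straight to Vec::extend
        } else {
            for item in iter {
                self.push(item); // still inline; may spill mid-way if the hint was low
            }
        }
    }
}

fn main() {
    let mut v = ToyVec::new();
    // A slice iterator reports an exact size_hint, so the spill to Heap
    // happens exactly once, before any element is appended.
    v.extend([1, 2, 3, 4, 5].iter().cloned());
    assert!(matches!(v.mode, Mode::Heap(_)));
}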
TinyList
That got me thinking. What other parts of the code could use improvement? I looked in the rustc_data_structures crate and found tiny_list.rs, containing a simple singly-linked list.
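For reference, the layout of that list is roughly the following (reconstructed from the insert code shown below; the real tiny_list.rs may differ in small details such as derives and visibility):

pub struct TinyList<T: PartialEq> {
    head: Option<Element<T>>,
}

struct Element<T: PartialEq> {
    data: T,
    next: Option<Box<Element<T>>>,
}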
Both the insert and remove methods looked quite convoluted for a simple linked list. Seeing that there were no benchmarks, I added some for insert and remove (to ensure I wasn’t negatively impacting performance) and started refactoring the code.
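Such benchmarks are nightly-only #[bench] functions; a sketch in that spirit might look like the one below. It assumes the TinyList layout from above, a TinyList::new constructor and a remove(&mut self, data: &T) -> bool signature; the names and element counts are illustrative, not the exact ones from the PR.

#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

#[bench]
fn bench_insert_ten(b: &mut Bencher) {
    b.iter(|| {
        // Assumed constructor; the real API may differ slightly.
        let mut list = TinyList::new();
        for i in 0..10 {
            list.insert(black_box(i));
        }
        list
    })
}

#[bench]
fn bench_insert_remove_ten(b: &mut Bencher) {
    b.iter(|| {
        let mut list = TinyList::new();
        for i in 0..10 {
            list.insert(i);
        }
        for i in 0..10 {
            black_box(list.remove(&i));
        }
    })
}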
First, let’s look at insert. The original version:
pub fn insert(&mut self, data: T) {
    let current_head = mem::replace(&mut self.head, None);
    if let Some(current_head) = current_head {
        let current_head = Box::new(current_head);
        self.head = Some(Element {
            data,
            next: Some(current_head)
        });
    } else {
        self.head = Some(Element {
            data,
            next: None,
        })
    }
}
This looks complex, and there is a lot of duplication. Yet the only difference between the Some and None branches is that if current_head was Some(head), it has to be boxed, whereas None values stay None. Rust’s Option type already has a function for exactly this case, so I just took the mem::replace(..) and mapped Box::new over it to get our new next. The function becomes:
pub fn insert(&mut self, data: T) {
    self.head = Some(Element {
        data,
        next: mem::replace(&mut self.head, None).map(Box::new),
    });
}
Much easier to follow, although sadly it doesn’t change performance at all, because the compiler very likely saw through the old code already.
The remove case was a bit more involved. I must confess that I at first could not understand what the code did, but the data structure only allows for three cases, which were easy enough to trace through:
- The list is empty (self.head == None), so return false.
- The list has an element and the data is found, so replace self.head with *self.head.next (the * is to unbox it) and return true.
- The list has an element, which has different data: call into the element chain recursively to remove it if found.
Simply removing the existing code and coding up a match block with the three cases led to much easier code and a performance win. Apparently this time the compiler didn’t see through the second match and the extra unwrap() call. I missed the unboxing operation at first and got a sufficiently helpful type error. Most of my time was spent waiting for the benchmarks to run.
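For reference, a match-based remove along those lines can be sketched as follows, using the TinyList layout from above and a recursive remove_next helper on Element; this is a sketch only, and the code that actually landed may differ in its details.

impl<T: PartialEq> TinyList<T> {
    // Sketch of the three-case remove described above.
    pub fn remove(&mut self, data: &T) -> bool {
        let new_head = match self.head {
            // Case 2: the head holds the data; replace it with its unboxed successor.
            Some(ref mut head) if head.data == *data => head.next.take().map(|next| *next),
            // Case 3: the head holds different data; recurse into the chain behind it.
            Some(ref mut head) => return head.remove_next(data),
            // Case 1: the list is empty.
            None => return false,
        };
        self.head = new_head;
        true
    }
}

impl<T: PartialEq> Element<T> {
    // Walks the chain behind `self`, removing the first element whose data matches.
    fn remove_next(&mut self, data: &T) -> bool {
        let new_next = match self.next {
            Some(ref mut next) if next.data == *data => next.next.take(),
            Some(ref mut next) => return next.remove_next(data),
            None => return false,
        };
        self.next = new_next;
        true
    }
}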
Moral of the story
Similar to how the walkaways operate in Cory Doctorow’s eponymous book, the Rust compiler gets written by many people trying different things. Errors and inefficiencies will be routed around, and while the result may look messy at first, the whole thing converges to a damn fine state, if enough people care.
So if you are somewhere on Rust’s learning curve and want to chime in, there are a lot of things to be done. Finding code that can be improved isn’t too hard (maybe we could start an initiative to help here?), nor is actually carrying out the improvement. So don’t shy away and bring those PRs! Happy Rusting!