Map of a Lifetime

06 March 2017

Today let’s look on a small thing. Let’s say (for the sake of simplicity, the actual algorithm comes up more often than you’d think) you want to do some contest, so you need to select any combination of two teams out of your list. To make it even more simple, let’s ignore teams and just focus on indices. You might write:

const TEAMS : usize = 6;

fn main() {
    for (a, b) in (0..TEAMS).flat_map(|a| (0..a).map(|b| (a, b))) {
        compete(a, b)
    }
}

However, Rust does not like this:

error: `a` does not live long enough
 --> combine.rs:4:59
  |
6 |     for (a, b) in (0..TEAMS).flat_map(|a| (0..a).map(|b| (a, b))) {
  |                                                      ---  ^    - borrowed value only lives until here
  |                                                      |    |
  |                                                      |    does not live long enough
  |                                                      capture occurs here
7 |         compete(a, b)
8 |     }
  |     - borrowed value needs to live until here

error: aborting due to previous error

So, what does this tell us? As you can imagine, I was a bit stumped. Isn’t usize Copy? Why is it borrowing at all? Taking a step back, the capture (the first span in the error message) appears to constitute a borrow, which the borrow checker cannot work out.

So what to do? We don’t need to borrow, and I just remembered that closures can move. So we change the map closure to move |b| (a, b) and borrowck is happy.

However, is this really a good way to do this? Let’s benchmark against a naive double loop (I’m going to use nightly’s test feature so I don’t need to setup a crate):

#![feature(test)]
extern crate test;
use test::{Bencher, black_box};

const TEAMS : usize = 6;

#[bench]
fn bench_flat_map(b: &mut Bencher) {
    b.iter(||
        for (a, b) in (0..TEAMS).flat_map(|a| (0..a).map(move |b| (a, b))) {
            black_box((a, b));
        }
    );
}

#[bench]
fn bench_nested_loops(b: &mut Bencher) {
    b.iter(||
        for a in 1..TEAMS {
            for b in 0..a {
                black_box((a, b));
            }
        }
    );
}

Let’s compile and run it:

$ rustc --test -C opt_level=3 -C lto -C target-cpu=core2 compete.rs
$ ./compete --bench

running 2 tests
test bench_flat_map     ... bench:          64 ns/iter (+/- 0)
test bench_nested_loops ... bench:           6 ns/iter (+/- 0)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

Ouch. More than 10 times the execution time of a naive nested loop. Well, we all know that it pays to keep code simple. Even with a larger number of TEAMS we see the flat_map version lagging behind (though it’s catching up eventually). Looking into the assembly shows that the nested loops were fully unrolled whereas the flat_map version was left as an exercise to a MIR optimizer. Perhaps this should be seen as an LLVM bug.

On the upside, the flat_map version is easier to abstract over than the nested loops (although using a higher-order function would make perfect sense here).