Mapping over Arrays

28 April 2016

Given I’m doing quite a lot of work with Rust, I could almost have forgotten that I’m still learning the language – and as sharing what I’ve learned is my prime reason for writing this blog, today I’m going to delve into two corners of Rust I had not touched so far.

So let’s set the stage: Just for fun, to see how it works and to maybe find something useful, I tried to remove most allocations I could find from some benchmarks. This was mostly removing vec! here and there, but in some places I could not get rid of them, because, as you know if you’ve worked with arrays in Rust, those things need a correct value at every index on construction.

Either using [Option<T>] instead of Vec<T> or writing some Default just to overwrite it later both look like suboptimal solutions. What I needed in most cases was a way to map(_) a function over an array, creating a new one in the process. Something like:

// T is [Y; N] for some [X; N] we impl this trait for
pub trait ArrayMap<X, Y, T> {
    fn map<F: Fn(&X) -> Y>(&self, f: F) -> T;
}

With the right implementations, this could be used like:

let squares = [1, 2, 3].map(|x| x * x);

This is easy to do for one array size, let’s take the fairly atypical zero sized array case first:

impl<U, V> ArrayMap<U, V, [V; 0]> for [U; 0] {
    fn map<F: Fn(&U) -> V>(&self, _: F) -> [V; 0] { [] }
}

For the larger sizes (up to 32, as is custom in Rust), I could have pulled up my sleeves and write the impls by hand, but that would not have been very rustic. If one looks into the standard library, one finds a lot of macros to set up most trait implementations. As the saying goes, when in Rome, do like the Romans do, so let’s write some macros.

Note that this is my first foray into Rust macros, and the code may be ugly and needlessly complicated. Please bear with me.

First, we’ll need the impl with the fn, let’s keep the method unimplemented for now.

macro_rules! map_impl {
    {$n:expr, $($ns:expr),*} => {
        // this works like a template over $n
        impl<U, V> ArrayMap<U, V, [V; $n]> for [U; $n] {
            fn map<F: Fn(&U) _> V>(&self, _: F) -> [V; $n] {
                unimplemented!() // for now
            }
        }
        // take one down, pass it around, ...
        map_impl!($($ns),*);
    };
    {, } => {};
}

// thirty-two impls of ArrayMap on the wall...
map_impl!(32, 31, 30, 29, .., 0);

What’s going on here? Within macro_rules!, we get a number of matches with corresponding expansions We call a match => expansion; a clause. The semicolon at the end of a clause is required. Within an expansion, we can write valid Rust code, including calls to arbitrary macros – including the macro we’re currently writing.

It took me a bit to learn the macro syntax, but the book is really helpful. I also perused a Stack Overflow answer from Daniel Keep which I’ll just link here for your edification. The point is that the match follows some simple rules, and we can embed stuff from the match in the expression.

We take a list of comma separated expressions (spaces don’t count), and take the first and the rest separately; the $(..),* in the match means zero or more comma separated things. We can expand that with much the same syntax (just omit the “type”). The macro calls itself until it has exhausted the list of array sizes.

Next, we need to fill the unimplemented method. In our case, we want to insert [f(&self[$n-1]), .., f(&self[0])] by using a macro invocation. But we cannot do so directly, because we would break macro hygiene – so we need to change our function a bit:

fn map<F: Fn(&U) _> V>(&self, f: F) -> [V; $n] {
    map_inner!(f, self, $($ns),*)
}

This way, we give f and self to the macro keeping hygiene intact. Note that we’ll have to match self as expr, not ident, because self is a keyword.

With map_inner!, we need to keep in mind that macros may not ever output token trees that by themselves would not be valid Rust code. So we cannot just

macro_rules! map_inner {
    {$f:ident, $s:ident, $n:expr, $($ns:expr),*} => {
        map_inner!($f, $s, $($ns)*), $f($s[$n])
    };
    {$f:ident, $:ident, $n:expr} => { $f($s[$n]) }
}

and surround by [ brackets ], because .., f(..) is not valid Rust code for any ..s. The solution is to build the complete token tree during recursive calls and only output the full thing when finished:

macro_rules! map_inner {
    {$f:ident, $s:expr, []; $n:expr, $($ns:expr),*,} => {
        map_inner!($f, $s, [$f(&$s[$n])]; $($ns),*,)
    };
    {$f:ident, $s:expr, [$($t:tt)*]; $n:expr, $($ns:expr),*,} => {
        map_inner!($f, $s, [$f(&$s[$n]), $($t)*]; $($ns),*,)
    };
    {$f:ident, $s:expr, [$($t:tt)*]; $n:expr, } => {
        map_inner!($f, $s, [$f(&$s[$n]), $($t)*])
    };
    {$f:ident, $s:expr, $t:expr} => { $t };
}

Note: This may be a bit more complex than needed and was the result of some trial & error as well as reading the aforementioned sources. If you have a more elegant solution, you are hereby cordially invited to drop me a PR.

Now with all said & done, we can use this to map functions over (small) arrays. And learned something about macros. Done and done!

…

Am I? There’s a small voice in the back of my head whispering something to me. Something scary from the far away land of Typeshenaniganistan (I just made that one up): generic-array.

We should be able to map generic arrays, right? There’s a simple, if unsafe way to do this: Set up the array with mem::uninitialized(), then iterate through the source and write the mapped value. Finally we have an array with the desired values.

However, there’s a catch (and that means it’s not possible to do that in the general case): Our function f(_) could panic!(..)! Don’t laugh – it’s more common than you’d think. The problem is that by unwinding, we could end up running our caller’s code (which could have set up our code in a thread) that could then observe the uninitialized memory, which is undefined behavior. Nasal demons galore!

So how can we become panic safe? With Rust as of 1.8.0 (the current stable), we have two options:

Don’t use mem::uninitialized(), use some kind of Default::default(), but that requires that our component type be Default and Clone, and writes every value twice. Also what happens if the .clone() call panics (We can do the default() call before setting up the array)? Well, it would be workable for Copy types (because Copy::clone() never fails), but otherwise this is replacing one evil with another.
Start another thread, set up the array there and catch the panic on join. This is a possibility, but very heavy-handed. Remember, we wanted to be resource-friendly. This breaks a fly on the wheel.

So if we ignore panic safety, we can absolutely do this. Otherwise we are out of luck. On the other hand, with a current beta or nightly we are lucky, because as of 1.9.0, catch_unwind(_) is stabilized, which allows us to catch the panic and clean up afterwards. This should have better performance because the good, panic-free path is uncluttered, while the panicky path can be simply mem::forgetting the array (which is compiled away to nothing) and resuming the unwinding.

This is called leak amplification in Rust lingo, and is the preferred method of ensuring memory safety in the face of ~~Cthulhu~~ uncertainty regarding termination of a function.

So as of May 26, 2016, we can have memory safe GenericArray<_, _>::map(..), and it’s not even that complicated. The code is in generic-array PR #12.

Now we’re done, right? Right?

…

Well, now we have two competing techniques – one looping and one straight-line solution to the same problem. A prime target for a Benchmark, which should be made for all array sizes from zero to 32 elements. We’ll also measure collecting the results in a Vec, as a baseline. However, since the benchmarks should be made for all sizes between 0 and 32, there’s a lot of repetition. So I’m going to use another macro (Warning: The following only works with a nightly Rust):

#![feature(test)]

extern crate generic_array;
extern crate arraymap;
extern crate typenum;
extern crate test;

macro_rules! bench_map {
    {$u:ident, $n:expr, $a:ident, $g:ident, $h:ident} => {
        #[bench]
        fn $a(b: &mut Bencher) {
            let arr = [0u32; $n];
            b.iter(|| {
                black_box(arr).map(|x| x + 1)
            });
        }

        #[bench]
        fn $g(b: &mut Bencher) {
            let arr : GenericArray<u32, $u> = GenericArray::new();
            b.iter(|| {
                black_box(arr).map(|x| x + 1)
            });
        }

        #[bench]
        fn $h(b: &mut Bencher) {
            let arr = [0u32; $n];
            b.iter(|| {
                black_box(arr).iter().map(|x| x + 1).collect::<Vec<_>>()
            });
        }
    };
}

#[cfg(test)]
mod tests {
    use test::{black_box, Bencher};
    use generic_array::*;
    use arraymap::ArrayMap;
    use typenum::consts::*;

    bench_map!( U0,  0, array_00, gena_u00, vec_00);
    ..
    bench_map!(U32, 32, array_32, gena_u32, vec_32);
}

The results: our loop is apparently slower; also we’re paying for the unwind capture with some nanoseconds.

running 99 tests
test tests::array_00 ... bench:           0 ns/iter (+/- 0)
test tests::array_01 ... bench:           0 ns/iter (+/- 0)
test tests::array_02 ... bench:           0 ns/iter (+/- 0)
test tests::array_03 ... bench:           1 ns/iter (+/- 0)
test tests::array_04 ... bench:           0 ns/iter (+/- 0)
test tests::array_05 ... bench:           1 ns/iter (+/- 0)
test tests::array_06 ... bench:           1 ns/iter (+/- 0)
test tests::array_07 ... bench:           2 ns/iter (+/- 0)
test tests::array_08 ... bench:           1 ns/iter (+/- 0)
test tests::array_09 ... bench:           1 ns/iter (+/- 0)
test tests::array_10 ... bench:           2 ns/iter (+/- 0)
test tests::array_11 ... bench:           3 ns/iter (+/- 0)
test tests::array_12 ... bench:           1 ns/iter (+/- 0)
test tests::array_13 ... bench:           2 ns/iter (+/- 0)
test tests::array_14 ... bench:           2 ns/iter (+/- 0)
test tests::array_15 ... bench:           3 ns/iter (+/- 1)
test tests::array_16 ... bench:           2 ns/iter (+/- 1)
test tests::array_17 ... bench:           3 ns/iter (+/- 0)
test tests::array_18 ... bench:           3 ns/iter (+/- 0)
test tests::array_19 ... bench:           4 ns/iter (+/- 1)
test tests::array_20 ... bench:           3 ns/iter (+/- 0)
test tests::array_21 ... bench:           3 ns/iter (+/- 1)
test tests::array_22 ... bench:           4 ns/iter (+/- 0)
test tests::array_23 ... bench:           5 ns/iter (+/- 1)
test tests::array_24 ... bench:           4 ns/iter (+/- 1)
test tests::array_25 ... bench:           4 ns/iter (+/- 0)
test tests::array_26 ... bench:           5 ns/iter (+/- 1)
test tests::array_27 ... bench:           5 ns/iter (+/- 1)
test tests::array_28 ... bench:           4 ns/iter (+/- 1)
test tests::array_29 ... bench:           5 ns/iter (+/- 0)
test tests::array_30 ... bench:           5 ns/iter (+/- 0)
test tests::array_31 ... bench:           6 ns/iter (+/- 0)
test tests::array_32 ... bench:           5 ns/iter (+/- 0)
test tests::gena_u00 ... bench:           8 ns/iter (+/- 0)
test tests::gena_u01 ... bench:           9 ns/iter (+/- 0)
test tests::gena_u02 ... bench:          13 ns/iter (+/- 1)
test tests::gena_u03 ... bench:          13 ns/iter (+/- 1)
test tests::gena_u04 ... bench:          14 ns/iter (+/- 10)
test tests::gena_u05 ... bench:          14 ns/iter (+/- 0)
test tests::gena_u06 ... bench:          15 ns/iter (+/- 1)
test tests::gena_u07 ... bench:          21 ns/iter (+/- 0)
test tests::gena_u08 ... bench:          16 ns/iter (+/- 0)
test tests::gena_u09 ... bench:          17 ns/iter (+/- 0)
test tests::gena_u10 ... bench:          17 ns/iter (+/- 0)
test tests::gena_u11 ... bench:          24 ns/iter (+/- 0)
test tests::gena_u12 ... bench:          18 ns/iter (+/- 0)
test tests::gena_u13 ... bench:          20 ns/iter (+/- 0)
test tests::gena_u14 ... bench:          21 ns/iter (+/- 0)
test tests::gena_u15 ... bench:          28 ns/iter (+/- 0)
test tests::gena_u16 ... bench:          22 ns/iter (+/- 0)
test tests::gena_u17 ... bench:          24 ns/iter (+/- 0)
test tests::gena_u18 ... bench:          25 ns/iter (+/- 0)
test tests::gena_u19 ... bench:          31 ns/iter (+/- 0)
test tests::gena_u20 ... bench:          26 ns/iter (+/- 0)
test tests::gena_u21 ... bench:          28 ns/iter (+/- 0)
test tests::gena_u22 ... bench:          29 ns/iter (+/- 0)
test tests::gena_u23 ... bench:          34 ns/iter (+/- 1)
test tests::gena_u24 ... bench:          30 ns/iter (+/- 0)
test tests::gena_u25 ... bench:          32 ns/iter (+/- 1)
test tests::gena_u26 ... bench:          33 ns/iter (+/- 0)
test tests::gena_u27 ... bench:          37 ns/iter (+/- 1)
test tests::gena_u28 ... bench:          34 ns/iter (+/- 0)
test tests::gena_u29 ... bench:          36 ns/iter (+/- 0)
test tests::gena_u30 ... bench:          37 ns/iter (+/- 0)
test tests::gena_u31 ... bench:          41 ns/iter (+/- 1)
test tests::gena_u32 ... bench:          44 ns/iter (+/- 1)
test tests::vec_00   ... bench:           0 ns/iter (+/- 0)
test tests::vec_01   ... bench:          18 ns/iter (+/- 0)
test tests::vec_02   ... bench:          31 ns/iter (+/- 0)
test tests::vec_03   ... bench:          30 ns/iter (+/- 1)
test tests::vec_04   ... bench:          30 ns/iter (+/- 0)
test tests::vec_05   ... bench:          30 ns/iter (+/- 0)
test tests::vec_06   ... bench:          31 ns/iter (+/- 1)
test tests::vec_07   ... bench:          35 ns/iter (+/- 0)
test tests::vec_08   ... bench:          32 ns/iter (+/- 0)
test tests::vec_09   ... bench:          33 ns/iter (+/- 0)
test tests::vec_10   ... bench:          31 ns/iter (+/- 1)
test tests::vec_11   ... bench:          36 ns/iter (+/- 1)
test tests::vec_12   ... bench:          32 ns/iter (+/- 2)
test tests::vec_13   ... bench:          33 ns/iter (+/- 1)
test tests::vec_14   ... bench:          33 ns/iter (+/- 4)
test tests::vec_15   ... bench:          37 ns/iter (+/- 0)
test tests::vec_16   ... bench:          34 ns/iter (+/- 0)
test tests::vec_17   ... bench:          36 ns/iter (+/- 1)
test tests::vec_18   ... bench:          37 ns/iter (+/- 1)
test tests::vec_19   ... bench:          38 ns/iter (+/- 1)
test tests::vec_20   ... bench:          39 ns/iter (+/- 1)
test tests::vec_21   ... bench:          41 ns/iter (+/- 0)
test tests::vec_22   ... bench:          46 ns/iter (+/- 1)
test tests::vec_23   ... bench:          47 ns/iter (+/- 0)
test tests::vec_24   ... bench:          41 ns/iter (+/- 1)
test tests::vec_25   ... bench:          50 ns/iter (+/- 0)
test tests::vec_26   ... bench:          49 ns/iter (+/- 1)
test tests::vec_27   ... bench:          50 ns/iter (+/- 2)
test tests::vec_28   ... bench:          53 ns/iter (+/- 1)
test tests::vec_29   ... bench:          52 ns/iter (+/- 1)
test tests::vec_30   ... bench:          54 ns/iter (+/- 1)
test tests::vec_31   ... bench:          56 ns/iter (+/- 9)
test tests::vec_32   ... bench:          56 ns/iter (+/- 1)

Ok now, let’s pack up and call it a day…

…

No wait! Actually, this is rather underwhelming. Only 14ns faster than a collect::<Vec<_>>()? And we cannot even do this on stable. Can’t we do better? Since I wasn’t very knowledgeable around this part of the compiler, I asked on StackOverflow if we can somehow pre-leak the array and un-leak it after filling it up. And within a few hours, I had an answer (the Rust community is awesome as always)!

Using the nodrop crate by Ulrik Sverdrup, it’s rather easy to leak a value until it is ready. The way to leak it is NoDrop::new(_) and .into_inner() un-leaks it. The updated code is in the PR, and the benchmark (just for genericarray) now gets the following results:

test tests::gena_u00 ... bench:           0 ns/iter (+/- 0)
test tests::gena_u01 ... bench:           0 ns/iter (+/- 0)
test tests::gena_u02 ... bench:           0 ns/iter (+/- 0)
test tests::gena_u03 ... bench:           1 ns/iter (+/- 0)
test tests::gena_u04 ... bench:           1 ns/iter (+/- 0)
test tests::gena_u05 ... bench:           1 ns/iter (+/- 1)
test tests::gena_u06 ... bench:           2 ns/iter (+/- 0)
test tests::gena_u07 ... bench:           2 ns/iter (+/- 0)
test tests::gena_u08 ... bench:           3 ns/iter (+/- 0)
test tests::gena_u09 ... bench:           3 ns/iter (+/- 0)
test tests::gena_u10 ... bench:           4 ns/iter (+/- 1)
test tests::gena_u11 ... bench:           7 ns/iter (+/- 0)
test tests::gena_u12 ... bench:           4 ns/iter (+/- 0)
test tests::gena_u13 ... bench:           5 ns/iter (+/- 0)
test tests::gena_u14 ... bench:           7 ns/iter (+/- 0)
test tests::gena_u15 ... bench:           8 ns/iter (+/- 0)
test tests::gena_u16 ... bench:           7 ns/iter (+/- 0)
test tests::gena_u17 ... bench:           8 ns/iter (+/- 1)
test tests::gena_u18 ... bench:           7 ns/iter (+/- 0)
test tests::gena_u19 ... bench:           8 ns/iter (+/- 0)
test tests::gena_u20 ... bench:           6 ns/iter (+/- 0)
test tests::gena_u21 ... bench:           6 ns/iter (+/- 0)
test tests::gena_u22 ... bench:           6 ns/iter (+/- 0)
test tests::gena_u23 ... bench:           6 ns/iter (+/- 0)
test tests::gena_u24 ... bench:           8 ns/iter (+/- 0)
test tests::gena_u25 ... bench:           8 ns/iter (+/- 0)
test tests::gena_u26 ... bench:           7 ns/iter (+/- 0)
test tests::gena_u27 ... bench:           7 ns/iter (+/- 0)
test tests::gena_u28 ... bench:          11 ns/iter (+/- 0)
test tests::gena_u29 ... bench:          11 ns/iter (+/- 0)
test tests::gena_u30 ... bench:          13 ns/iter (+/- 0)
test tests::gena_u31 ... bench:          13 ns/iter (+/- 0)
test tests::gena_u32 ... bench:          13 ns/iter (+/- 0)

Now let’s call it a day.