Mapping over Arrays
Given I’m doing quite a lot of work with Rust, I could almost have forgotten that I’m still learning the language – and as sharing what I’ve learned is my prime reason for writing this blog, today I’m going to delve into two corners of Rust I had not touched so far.
So let’s set the stage: Just for fun, to see how it works and to maybe find
something useful, I tried to remove most allocations I could find from some
benchmarks. This was mostly
removing vec!
here and there, but in some places I could not get rid of them,
because, as you know if you’ve worked with arrays in Rust, those things need a
correct value at every index on construction.
Either using [Option<T>]
instead of Vec<T>
or writing some Default
just
to overwrite it later both look like suboptimal solutions. What I needed in
most cases was a way to map(_)
a function over an array, creating a new one
in the process. Something like:
// T is [Y; N] for some [X; N] we impl this trait for
pub trait ArrayMap<X, Y, T> {
fn map<F: Fn(&X) -> Y>(&self, f: F) -> T;
}
With the right implementations, this could be used like:
let squares = [1, 2, 3].map(|x| x * x);
This is easy to do for one array size, let’s take the fairly atypical zero sized array case first:
impl<U, V> ArrayMap<U, V, [V; 0]> for [U; 0] {
fn map<F: Fn(&U) -> V>(&self, _: F) -> [V; 0] { [] }
}
For the larger sizes (up to 32, as is custom in Rust), I could have pulled up my sleeves and write the impls by hand, but that would not have been very rustic. If one looks into the standard library, one finds a lot of macros to set up most trait implementations. As the saying goes, when in Rome, do like the Romans do, so let’s write some macros.
Note that this is my first foray into Rust macros, and the code may be ugly and needlessly complicated. Please bear with me.
First, we’ll need the impl with the fn, let’s keep the method unimplemented for now.
macro_rules! map_impl {
{$n:expr, $($ns:expr),*} => {
// this works like a template over $n
impl<U, V> ArrayMap<U, V, [V; $n]> for [U; $n] {
fn map<F: Fn(&U) _> V>(&self, _: F) -> [V; $n] {
unimplemented!() // for now
}
}
// take one down, pass it around, ...
map_impl!($($ns),*);
};
{, } => {};
}
// thirty-two impls of ArrayMap on the wall...
map_impl!(32, 31, 30, 29, .., 0);
What’s going on here? Within macro_rules!
, we get a number of matches with
corresponding expansions We call a match =>
expansion;
a clause.
The semicolon at the end of a clause is required. Within an expansion, we can
write valid Rust code, including calls to arbitrary macros – including the
macro we’re currently writing.
It took me a bit to learn the macro syntax, but the book is really helpful. I also perused a Stack Overflow answer from Daniel Keep which I’ll just link here for your edification. The point is that the match follows some simple rules, and we can embed stuff from the match in the expression.
We take a list of comma separated expressions (spaces don’t count), and take
the first and the rest separately; the $(..),*
in the match means zero or
more comma separated things. We can expand that with much the same syntax (just
omit the “type”). The macro calls itself until it has exhausted the list of
array sizes.
Next, we need to fill the unimplemented
method. In our case, we want to
insert [f(&self[$n-1]), .., f(&self[0])]
by using a macro invocation. But we
cannot do so directly, because we would break macro hygiene – so we need to
change our function a bit:
fn map<F: Fn(&U) _> V>(&self, f: F) -> [V; $n] {
map_inner!(f, self, $($ns),*)
}
This way, we give f
and self
to the macro keeping hygiene intact. Note
that we’ll have to match self
as expr
, not ident
, because self
is a
keyword.
With map_inner!
, we need to keep in mind that macros may not ever output
token trees that by themselves would not be valid Rust code. So we cannot just
macro_rules! map_inner {
{$f:ident, $s:ident, $n:expr, $($ns:expr),*} => {
map_inner!($f, $s, $($ns)*), $f($s[$n])
};
{$f:ident, $:ident, $n:expr} => { $f($s[$n]) }
}
and surround by [
brackets ]
, because .., f(..)
is not valid Rust code
for any ..
s. The solution is to build the complete token tree during
recursive calls and only output the full thing when finished:
macro_rules! map_inner {
{$f:ident, $s:expr, []; $n:expr, $($ns:expr),*,} => {
map_inner!($f, $s, [$f(&$s[$n])]; $($ns),*,)
};
{$f:ident, $s:expr, [$($t:tt)*]; $n:expr, $($ns:expr),*,} => {
map_inner!($f, $s, [$f(&$s[$n]), $($t)*]; $($ns),*,)
};
{$f:ident, $s:expr, [$($t:tt)*]; $n:expr, } => {
map_inner!($f, $s, [$f(&$s[$n]), $($t)*])
};
{$f:ident, $s:expr, $t:expr} => { $t };
}
Note: This may be a bit more complex than needed and was the result of some trial & error as well as reading the aforementioned sources. If you have a more elegant solution, you are hereby cordially invited to drop me a PR.
Now with all said & done, we can use this to map functions over (small) arrays. And learned something about macros. Done and done!
…
Am I? There’s a small voice in the back of my head whispering something to me. Something scary from the far away land of Typeshenaniganistan (I just made that one up): generic-array.
We should be able to map generic arrays, right? There’s a simple, if unsafe
way to do this: Set up the array with mem::uninitialized()
, then iterate
through the source and write the mapped value. Finally we have an array with
the desired values.
However, there’s a catch (and that means it’s not possible to do that in the
general case): Our function f(_)
could panic!(..)
! Don’t laugh – it’s
more common than you’d think. The problem is that by unwinding, we could end up
running our caller’s code (which could have set up our code in a thread) that
could then observe the uninitialized memory, which is undefined behavior. Nasal
demons galore!
So how can we become panic safe? With Rust as of 1.8.0 (the current stable), we have two options:
- Don’t use
mem::uninitialized()
, use some kind ofDefault::default()
, but that requires that our component type beDefault
andClone
, and writes every value twice. Also what happens if the.clone()
call panics (We can do thedefault()
call before setting up the array)? Well, it would be workable forCopy
types (becauseCopy::clone()
never fails), but otherwise this is replacing one evil with another. - Start another thread, set up the array there and catch the panic on join. This is a possibility, but very heavy-handed. Remember, we wanted to be resource-friendly. This breaks a fly on the wheel.
So if we ignore panic safety, we can absolutely do this. Otherwise we are out
of luck. On the other hand, with a current beta or nightly we are lucky,
because as of 1.9.0, catch_unwind(_)
is stabilized, which allows us to catch
the panic and clean up afterwards. This should have better performance because
the good, panic-free path is uncluttered, while the panicky path can be simply
mem::forget
ting the array (which is compiled away to nothing) and resuming
the unwinding.
This is called leak amplification in Rust lingo, and is the preferred method
of ensuring memory safety in the face of Cthulhu uncertainty regarding
termination of a function.
So as of May 26, 2016, we can have memory safe GenericArray<_, _>::map(..)
,
and it’s not even that complicated. The code is in
generic-array PR #12.
Now we’re done, right? Right?
…
Well, now we have two competing techniques – one looping and one straight-line
solution to the same problem. A prime target for a Benchmark, which should
be made for all array sizes from zero to 32 elements. We’ll also measure
collecting the results in a Vec
, as a baseline. However, since the benchmarks
should be made for all sizes between 0 and 32, there’s a lot of repetition. So
I’m going to use another macro (Warning: The following only works with a
nightly Rust):
#![feature(test)]
extern crate generic_array;
extern crate arraymap;
extern crate typenum;
extern crate test;
macro_rules! bench_map {
{$u:ident, $n:expr, $a:ident, $g:ident, $h:ident} => {
#[bench]
fn $a(b: &mut Bencher) {
let arr = [0u32; $n];
b.iter(|| {
black_box(arr).map(|x| x + 1)
});
}
#[bench]
fn $g(b: &mut Bencher) {
let arr : GenericArray<u32, $u> = GenericArray::new();
b.iter(|| {
black_box(arr).map(|x| x + 1)
});
}
#[bench]
fn $h(b: &mut Bencher) {
let arr = [0u32; $n];
b.iter(|| {
black_box(arr).iter().map(|x| x + 1).collect::<Vec<_>>()
});
}
};
}
#[cfg(test)]
mod tests {
use test::{black_box, Bencher};
use generic_array::*;
use arraymap::ArrayMap;
use typenum::consts::*;
bench_map!( U0, 0, array_00, gena_u00, vec_00);
..
bench_map!(U32, 32, array_32, gena_u32, vec_32);
}
The results: our loop is apparently slower; also we’re paying for the unwind capture with some nanoseconds.
running 99 tests
test tests::array_00 ... bench: 0 ns/iter (+/- 0)
test tests::array_01 ... bench: 0 ns/iter (+/- 0)
test tests::array_02 ... bench: 0 ns/iter (+/- 0)
test tests::array_03 ... bench: 1 ns/iter (+/- 0)
test tests::array_04 ... bench: 0 ns/iter (+/- 0)
test tests::array_05 ... bench: 1 ns/iter (+/- 0)
test tests::array_06 ... bench: 1 ns/iter (+/- 0)
test tests::array_07 ... bench: 2 ns/iter (+/- 0)
test tests::array_08 ... bench: 1 ns/iter (+/- 0)
test tests::array_09 ... bench: 1 ns/iter (+/- 0)
test tests::array_10 ... bench: 2 ns/iter (+/- 0)
test tests::array_11 ... bench: 3 ns/iter (+/- 0)
test tests::array_12 ... bench: 1 ns/iter (+/- 0)
test tests::array_13 ... bench: 2 ns/iter (+/- 0)
test tests::array_14 ... bench: 2 ns/iter (+/- 0)
test tests::array_15 ... bench: 3 ns/iter (+/- 1)
test tests::array_16 ... bench: 2 ns/iter (+/- 1)
test tests::array_17 ... bench: 3 ns/iter (+/- 0)
test tests::array_18 ... bench: 3 ns/iter (+/- 0)
test tests::array_19 ... bench: 4 ns/iter (+/- 1)
test tests::array_20 ... bench: 3 ns/iter (+/- 0)
test tests::array_21 ... bench: 3 ns/iter (+/- 1)
test tests::array_22 ... bench: 4 ns/iter (+/- 0)
test tests::array_23 ... bench: 5 ns/iter (+/- 1)
test tests::array_24 ... bench: 4 ns/iter (+/- 1)
test tests::array_25 ... bench: 4 ns/iter (+/- 0)
test tests::array_26 ... bench: 5 ns/iter (+/- 1)
test tests::array_27 ... bench: 5 ns/iter (+/- 1)
test tests::array_28 ... bench: 4 ns/iter (+/- 1)
test tests::array_29 ... bench: 5 ns/iter (+/- 0)
test tests::array_30 ... bench: 5 ns/iter (+/- 0)
test tests::array_31 ... bench: 6 ns/iter (+/- 0)
test tests::array_32 ... bench: 5 ns/iter (+/- 0)
test tests::gena_u00 ... bench: 8 ns/iter (+/- 0)
test tests::gena_u01 ... bench: 9 ns/iter (+/- 0)
test tests::gena_u02 ... bench: 13 ns/iter (+/- 1)
test tests::gena_u03 ... bench: 13 ns/iter (+/- 1)
test tests::gena_u04 ... bench: 14 ns/iter (+/- 10)
test tests::gena_u05 ... bench: 14 ns/iter (+/- 0)
test tests::gena_u06 ... bench: 15 ns/iter (+/- 1)
test tests::gena_u07 ... bench: 21 ns/iter (+/- 0)
test tests::gena_u08 ... bench: 16 ns/iter (+/- 0)
test tests::gena_u09 ... bench: 17 ns/iter (+/- 0)
test tests::gena_u10 ... bench: 17 ns/iter (+/- 0)
test tests::gena_u11 ... bench: 24 ns/iter (+/- 0)
test tests::gena_u12 ... bench: 18 ns/iter (+/- 0)
test tests::gena_u13 ... bench: 20 ns/iter (+/- 0)
test tests::gena_u14 ... bench: 21 ns/iter (+/- 0)
test tests::gena_u15 ... bench: 28 ns/iter (+/- 0)
test tests::gena_u16 ... bench: 22 ns/iter (+/- 0)
test tests::gena_u17 ... bench: 24 ns/iter (+/- 0)
test tests::gena_u18 ... bench: 25 ns/iter (+/- 0)
test tests::gena_u19 ... bench: 31 ns/iter (+/- 0)
test tests::gena_u20 ... bench: 26 ns/iter (+/- 0)
test tests::gena_u21 ... bench: 28 ns/iter (+/- 0)
test tests::gena_u22 ... bench: 29 ns/iter (+/- 0)
test tests::gena_u23 ... bench: 34 ns/iter (+/- 1)
test tests::gena_u24 ... bench: 30 ns/iter (+/- 0)
test tests::gena_u25 ... bench: 32 ns/iter (+/- 1)
test tests::gena_u26 ... bench: 33 ns/iter (+/- 0)
test tests::gena_u27 ... bench: 37 ns/iter (+/- 1)
test tests::gena_u28 ... bench: 34 ns/iter (+/- 0)
test tests::gena_u29 ... bench: 36 ns/iter (+/- 0)
test tests::gena_u30 ... bench: 37 ns/iter (+/- 0)
test tests::gena_u31 ... bench: 41 ns/iter (+/- 1)
test tests::gena_u32 ... bench: 44 ns/iter (+/- 1)
test tests::vec_00 ... bench: 0 ns/iter (+/- 0)
test tests::vec_01 ... bench: 18 ns/iter (+/- 0)
test tests::vec_02 ... bench: 31 ns/iter (+/- 0)
test tests::vec_03 ... bench: 30 ns/iter (+/- 1)
test tests::vec_04 ... bench: 30 ns/iter (+/- 0)
test tests::vec_05 ... bench: 30 ns/iter (+/- 0)
test tests::vec_06 ... bench: 31 ns/iter (+/- 1)
test tests::vec_07 ... bench: 35 ns/iter (+/- 0)
test tests::vec_08 ... bench: 32 ns/iter (+/- 0)
test tests::vec_09 ... bench: 33 ns/iter (+/- 0)
test tests::vec_10 ... bench: 31 ns/iter (+/- 1)
test tests::vec_11 ... bench: 36 ns/iter (+/- 1)
test tests::vec_12 ... bench: 32 ns/iter (+/- 2)
test tests::vec_13 ... bench: 33 ns/iter (+/- 1)
test tests::vec_14 ... bench: 33 ns/iter (+/- 4)
test tests::vec_15 ... bench: 37 ns/iter (+/- 0)
test tests::vec_16 ... bench: 34 ns/iter (+/- 0)
test tests::vec_17 ... bench: 36 ns/iter (+/- 1)
test tests::vec_18 ... bench: 37 ns/iter (+/- 1)
test tests::vec_19 ... bench: 38 ns/iter (+/- 1)
test tests::vec_20 ... bench: 39 ns/iter (+/- 1)
test tests::vec_21 ... bench: 41 ns/iter (+/- 0)
test tests::vec_22 ... bench: 46 ns/iter (+/- 1)
test tests::vec_23 ... bench: 47 ns/iter (+/- 0)
test tests::vec_24 ... bench: 41 ns/iter (+/- 1)
test tests::vec_25 ... bench: 50 ns/iter (+/- 0)
test tests::vec_26 ... bench: 49 ns/iter (+/- 1)
test tests::vec_27 ... bench: 50 ns/iter (+/- 2)
test tests::vec_28 ... bench: 53 ns/iter (+/- 1)
test tests::vec_29 ... bench: 52 ns/iter (+/- 1)
test tests::vec_30 ... bench: 54 ns/iter (+/- 1)
test tests::vec_31 ... bench: 56 ns/iter (+/- 9)
test tests::vec_32 ... bench: 56 ns/iter (+/- 1)
Ok now, let’s pack up and call it a day…
…
No wait! Actually, this is rather underwhelming. Only 14ns faster than a
collect::<Vec<_>>()
? And we cannot even do this on stable. Can’t we do
better? Since I wasn’t very knowledgeable around this part of the compiler, I
asked
on StackOverflow if we can somehow pre-leak the array and un-leak it after
filling it up. And within a few hours, I had an answer (the Rust community is
awesome as always)!
Using the nodrop crate by
Ulrik Sverdrup, it’s rather easy to leak a value
until it is ready. The way to leak it is NoDrop::new(_)
and .into_inner()
un-leaks it. The updated code is in the PR, and the benchmark (just for
genericarray) now gets the following results:
test tests::gena_u00 ... bench: 0 ns/iter (+/- 0)
test tests::gena_u01 ... bench: 0 ns/iter (+/- 0)
test tests::gena_u02 ... bench: 0 ns/iter (+/- 0)
test tests::gena_u03 ... bench: 1 ns/iter (+/- 0)
test tests::gena_u04 ... bench: 1 ns/iter (+/- 0)
test tests::gena_u05 ... bench: 1 ns/iter (+/- 1)
test tests::gena_u06 ... bench: 2 ns/iter (+/- 0)
test tests::gena_u07 ... bench: 2 ns/iter (+/- 0)
test tests::gena_u08 ... bench: 3 ns/iter (+/- 0)
test tests::gena_u09 ... bench: 3 ns/iter (+/- 0)
test tests::gena_u10 ... bench: 4 ns/iter (+/- 1)
test tests::gena_u11 ... bench: 7 ns/iter (+/- 0)
test tests::gena_u12 ... bench: 4 ns/iter (+/- 0)
test tests::gena_u13 ... bench: 5 ns/iter (+/- 0)
test tests::gena_u14 ... bench: 7 ns/iter (+/- 0)
test tests::gena_u15 ... bench: 8 ns/iter (+/- 0)
test tests::gena_u16 ... bench: 7 ns/iter (+/- 0)
test tests::gena_u17 ... bench: 8 ns/iter (+/- 1)
test tests::gena_u18 ... bench: 7 ns/iter (+/- 0)
test tests::gena_u19 ... bench: 8 ns/iter (+/- 0)
test tests::gena_u20 ... bench: 6 ns/iter (+/- 0)
test tests::gena_u21 ... bench: 6 ns/iter (+/- 0)
test tests::gena_u22 ... bench: 6 ns/iter (+/- 0)
test tests::gena_u23 ... bench: 6 ns/iter (+/- 0)
test tests::gena_u24 ... bench: 8 ns/iter (+/- 0)
test tests::gena_u25 ... bench: 8 ns/iter (+/- 0)
test tests::gena_u26 ... bench: 7 ns/iter (+/- 0)
test tests::gena_u27 ... bench: 7 ns/iter (+/- 0)
test tests::gena_u28 ... bench: 11 ns/iter (+/- 0)
test tests::gena_u29 ... bench: 11 ns/iter (+/- 0)
test tests::gena_u30 ... bench: 13 ns/iter (+/- 0)
test tests::gena_u31 ... bench: 13 ns/iter (+/- 0)
test tests::gena_u32 ... bench: 13 ns/iter (+/- 0)
Now let’s call it a day.