Footguns in SIMD land
If you use the newly stabilized std::arch module, you may find a few, well,
surprises:
If you fail to guard your SIMD code with #[cfg(and(target_arch = "..",
target_feature = ".."))], you’ll miss out on inlining. There is not even a
warning for this yet (clippy issue #????)
Both 128 and 256 bit types have three variations: One for {4, 8}×f32, one
for {2, 4}×f64 and one bag of bits that can be sliced to various integer
lengths and signedness. For historical reasons, the __m128/__m256 types
denote multiples of f32, whereas the f64-based types get a d suffix, and
the integer types get an i suffix. C/C++ programmers may feel at home here.
The type constructors aren’t const yet. This means that you cannot have static
__m128is, for example.
The _mm_set function and the __mm_loadu/__mm_storeu functions take their
arguments in inverse order. For example:
let mut y = [0xf64; 2];
let onetwo = _mm_set_pd(1.0, 2.0);
__mm_storeu_pd((&mut y).as_mut_ptr(), onetwo);
assert_eq!([2.0, 1.0], y); // who'd have thunk?
Speaking of which, the _mm_store_* functions don’t seem to be faster than the
_mm_storeu_* functions (at least on my skylake), so it’s unclear if defining
a 16-byte-aligned type for the former would be worth the hassle. The same goes
for _mm256_store_* and 32-byte-aligned types.
What problems have you found with Rust + stdsimd? Discuss on r/rust or rust-users!