Description
Doing `zip(chain(...))` with iterators seems to optimize very poorly. I tested this both by looking at the asm and with micro-benchmarks. This is annoying because it seems like a very idiomatic way to write code that copies data from two buffers into a preallocated one, in a way that could be vectorized.
Minimal reproducible example (this also occurs when using `for` loops instead of `for_each`):
pub fn bad(slice: &mut [u8], front: &[u8], back: &[u8]) {
    slice
        .iter_mut()
        .zip(front.iter().chain(back))
        .for_each(|(a, b)| *a = *b);
}

pub fn good(slice: &mut [u8], front: &[u8], back: &[u8]) {
    let mut it = slice.iter_mut();
    // `front` goes first: `zip` advances its first iterator before checking the
    // second, so `it.by_ref().zip(front)` would drop one element of `slice`
    // once `front` runs out.
    front.iter().zip(it.by_ref()).for_each(|(b, a)| *a = *b);
    // This also turns into a memcpy in my actual code
    it.zip(back).for_each(|(a, b)| *a = *b);
}
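A quick way to confirm that the two functions are behaviorally equivalent (so the codegen difference is a missed optimization, not a semantic difference) is to run both on destinations shorter than, equal to, and longer than `front.len() + back.len()`. This is a sketch; the test lengths and data are arbitrary. Note the zip order inside `good`: `zip` pulls from its first iterator before checking the second, so with `it.by_ref()` as the first iterator one destination element would be consumed and skipped once `front` is exhausted, and the two functions would disagree.

```rust
// Copies of the two functions, so this block compiles on its own.
pub fn bad(slice: &mut [u8], front: &[u8], back: &[u8]) {
    slice
        .iter_mut()
        .zip(front.iter().chain(back))
        .for_each(|(a, b)| *a = *b);
}

pub fn good(slice: &mut [u8], front: &[u8], back: &[u8]) {
    let mut it = slice.iter_mut();
    // `front` is the first iterator, so `it` is never advanced past the
    // last element that actually receives a byte from `front`.
    front.iter().zip(it.by_ref()).for_each(|(b, a)| *a = *b);
    it.zip(back).for_each(|(a, b)| *a = *b);
}

fn main() {
    let front = [1u8, 2, 3];
    let back = [4u8, 5];
    // Destination shorter than, equal to, and longer than front + back (5 bytes).
    for len in [0, 2, 3, 4, 5, 7] {
        let mut a = vec![0u8; len];
        let mut b = vec![0u8; len];
        bad(&mut a, &front, &back);
        good(&mut b, &front, &back);
        assert_eq!(a, b, "mismatch for destination length {len}");
    }
    println!("bad and good agree on all tested lengths");
}
```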
For the asm, see https://godbolt.org/z/hbcGcTPa5.
I expected to see this happen:
Both versions compile to vectorized copy loops, or to two memcpys each.
Instead, this happened:
The `bad` one stays a loop that copies one scalar at a time, while the `good` one turns into a memcpy and a vectorized copy loop (in my actual code it turns into two memcpys, but that isn't really an issue).
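For comparison, the "two memcpys" shape can be written out by hand with `split_at_mut` and `copy_from_slice` (the standard library documents `copy_from_slice` as copying with a memcpy). The helper name `copy_chained` is hypothetical, and this is only a sketch of a manual workaround. It assumes the `zip` truncation semantics should be kept, i.e. only as many bytes are copied as the destination has room for. Its codegen has not been checked against the Godbolt link above.

```rust
/// Hypothetical workaround helper: copies `front`, then `back`, into `dst`,
/// stopping when `dst` is full. Intended to match
/// `dst.iter_mut().zip(front.iter().chain(back))`.
pub fn copy_chained(dst: &mut [u8], front: &[u8], back: &[u8]) {
    // Bytes taken from `front`, capped by the destination length.
    let n = dst.len().min(front.len());
    let (head, rest) = dst.split_at_mut(n);
    head.copy_from_slice(&front[..n]);
    // Whatever room is left is filled from the start of `back`.
    let m = rest.len().min(back.len());
    rest[..m].copy_from_slice(&back[..m]);
}

fn main() {
    let mut dst = [0u8; 7];
    copy_chained(&mut dst, &[1, 2, 3], &[4, 5]);
    println!("{:?}", dst);
}
```

Unlike the iterator versions, each step here is a single bulk copy of a known length, which is the shape the "expected" codegen above describes.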
Meta
The `bad` one doesn't optimize properly on any version going back at least 20 stable releases, so this seems to be an optimization that has never happened rather than a regression. Most recently tested on the following version:
`rustc --version --verbose`:
rustc 1.90.0-nightly (a00149764 2025-07-14)
binary: rustc
commit-hash: a001497644bc229f1abcc5b2528733386591647f
commit-date: 2025-07-14
host: x86_64-unknown-linux-gnu
release: 1.90.0-nightly
LLVM version: 20.1.8