Strange bad performance with [u8; 16] and union feature #379
Comments
Huh, it seems like disabling

Actually, even that appears to be slower than
Because your

```rust
use smallvec::SmallVec;

fn vec_test() -> Vec<u8> {
    let mut v = Vec::with_capacity(16);
    for i in 0..12 {
        v.push(i);
    }
    v
}

fn smallvec_test() -> SmallVec<[u8; 16]> {
    let mut v: SmallVec<[u8; 16]> = SmallVec::new();
    for i in 0..12 {
        v.push(i);
    }
    v
}
```

Using the above code, I get the following timings (Rust 1.87, aarch64-apple-darwin, M4):
I can reproduce a performance cliff when pushing more than 15 items into a `SmallVec<[u8; 16]>`. For example, pushing 14 or 15 items does not show the slowdown, but pushing 16 or more does. I haven't looked at the codegen to figure out why this happens, but I would guess it is hitting a threshold that prevents some optimization like loop unrolling, and perhaps this has a cascading effect on other optimizations.
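For anyone who wants to reproduce the cliff, here is a minimal sketch of a criterion benchmark parameterized over the push count (my own illustration, not the benchmark used above; the function and benchmark names are made up):

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use smallvec::SmallVec;

// Push `n` bytes into the inline storage of a SmallVec<[u8; 16]>.
fn push_n(n: u8) -> SmallVec<[u8; 16]> {
    let mut v: SmallVec<[u8; 16]> = SmallVec::new();
    for i in 0..n {
        v.push(i);
    }
    v
}

// Benchmark the same loop just below and just above the reported cliff.
fn bench_cliff(c: &mut Criterion) {
    for n in [14u8, 15, 16, 17] {
        c.bench_function(&format!("smallvec_push_{n}"), |b| {
            b.iter(|| push_n(black_box(n)))
        });
    }
}

criterion_group!(benches, bench_cliff);
criterion_main!(benches);
```

Comparing the generated assembly for the 15- and 16-push cases would also show whether unrolling is the difference, but I haven't done that here.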
I have a case where it seems to be slower by a really consistent 12% or so. As you can see above, there is hardly any difference in the smallvec case. I started looking into this issue because of this. I just don't get why the heck this would be 12% slower in this real-world case when we are able to make the benchmarks look good. Smallvec can push a few things into its inline memory in a matter of picoseconds in the benchmarks, but in these real-world tests it takes longer than the normal `Vec`. I'm sorry to hit you with a "fix my crate", but what the heck is causing this? Is this some other compiler optimization? I have spent almost the entire day trying to get one profiling tool after another to work, and when they do work, I can never "drill down" deep enough to see where exactly it is held up.
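One coarse cross-check that does not need a profiler is to time the same push loops outside criterion with `std::hint::black_box`. This is only a sketch of that idea (my own code, not from this issue), but a consistent 12% gap should still be visible at this granularity:

```rust
use std::hint::black_box;
use std::time::Instant;

use smallvec::SmallVec;

fn main() {
    const ITERS: u32 = 10_000_000;

    // Time the SmallVec push loop.
    let start = Instant::now();
    for _ in 0..ITERS {
        let mut v: SmallVec<[u8; 16]> = SmallVec::new();
        for i in 0..black_box(12u8) {
            v.push(i);
        }
        black_box(&v);
    }
    println!("smallvec: {:?}", start.elapsed());

    // Time the equivalent Vec push loop.
    let start = Instant::now();
    for _ in 0..ITERS {
        let mut v: Vec<u8> = Vec::with_capacity(16);
        for i in 0..black_box(12u8) {
            v.push(i);
        }
        black_box(&v);
    }
    println!("vec:      {:?}", start.elapsed());
}
```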
On my machine (M4 MacBook Air), I get the following timings. It's not actually slower than `Vec` for me.
We should definitely look into the performance problems with `SmallVec<[u8; 16]>`. As a general note, the most significant speed advantage of `SmallVec` is that it avoids a heap allocation entirely when the contents fit inline, rather than making individual `push` calls faster.
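To illustrate the point (my own sketch, not code from this thread): the place `SmallVec` is expected to win is when the `Vec` alternative has to hit the allocator and the `SmallVec` one does not.

```rust
use smallvec::SmallVec;

// The Vec version calls the global allocator for any non-empty input.
fn collect_vec(input: &[u8]) -> Vec<u8> {
    input.iter().copied().collect()
}

// The SmallVec version stays entirely on the stack while input.len() <= 16,
// so hot paths that would otherwise allocate per call skip the allocator.
fn collect_smallvec(input: &[u8]) -> SmallVec<[u8; 16]> {
    input.iter().copied().collect()
}
```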
Forgive me if I'm making some sort of amateur mistake here. I know when it comes to profiling, subtle things done wrong can completely bias a result, but I ran these two tests with criterion and it is saying that the `SmallVec<[u8; 16]>` is about 100x slower than the normal `Vec` at `push()`. These are my tests:
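(The benchmark source itself did not come through here; the following is only a guess at what a criterion harness around the `vec_test`/`smallvec_test` functions quoted above might look like, with benchmark names of my own choosing.)

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use smallvec::SmallVec;

fn bench_push(c: &mut Criterion) {
    // Push 12 bytes into a pre-sized Vec<u8>.
    c.bench_function("vec_push_12", |b| {
        b.iter(|| {
            let mut v: Vec<u8> = Vec::with_capacity(16);
            for i in 0..black_box(12u8) {
                v.push(i);
            }
            v
        })
    });

    // Push 12 bytes into the inline storage of a SmallVec<[u8; 16]>.
    c.bench_function("smallvec_push_12", |b| {
        b.iter(|| {
            let mut v: SmallVec<[u8; 16]> = SmallVec::new();
            for i in 0..black_box(12u8) {
                v.push(i);
            }
            v
        })
    });
}

criterion_group!(benches, bench_push);
criterion_main!(benches);
```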
However, when I change it to `SmallVec<[u8; 15]>`, the smallvec is 3x faster than `Vec` (which is a result that I would expect). Why would adding that single additional byte make this so slow? I am using the `union` feature, so I would think that 16 bytes could fit in `SmallVec` while keeping it the same size as a `Vec`. Is this a known problem? Am I making some obvious mistake?
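One easy thing to check on the size question is to print the actual struct sizes with and without the extra byte; a minimal sketch (exact numbers depend on the target and the smallvec version, so I'm not asserting them here):

```rust
use std::mem::size_of;

use smallvec::SmallVec;

fn main() {
    // Compare the footprint of Vec<u8> against SmallVec at a few inline sizes.
    println!("Vec<u8>:            {} bytes", size_of::<Vec<u8>>());
    println!("SmallVec<[u8; 15]>: {} bytes", size_of::<SmallVec<[u8; 15]>>());
    println!("SmallVec<[u8; 16]>: {} bytes", size_of::<SmallVec<[u8; 16]>>());
    println!("SmallVec<[u8; 17]>: {} bytes", size_of::<SmallVec<[u8; 17]>>());
}
```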