Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pow matrix: remove unnecessary conditions in Matrix::compute_rank #517

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Slixe
Copy link

@Slixe Slixe commented Aug 14, 2024

In fn compute_rank(&self) remove the if condition previously used to remove the bounds check.
According to the issue linked (rust-lang/rust#90794) it has been fixed since 1.72.

Another condition below can be removed.

@Slixe Slixe changed the title consensus: slightly optimize Matrix::compute_rank consensus: remove unnecessary conditions in Matrix::compute_rank Aug 14, 2024
@Slixe
Copy link
Author

Slixe commented Aug 14, 2024

According to bench, I'm getting the following performance boost:

Compute Rank            time:   [46.144 ms 46.204 ms 46.271 ms]
                        change: [-27.123% -26.847% -26.590%] (p = 0.00 < 0.05)
                        Performance has improved

@Slixe Slixe changed the title consensus: remove unnecessary conditions in Matrix::compute_rank pow matrix: remove unnecessary conditions in Matrix::compute_rank Aug 14, 2024
@Slixe
Copy link
Author

Slixe commented Aug 19, 2024

Hey @michaelsutton sorry to ping you, would mind giving me a review on this?

@michaelsutton
Copy link
Contributor

michaelsutton commented Aug 25, 2024

Hey @michaelsutton sorry to ping you, would mind giving me a review on this?

I think the burden of proof is on you here. Why are the changes suggested in 2f2b8a6 equivalent to current code? you do realize this is a consensus-critical path

@Slixe
Copy link
Author

Slixe commented Aug 26, 2024

Of course, sorry for not providing more context I thought the code change was obvious enough.

Well, convert_to_float is much faster without unsafe & MaybeUninit because it is directly allocated on the stack afaik.

About the compute_rank function:

  • if i >= 64 { is removed because it is already handled in the loop range right above, and the linked issue is fixed with new rust versions
  • Because we can't be at j == 64 (range 0..64 go until 63, which prevent it) in the loop body, I removed the if condition and placed its code at the place of the break. This allows to eliminate a if condition being executed at each iteration. So, the only way to trigger the code is with if j != 64 { is to break in the inner while.

Comment on lines -54 to +62
// SAFETY: An uninitialized MaybeUninit is always safe.
let mut out: [[MaybeUninit<f64>; 64]; 64] = unsafe { MaybeUninit::uninit().assume_init() };
let mut out: [[f64; 64]; 64] = [[Default::default(); 64]; 64];

out.iter_mut().zip(self.0.iter()).for_each(|(out_row, mat_row)| {
out_row.iter_mut().zip(mat_row).for_each(|(out_element, &element)| {
out_element.write(f64::from(element));
*out_element = f64::from(element);
})
});
// SAFETY: The loop above wrote into all indexes.
unsafe { std::mem::transmute(out) }

out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be much slower, this one seems to optimize as well as the original:
std::array::from_fn(|i| std::array::from_fn(|j| f64::from(self.0[i][j])))

See profiling(llvm-mca) and assembly here: https://godbolt.org/z/q5aj7chn6

Comment on lines -72 to -75
if i >= 64 {
// Required for optimization, See https://github.com/rust-lang/rust/issues/90794
unreachable!()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this issue was resolved:
rust-lang/rust#90794, But I just checked in godbolt and it does reduces optimizations

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants