Use iterators in create_border_luma, add_residue, predict_dcpred #15

Merged on Dec 17, 2023 (1 commit)

okaneco
Contributor

@okaneco okaneco commented Nov 1, 2023

Use iterators in places where indices are manually calculated, because the compiler doesn't always optimize manual indexing. Iterators can remove extra bounds checks or enable other optimizations like memset/memcpy or vectorized mov instructions.
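As a rough illustration of the pattern described above (not the code from this PR), the sketch below writes one row of a strided pixel buffer first with computed indices and then through a slice. The function names, parameters, and buffer layout are hypothetical; whether the slice versions actually lower to memset/memcpy depends on the compiler and target.

```rust
/// Index-based version: every access recomputes an offset into `buf`,
/// and each `buf[...]` may carry its own bounds check.
fn fill_row_indexed(buf: &mut [u8], stride: usize, y: usize, x0: usize, width: usize, value: u8) {
    for i in 0..width {
        buf[y * stride + x0 + i] = value;
    }
}

/// Slice-based version: the range is checked once when the sub-slice is
/// taken, and `fill` works on a contiguous slice of known length.
fn fill_row_sliced(buf: &mut [u8], stride: usize, y: usize, x0: usize, width: usize, value: u8) {
    let start = y * stride + x0;
    buf[start..][..width].fill(value);
}

/// Copying a source row into the buffer: `copy_from_slice` requires both
/// slices to have the same length, which is guaranteed by the sub-slice.
fn copy_row_sliced(buf: &mut [u8], stride: usize, y: usize, x0: usize, src: &[u8]) {
    let start = y * stride + x0;
    buf[start..][..src.len()].copy_from_slice(src);
}
```

Both fill variants panic on out-of-range input, so behavior is unchanged; only the number and placement of the checks differs.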

Use branchless clamping in a loop to produce better vectorized code. This generates saturating truncation instructions instead of greater-than comparisons whose results are masked with and/andnot.
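A minimal sketch of what branchless clamping in such a loop can look like, assuming a 4x4 block of `i32` residuals added to `u8` pixels in a strided buffer. The name `add_residue_sketch` and its signature are assumptions for illustration, not the crate's actual function, and the instructions the compiler emits will vary by version and target.

```rust
/// Adds a 4x4 block of residuals to the pixels starting at (x0, y0),
/// clamping each result to 0..=255.
///
/// Assumes `stride` is non-zero, the buffer holds four full rows from the
/// starting position, and residuals are small enough that the addition
/// cannot overflow an `i32`.
fn add_residue_sketch(pixels: &mut [u8], residue: &[i32; 16], y0: usize, x0: usize, stride: usize) {
    let start = y0 * stride + x0;
    // Pair each buffer row with one row of four residuals.
    let rows = pixels[start..].chunks_mut(stride).zip(residue.chunks_exact(4));
    for (row, res_row) in rows {
        for (p, &r) in row[..4].iter_mut().zip(res_row) {
            // Clamp expressed as max/min rather than explicit if/else branches.
            *p = (i32::from(*p) + r).max(0).min(255) as u8;
        }
    }
}
```

Writing the clamp as a `max`/`min` chain keeps the loop body free of control flow, which is the shape the description credits with better vectorization.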


I was working on personal code and came across similar patterns where I found indexing manually to be a fair amount slower than using iterators.

I was reading the RFC to get a better understanding of the prediction functions and saw some places that could be improved.

The main heuristic I used for finding and replacing indexing was whether it was complex enough (i.e., more than for i in 0..arr.len() { arr[i] ... }) and whether the indexing was being done horizontally as opposed to in vertical strides. Then I plugged the code into Compiler Explorer to see if there were improvements, and tested it on the benchmark. I know that isn't the full picture because most of these functions have constants provided as arguments, but it was still a helpful barometer.
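To make the horizontal/vertical distinction concrete, here is a small sketch under assumed names and layout (a row-major `u8` buffer with a fixed `stride`). It is not taken from the crate. A horizontal run is contiguous and can be borrowed as a slice, while a vertical run touches one element per row.

```rust
/// Horizontal access: the row above a block is contiguous in memory,
/// so it can be borrowed as a single slice. Assumes `y0 >= 1`.
fn top_row(buf: &[u8], stride: usize, y0: usize, x0: usize, width: usize) -> &[u8] {
    &buf[(y0 - 1) * stride + x0..][..width]
}

/// Vertical access: the column left of a block has elements `stride`
/// apart, so it is walked with `step_by` instead of a contiguous slice.
/// Assumes `x0 >= 1` and a non-zero `stride`.
fn left_column(
    buf: &[u8],
    stride: usize,
    y0: usize,
    x0: usize,
    height: usize,
) -> impl Iterator<Item = u8> + '_ {
    buf[y0 * stride + x0 - 1..]
        .iter()
        .step_by(stride)
        .take(height)
        .copied()
}
```

Only the horizontal case gives the optimizer a contiguous range to work with, which matches the heuristic of targeting row-wise indexing first.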

There are 3 commits which get ~6% increase on the image benches. I've tried to break them up into small, easy to review commits.

Use iterators in places where indices are manually calculated
because the compiler doesn't always optimize them. Iterators
can remove extra bounds checks or enable other optimizations like
memset/memcpy or vectorized mov instructions.

Use branchless clamping in a loop to produce better vectorized code
@fintelia fintelia merged commit b71c697 into image-rs:main Dec 17, 2023
9 checks passed
@okaneco okaneco deleted the use-iters0 branch December 17, 2023 18:59