Use iterators in create_border_luma, add_residue, predict_dcpred #15
Use iterators in places where indices are calculated manually, because the compiler doesn't always optimize those accesses. Iterators can remove extra bounds checks or enable other optimizations such as memset/memcpy calls or vectorized mov instructions.
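To illustrate the kind of rewrite meant here, the sketch below contrasts a manually indexed row copy with an iterator-based one. The function names, parameters, and buffer layout are hypothetical and are not taken from this PR's diff; whether the iterator form actually compiles to better code depends on the surrounding context and should be checked per case.

```rust
/// Indexed version: each `dst[x]` and `src[y * stride + x]` access may keep
/// its own bounds check unless the compiler can prove the index is in range.
/// (Hypothetical helper, for illustration only.)
fn copy_row_indexed(dst: &mut [u8], src: &[u8], stride: usize, y: usize, width: usize) {
    for x in 0..width {
        dst[x] = src[y * stride + x];
    }
}

/// Iterator version: the two slices are bounds-checked once when they are
/// created, and the zipped loop body has no index arithmetic left to check.
/// (Hypothetical helper, for illustration only.)
fn copy_row_iter(dst: &mut [u8], src: &[u8], stride: usize, y: usize, width: usize) {
    let start = y * stride;
    let row = &src[start..start + width];
    for (d, &s) in dst[..width].iter_mut().zip(row) {
        *d = s;
    }
}
```

Both functions produce the same result for in-range arguments; the difference is only in how much the optimizer has to prove before it can drop checks or vectorize the loop.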
Use branchless clamping in a loop to produce better vectorized code. This generates saturating truncation instructions instead of greater-than comparisons combined with and/andnot masking.
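A minimal sketch of clamping without explicit branches, assuming a 4x4 residue block that is added to a strided prediction buffer. The name `add_residue_sketch` and its signature are hypothetical and may not match the function changed in this PR; the instructions actually generated depend on the target and compiler version.

```rust
/// Adds a 4x4 block of residues to a prediction buffer, clamping each sum
/// to 0..=255 with `clamp` rather than an if/else chain.
/// (Hypothetical signature, for illustration only.)
fn add_residue_sketch(pred: &mut [u8], residue: &[i32; 16], y0: usize, x0: usize, stride: usize) {
    let mut pos = y0 * stride + x0;
    for row in residue.chunks_exact(4) {
        // One slice per row: bounds are checked once, not per pixel.
        for (p, &r) in pred[pos..pos + 4].iter_mut().zip(row) {
            *p = (i32::from(*p) + r).clamp(0, 255) as u8;
        }
        pos += stride;
    }
}
```

Because the result is clamped to 0..=255 before the cast, the `as u8` conversion never truncates a meaningful value.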
I was working on personal code and came across similar patterns where I found indexing manually to be a fair amount slower than using iterators.
I was reading the RFC to get a better understanding of the prediction functions and saw some places that could be improved.
The main heuristic I used for finding and replacing indexing was whether it was complex enough (i.e., more than `for i in 0..arr.len() { arr[i] ... }`) and whether the indexing was being done horizontally as opposed to in vertical strides. Then I plugged the code into Compiler Explorer to see if there were improvements, and then tested it on the benchmark. I know that isn't the full picture, because most of these functions have constants provided as arguments, but it was still a helpful barometer.

There are 3 commits which get a ~6% increase on the `image` benches. I've tried to break them up into small, easy-to-review commits.