imlib/filter: Vectorize morph() kernel. #2415

Open
wants to merge 2 commits into
base: master

@kwagyeman (Member) commented Sep 9, 2024

Depends on #2417.


Benchmark results here: https://docs.google.com/spreadsheets/d/1-FNVKCEr8-6UYs8MUm6wgsOt2c8ihJ2mg9QXKkG91os/edit?gid=452211341#gid=452211341

On the AE3, performance with Helium is 4.2x faster than on the RT1062.

Otherwise, note that this PR reduces the performance of the morph kernel by 50% for grayscale 3x3 kernels. That is the cost of making the code generic and vectorizable. The previous code provided the best possible speed on M4/M7 architectures, but it could not be vectorized and only worked for 3x3 kernels. The new code offers vectorized processing for any kernel size.

Given the massive performance gain Helium has over the scalar code, this tradeoff makes sense.
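For readers unfamiliar with the operation, the generic any-size loop can be sketched in plain scalar C as below. This is a minimal illustration, not the code in this PR: the function name `morph_gray_sketch`, the helper `clamp_int`, the clamp-to-edge border handling, and the 32-bit accumulator are all assumptions made for the example, and no Helium intrinsics are shown.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper (not from the PR): clamp an index into [0, max].
static int clamp_int(int v, int max) {
    return v < 0 ? 0 : (v > max ? max : v);
}

// Hypothetical scalar sketch of a generic morph on an 8-bit grayscale image.
// ksize is the kernel radius, so the kernel has (2*ksize+1)^2 taps.
// Borders are handled by clamping coordinates (an assumption).
void morph_gray_sketch(const uint8_t *src, uint8_t *dst, int w, int h,
                       int ksize, const int8_t *krn) {
    assert(w > 0 && h > 0 && ksize >= 0);
    int n = 2 * ksize + 1;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int32_t acc = 0; // wide accumulator, chosen here to avoid overflow
            for (int j = 0; j < n; j++) {
                const uint8_t *row = src + clamp_int(y + j - ksize, h - 1) * w;
                for (int i = 0; i < n; i++) {
                    acc += krn[j * n + i] * row[clamp_int(x + i - ksize, w - 1)];
                }
            }
            // Saturate the result back to the 8-bit pixel range.
            dst[y * w + x] = (uint8_t) (acc < 0 ? 0 : (acc > 255 ? 255 : acc));
        }
    }
}
```

Because nothing in the loop depends on the kernel being 3x3, the same code path serves any radius, which is the property the description above trades 3x3 scalar speed for.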


The mul/add arguments were dropped because they are impossible to handle without complicating the default loop case. Additionally, they can easily overflow the 16-bit accumulators being used.
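To make the overflow concern concrete, here is a rough worst-case check written as a sketch. It assumes 8-bit pixels (0 to 255), signed 8-bit kernel weights, and models `mul` as an integer gain applied in the accumulator domain; the function names `worst_case_sum` and `fits_int16` are hypothetical and do not appear in the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical: largest magnitude the weighted sum can reach when every
// pixel is either 0 or 255 (the extremes for 8-bit grayscale).
int32_t worst_case_sum(const int8_t *krn, int n_taps) {
    assert(n_taps >= 0);
    int32_t pos = 0, neg = 0;
    for (int i = 0; i < n_taps; i++) {
        if (krn[i] > 0) {
            pos += krn[i] * 255;   // positive taps all see 255
        } else {
            neg -= krn[i] * 255;   // negative taps all see 255
        }
    }
    return pos > neg ? pos : neg;
}

// Hypothetical: does the worst case, scaled by an integer gain, still fit
// in a signed 16-bit accumulator? (Conservative: compares to INT16_MAX.)
bool fits_int16(const int8_t *krn, int n_taps, int32_t mul) {
    int64_t gain = mul;
    if (gain < 0) {
        gain = -gain;
    }
    return (int64_t) worst_case_sum(krn, n_taps) * gain <= INT16_MAX;
}
```

Under these assumptions a 3x3 box kernel of ones peaks at 9 * 255 = 2295, so a gain of 15 already exceeds 32767, which illustrates how little headroom a 16-bit accumulator leaves for a multiplier.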

@kwagyeman kwagyeman force-pushed the kwabena/optimize_morph branch from 9d6c4ec to 3acaa57 Compare September 9, 2024 06:11
@kwagyeman kwagyeman changed the title Kwabena/optimize morph imlib/filter: Vectorize morph() kernel. Sep 9, 2024

github-actions bot commented Sep 9, 2024

Code Size Report:

| Firmware | Text Diff | Data Diff | BSS Diff |
| --- | --- | --- | --- |
| ARDUINO_GIGA/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| ARDUINO_NANO_33/firmware.elf | 🔺0.00% (+8) | ➖0.00% (+0) | ➖0.00% (+0) |
| ARDUINO_NICLA_VISION/firmware.elf | 🔺0.02% (+272) | ➖0.00% (+0) | ➖0.00% (+0) |
| ARDUINO_PORTENTA_H7/firmware.elf | 🔺0.02% (+296) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV2/firmware.elf | 🔺0.05% (+400) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV3/firmware.elf | 🔺0.02% (+288) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV4/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV4P/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMVPT/firmware.elf | 🔺0.02% (+280) | ➖0.00% (+0) | ➖0.00% (+0) |
| OPENMV_RT1060/firmware.elf | 🔺0.01% (+344) | ➖0.00% (+0) | ➖0.00% (+0) |

@kwagyeman kwagyeman force-pushed the kwabena/optimize_morph branch 6 times, most recently from 35a520c to 3bc546d Compare September 14, 2024 00:03
@kwagyeman kwagyeman force-pushed the kwabena/optimize_morph branch 2 times, most recently from 44ccd95 to 56a348f Compare September 14, 2024 22:05
@kwagyeman kwagyeman marked this pull request as ready for review September 14, 2024 22:08
@kwagyeman kwagyeman force-pushed the kwabena/optimize_morph branch 2 times, most recently from ae11f46 to 71f7a91 Compare September 15, 2024 01:23
@kwagyeman kwagyeman force-pushed the kwabena/optimize_morph branch from 71f7a91 to 1f5aaad Compare September 17, 2024 05:29

CLAassistant commented Dec 20, 2024

CLA assistant check
All committers have signed the CLA.
