Vectorized CPU implementation for various unary functions for f32 #2752
To make the benchmarks reproducible, I used the whisper example as the benchmark:
cargo build --release --example "whisper" --features "symphonia"
# for powershell: Measure-Command { ./target/release/examples/whisper --input sample:gb1 | Out-Default }
# for single threaded linux: time taskset -c 0 ./target/release/examples/whisper --input sample:gb1
time ./target/release/examples/whisper --input sample:gb1
Machine:
Single threaded:
Multi threaded:
Some ARM machine:
Single threaded:
Multi threaded:
I could also add vectorized implementations for binary functions, but the gains there were not as significant as for more complex math functions (e.g. exp, tanh), since auto-vectorization with SSE2 features was already good enough.
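To illustrate the distinction, here is a minimal sketch, not the code from this PR. `add_slice` is the kind of plain elementwise loop that compilers typically auto-vectorize on their own. `exp_slice` uses a hypothetical branch-free `exp_approx` (range reduction plus a degree-6 Taylor polynomial, chosen only for illustration) to show the shape of kernel that usually has to be written by hand, because a per-element call to the standard library's `exp` is generally not vectorized. All function names here are assumptions.

```rust
use std::f32::consts::{LN_2, LOG2_E};

/// Elementwise add: a simple loop the compiler can usually auto-vectorize.
pub fn add_slice(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = x + y;
    }
}

/// Hypothetical branch-free approximation of exp(x) for one f32 value.
/// Uses exp(x) = 2^n * exp(r) with n = round(x * log2(e)) and r = x - n * ln(2).
#[inline(always)]
pub fn exp_approx(x: f32) -> f32 {
    // Clamp so that 2^n below stays a normal, finite f32.
    let x = x.clamp(-87.0, 88.0);
    let n = (x * LOG2_E).round();
    let r = x - n * LN_2; // |r| is roughly at most ln(2) / 2
    // Degree-6 Taylor polynomial of exp(r), evaluated with Horner's scheme.
    let p = 1.0
        + r * (1.0
            + r * (0.5
                + r * (1.0 / 6.0
                    + r * (1.0 / 24.0 + r * (1.0 / 120.0 + r * (1.0 / 720.0))))));
    // Build 2^n directly from the exponent bits of an f32.
    let scale = f32::from_bits(((n as i32 + 127) as u32) << 23);
    p * scale
}

/// Elementwise exp over a slice using the approximation above.
pub fn exp_slice(xs: &[f32], ys: &mut [f32]) {
    for (y, &x) in ys.iter_mut().zip(xs) {
        *y = exp_approx(x);
    }
}
```

Whether the compiler actually vectorizes `exp_slice` depends on the target features and optimization level; explicit SIMD intrinsics remove that uncertainty at the cost of per-architecture code.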
@LaurentMazare, let me know if anything else is needed.
Is anybody available to give feedback or a review?
Hey @mert-kurttutan! These changes seem interesting. I have yet to do some benchmarks, but I spotted something that seems incorrect.
Is there an official end-to-end benchmark?
I implemented these in another crate, since testing their precision across various backends was easier there.
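As a rough idea of what such a precision test could look like, here is a minimal sketch that measures the worst-case error of a candidate function against a reference in units in the last place (ULP). This is not the code from the other crate; `ulp_diff` and `max_ulp_error` are hypothetical helper names, and the sketch assumes finite, non-NaN inputs and outputs.

```rust
/// Map an f32 to an integer so that adjacent floats map to adjacent integers
/// and +0.0 and -0.0 both map to 0.
fn ordered(x: f32) -> i64 {
    let b = x.to_bits() as i32 as i64;
    if b < 0 {
        i32::MIN as i64 - b
    } else {
        b
    }
}

/// Distance between two finite f32 values, in ULPs.
pub fn ulp_diff(a: f32, b: f32) -> u64 {
    (ordered(a) - ordered(b)).unsigned_abs()
}

/// Largest ULP distance between `candidate` and `reference` over `xs`.
pub fn max_ulp_error<F, G>(xs: &[f32], candidate: F, reference: G) -> u64
where
    F: Fn(f32) -> f32,
    G: Fn(f32) -> f32,
{
    xs.iter()
        .map(|&x| ulp_diff(candidate(x), reference(x)))
        .max()
        .unwrap_or(0)
}
```

One possible use is to pass a vectorized kernel as `candidate` and a higher-precision computation such as `|x| (x as f64).exp() as f32` as `reference`, then assert that the result stays under a chosen ULP bound for each backend.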