[Feature] Pitch-retaining speed stretching with Phase Vocoder #1
Summary
This pull request proposes a phase-vocoder-based method for adjusting audio speed without altering the original pitch, with reasonable performance.
Resolves TeamFlos/phira#109.
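For readers unfamiliar with time stretching, the following is a minimal sketch of the bookkeeping that a typical overlap-add stretcher relies on. It is not taken from this PR: the function names `analysis_hop` and `stretched_len` are hypothetical, and the sketch assumes that a speed factor above 1.0 means faster playback. The idea is that frames are read from the input at one hop size and written to the output at another, so the duration changes while each frame keeps its spectral content, and therefore its pitch.

```rust
/// Hypothetical helper: distance (in samples) between consecutive frames read
/// from the input, given the distance between frames written to the output.
/// With speed > 1.0 the input is consumed faster than the output is produced.
fn analysis_hop(synthesis_hop: usize, speed: f64) -> f64 {
    synthesis_hop as f64 * speed
}

/// Hypothetical helper: approximate number of output samples after stretching.
/// Doubling the speed halves the length; halving the speed doubles it.
fn stretched_len(input_len: usize, speed: f64) -> usize {
    (input_len as f64 / speed).round() as usize
}
```

For example, under these assumptions one second of 44.1 kHz audio played at 2.0x speed yields about 22050 output samples, and a synthesis hop of 512 samples corresponds to an analysis hop of 1024 samples.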
Performance considerations
The proposed code activates the audio stretcher only when needed, so there should be no performance regression in normal gameplay; the only affected case is Exercises Mode with a non-default speed setting. The stretcher also runs as a preprocessor during the audio loading step only, so it should add no perceivable in-game latency. This is the most critical factor for real-time rhythm games like Phira.
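The "only when needed" behavior can be illustrated with a minimal sketch. This is not the PR's actual code: `needs_stretch`, `prepare_audio`, and the tolerance value are hypothetical, and the stretcher itself is passed in as a closure so the sketch stays self-contained.

```rust
/// Hypothetical check: the stretcher is skipped when the speed is
/// (numerically) the default 1.0. The tolerance is an assumed value.
fn needs_stretch(speed: f32) -> bool {
    (speed - 1.0).abs() > 1e-3
}

/// Hypothetical load-time hook. `stretch` stands in for the phase vocoder;
/// it is only invoked for non-default speeds, so the default path returns
/// the decoded samples untouched and pays no extra cost.
fn prepare_audio<F>(samples: Vec<f32>, speed: f32, stretch: F) -> Vec<f32>
where
    F: FnOnce(&[f32], f32) -> Vec<f32>,
{
    if needs_stretch(speed) {
        stretch(&samples, speed)
    } else {
        samples
    }
}
```

Because the decision is made once while loading, nothing in this sketch runs during playback, which matches the latency argument above.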
By using the highly optimized rustfft library, the proposed code can be automatically accelerated on desktops (with Intel AVX and SSE) and on mobile platforms (with Arm NEON). However, low-end devices may still show a small delay between clicking the triangular "Play" button and the audio starting to play; the author considers this delay acceptable.
Known limitations
The quality of stretched audio is limited by the algorithm being a lossy, predictive process.
Per-audio parameter tuning, including the choice of window function and window size, may be necessary to achieve optimal quality. However, the settings currently proposed in this PR already achieve acceptable output quality. The most significant quality loss occurs in the low-frequency parts of the audio, while the more perceptible mid- to high-frequency components are largely unaffected.
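To make the window trade-off concrete, here is a minimal sketch of one common window function (Hann) together with the frequency resolution implied by a window size. The PR's actual window and size are not stated here, so the Hann choice, the function names, and the example numbers are all assumptions for illustration.

```rust
use std::f32::consts::PI;

/// Periodic Hann window of length `n` (an assumed choice; other windows
/// such as Hamming or Blackman are equally possible).
fn hann_window(n: usize) -> Vec<f32> {
    (0..n)
        .map(|i| 0.5 - 0.5 * (2.0 * PI * i as f32 / n as f32).cos())
        .collect()
}

/// Width in Hz of a single FFT bin for a given sample rate and window size.
/// Larger windows give narrower bins, and so finer frequency resolution.
fn bin_width_hz(sample_rate: f32, window_size: usize) -> f32 {
    sample_rate / window_size as f32
}
```

As an illustration, a 2048-sample window at 44.1 kHz gives bins about 21.5 Hz wide, so a bass note near 40 Hz is described by only a couple of bins. This is one plausible reason why low frequencies degrade first, and why a larger window may help them at the cost of time resolution.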
Licensing
The proposed code was written by Rong "M." Bao, the author of this PR. The implementation is adapted from a repository created by Andrew Yoon, licensed under the permissive CC0 1.0 Universal license. The algorithm described here is adapted from a repository written by Nasca O. Paul, which is placed in the public domain.
The author of this PR and of the related documentation and code ("Code" hereafter) formally agrees that the Code may be licensed, used, and distributed under whatever license the main repository (https://github.com/Mivik/sasa) uses.