Improve phase vocoder #23

Open
jurihock opened this issue Jun 6, 2022 · 4 comments
jurihock commented Jun 6, 2022

jurihock commented Jun 6, 2022

Just as an alternative idea to the phase vocoder...

DFT magnitude based phase estimation:

However:

  • SPSI doesn't work well with SDFT and STFT at smaller hops like 2048/32 (as tested in voyx).
  • PGHI appears to be 6-8 times slower than SPSI.

@jurihock jurihock added the enhancement New feature or request label Jun 6, 2022
@jurihock jurihock added this to the v1.5 milestone Jun 6, 2022
jurihock commented Jun 26, 2022

The arctan approximation [3] is still faster than std::arg (about 50 ms difference for the default voice sample), but of course less accurate, e.g. compared to the Python implementation. Since accuracy is more important to me at this point, the arctan approximation will not be implemented yet. Done in #40.
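
To illustrate the speed versus accuracy trade-off, here is a minimal sketch of a polynomial atan2 approximation. It is not necessarily the method from [3]; it assumes the commonly cited first-quadrant fit atan(z) ≈ z·(π/4 + 0.273·(1 − z)) for 0 ≤ z ≤ 1, whose error is a few milliradians. The function name `fast_atan2` is hypothetical and not part of the project.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, NOT the approximation from [3]:
// atan(z) ~ z * (pi/4 + 0.273 * (1 - z)) for z in [0, 1],
// extended to all four quadrants by symmetry.
inline double fast_atan2(const double y, const double x)
{
  const double pi = 3.14159265358979323846;

  if (x == 0.0 && y == 0.0)
  {
    return 0.0; // undefined angle, return 0 by convention
  }

  const double ax = std::abs(x);
  const double ay = std::abs(y);

  // Keep the ratio inside [0, 1] where the fit is valid.
  const bool swap = ay > ax;
  const double z = swap ? ax / ay : ay / ax;

  double angle = z * (pi / 4 + 0.273 * (1 - z));

  if (swap)    { angle = pi / 2 - angle; } // atan(1/z) = pi/2 - atan(z)
  if (x < 0.0) { angle = pi - angle; }     // left half plane
  if (y < 0.0) { angle = -angle; }         // lower half plane

  return angle;
}
```

Such an error is small in absolute terms, but the phase vocoder differentiates and accumulates phase, which is why the exact std::arg can still be preferable when accuracy matters more than speed.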

Regarding [1], my current observation is that the sliding vocoder generally produces fewer artifacts, especially when pitch-shifting instrumental recordings. So it makes more sense to explore the sliding DFT first instead of obfuscating the vocoder...

@jurihock jurihock modified the milestones: v1.5, v2 Jun 26, 2022
jurihock added a commit that referenced this issue Jun 27, 2022
jurihock commented

Idea:

Use $\log(a \cdot e^{j \phi}) = \log(a) + j \phi$ instead of explicit std::abs and std::arg calls, since both log-amplitude and phase are needed anyway.
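
A minimal sketch of this idea, assuming double-precision DFT bins. The helper names `log_polar` and `from_log_polar` are hypothetical and only serve the illustration; std::log and std::exp for std::complex are standard library functions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper: a single complex logarithm yields both values,
// real part = log-amplitude log(a), imaginary part = phase phi.
inline std::complex<double> log_polar(const std::complex<double>& bin)
{
  return std::log(bin);
}

// Inverse mapping: exp(log(a) + j * phi) = a * e^{j * phi}.
inline std::complex<double> from_log_polar(const std::complex<double>& value)
{
  return std::exp(value);
}
```

Note that a bin with zero magnitude yields a log-amplitude of negative infinity, so silent bins would probably need a guard before further processing.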

jurihock commented Dec 18, 2023

The pitch shifting result is comparable to signalsmith:

make
./out/main hybrid-phase --trim --freq=2 input.wav output.wav

where input.wav is the original dno-solo example converted to 16-bit mono wav.

Actually, the default signalsmith configuration uses 6144 (3072 without zero padding) DFT bins, which is not a power of two. Disabling multipleTimeObservations and zeroPadding makes no noticeable difference.

The comparable stftPitchShift configuration is -w 8k -v 4.
