RecursiveSynthVC

An expressive voice conversion model that performs cross-speaker style transfer and is improved by self-generated synthetic expressive data.

TODOs

VITS2 (https://github.com/p0p4k/vits2_pytorch/)
NVIDIA BigVGAN (https://github.com/NVIDIA/BigVGAN)
Speaker Normalized Affine Coupling layer (SNAC) (https://github.com/hcy71o/SNAC)
Feature preparation and Cosine Similarity-based Speaker GRL (https://github.com/PlayVoice/whisper-vits-svc)
F0 estimation with Torch CREPE (https://github.com/maxrmorrison/torchcrepe)