I realised that when I remove the adversarial loss and the feature-matching loss, the model still works well with no degradation in performance. This makes me question whether adversarial training really drives the reduction in inference steps, or whether this task is simply easy enough to learn directly with the denoising model. Here are samples from the two models: https://drive.google.com/drive/folders/1uvURiQkOrP9n1jJsKyNe9NcSO4AfdFID?usp=sharing
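For clarity, the ablation above amounts to zeroing out two terms of the generator objective. A minimal sketch (function names and weights are hypothetical, not from this repo) of how such a combined loss is typically assembled:

```python
# Hypothetical sketch of the ablation: the generator objective combines the
# diffusion (denoising) loss with optional adversarial and feature-matching
# terms. Flag and weight names are illustrative assumptions, not repo code.

def generator_loss(diff_loss: float, adv_loss: float, fm_loss: float,
                   use_adversarial: bool = True,
                   lambda_adv: float = 1.0, lambda_fm: float = 10.0) -> float:
    """Total generator loss; the ablation sets use_adversarial=False,
    which drops both the adversarial and feature-matching terms."""
    total = diff_loss
    if use_adversarial:
        total += lambda_adv * adv_loss + lambda_fm * fm_loss
    return total

# Same toy loss values, full model vs. ablated model:
full = generator_loss(0.5, 0.2, 0.05)                            # 0.5 + 0.2 + 0.5
ablated = generator_loss(0.5, 0.2, 0.05, use_adversarial=False)  # denoising loss only
```

The observation in this thread is that training with the ablated objective (denoising loss only) produces samples of comparable quality on this dataset.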
Hi @nguyenhungquang , thanks for sharing your insight. I found the same result when I built this repo and compared DiffSinger with DiffGAN-TTS. My conclusion was also that the LJSpeech task is too easy. In my opinion, GAN training will help the model generalize at small step counts when the dataset contains more expressive and noisy speech.
@keonlee9420 Thank you. I've also trained on my own dataset, which is a bit noisy, and it performs well. Although the mel-spectrogram looks clearer when I visualise it, the difference is unlikely to be noticed when listening. I think the difference might be more visible on a multi-speaker dataset.