I'm trying to train on my own data. I first tested with the LJSpeech dataset, which produces speech-like audio after only a few thousand steps. Training on my own dataset (16000 Hz), however, still produces plain noise after 40,000 steps. I assume this is caused by the audio hparams: I changed the sample rate from 20000 to 16000, but I'm not sure what the other values should be changed to. For 20000 Hz audio, the frame length is much shorter than the default setting, and I'm also not sure what the frame shift is used for. Are these values tuned by hand, or is there a way to calculate them? Thanks.
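For context on the "calculate" part of the question: if the frame length and frame shift are specified in milliseconds, their size in samples follows directly from the sample rate. The frame shift is the hop between the starts of consecutive analysis windows, so it determines how many spectrogram frames are produced per second of audio. Below is a minimal sketch of that arithmetic. The names `frame_length_ms` and `frame_shift_ms`, and the values 50 ms and 12.5 ms, are assumptions for illustration and are not taken from the post above; check the actual hparams file for the real names and defaults.

```python
def frame_params(sample_rate, frame_length_ms, frame_shift_ms):
    """Convert frame length and frame shift from milliseconds to samples.

    The parameter names are hypothetical; they only mirror the
    "frame length" and "frame shift" settings mentioned in the question.
    """
    # Window length: number of samples covered by one analysis frame.
    win_length = int(round(sample_rate * frame_length_ms / 1000))
    # Hop length: number of samples between the starts of consecutive frames.
    hop_length = int(round(sample_rate * frame_shift_ms / 1000))
    return win_length, hop_length


# Assumed example values (50 ms window, 12.5 ms shift), not confirmed defaults.
print(frame_params(20000, 50, 12.5))  # (1000, 250)
print(frame_params(16000, 50, 12.5))  # (800, 200)
```

Under these assumptions, settings given in milliseconds keep the same duration when the sample rate changes, and only the sample counts derived from them differ. Any setting that is expressed directly in samples would need to be rescaled by hand.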