I'm curious about the train/val ratio and number of epochs #10
Hello @jungtaekyung1, thanks for opening this issue.
It's been a while since I last ran these experiments, and I currently don't have access to the server on which I ran these models. However, from
As for the split ratio, I think I used 50 Korean songs from CSD, of which I used 40 songs for training and the rest for validation and evaluation. You can find the 8 songs we used for evaluation here. FYI, I didn't use any of the English songs.
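The split described above can be sketched as follows. This is a minimal illustration, not the script used for the original experiments: the song IDs are hypothetical placeholders, the 8 evaluation songs are passed in explicitly (they were a fixed set, not a random draw), and with 50 songs, 40 for training and 8 for evaluation, the arithmetic leaves 2 for validation.

```python
import random

def split_songs(song_ids, eval_ids, n_train=40, seed=0):
    """Hold out a fixed evaluation set, then shuffle the remaining songs into train / val.

    song_ids and eval_ids are hypothetical identifiers; any songs left after
    n_train go to validation.
    """
    eval_set = set(eval_ids)
    rest = [s for s in song_ids if s not in eval_set]
    random.Random(seed).shuffle(rest)  # seeded so the split is reproducible
    return rest[:n_train], rest[n_train:], list(eval_ids)

# Hypothetical IDs for the 50 Korean CSD songs; the real evaluation list is the one linked in this comment.
songs = [f"song_{i:02d}" for i in range(1, 51)]
train, val, evaluation = split_songs(songs, eval_ids=songs[-8:])
print(len(train), len(val), len(evaluation))  # 40 2 8
```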
The provided vocoder, IIRC, is just the default HiFi-GAN pretrained on LJSpeech, so you will hear some artifacts. I'm not sure why inference would speed things up by 2x, though.
Based on my hyperparameters, maybe the model was trained for too long. Do you know if it is overfitting? Can you run inference with intermediate checkpoints?
The loss of the model trained on my own data is close to 1, and the songs it was trained on are reproduced well, so I judge that it is overfitting. I can run inference from the checkpoints, but by listening to the inferred wav files I can only check how phoneme accuracy differs with the number of epochs. I assume that training on 40 songs actually means training on 80 files, since CSD splits each song into an a part and a b part. The batch size written in model.json is train: 384, val: 368. Shouldn't this be smaller than the number of songs? Or does this batch size apply to something else, such as the arrays produced when the wav, txt, and mid files are read for training?
I resampled the training wav files to 22050 Hz and changed the sampling rate in configs/preprocess.json and hifi-gan/config.json to 22050. However, songs that were not used for training, like the one I mentioned at the beginning of this issue, still come out twice as fast at inference. Is there a sampling-rate hyperparameter I haven't considered?
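One quick check when output plays exactly twice as fast is whether the sample rate written in the generated WAV header matches the rate the audio was actually synthesized at: the same samples labelled with double the rate last half as long. A minimal sketch using only Python's standard `wave` module; the file names and helper functions are hypothetical diagnostics, not part of the repo's tooling.

```python
import os
import tempfile
import wave

def write_silence(path, sample_rate, n_samples):
    """Write n_samples of mono 16-bit silence with the given sample rate in the header."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * n_samples)

def wav_info(path):
    """Return (sample_rate, n_samples, duration_in_seconds) read from a WAV header."""
    with wave.open(path, "rb") as wf:
        sr = wf.getframerate()
        n = wf.getnframes()
    return sr, n, n / sr

with tempfile.TemporaryDirectory() as d:
    ok = os.path.join(d, "ok.wav")
    fast = os.path.join(d, "fast.wav")
    write_silence(ok, 22050, 44100)    # 44100 samples labelled as 22050 Hz
    write_silence(fast, 44100, 44100)  # the same samples labelled as 44100 Hz
    print(wav_info(ok))    # (22050, 44100, 2.0)
    print(wav_info(fast))  # (44100, 44100, 1.0)
```

Running `wav_info` on an actual inference output and comparing its duration with the length of the input score is a cheap way to tell a header/sample-rate mismatch apart from a problem in the model inputs.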
Each song is at least a few seconds long, and the model cannot be trained on whole-song sequences. Therefore, we sample a segment from each song for training, so there are definitely more than 384 such segments in the entire training set.
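The segment sampling described above, and why the number of segments can far exceed the batch size, can be sketched like this. It is a minimal sketch under assumptions: the segment length, the per-recording frame counts, and the helper names are hypothetical, not MLP Singer's actual values or data loader. Random cropping yields a different segment each time a song is visited; counting non-overlapping windows gives a rough lower bound on how many distinct segments the training set contains.

```python
import random

def random_segment(frames, seg_len, rng=None):
    """Return one random contiguous window of seg_len frames (the whole sequence if shorter)."""
    rng = rng or random.Random()
    if len(frames) <= seg_len:
        return frames
    start = rng.randint(0, len(frames) - seg_len)  # randint is inclusive on both ends
    return frames[start:start + seg_len]

def count_windows(song_lengths, seg_len):
    """Count the non-overlapping seg_len windows available across all songs."""
    return sum(n // seg_len for n in song_lengths)

# Hypothetical numbers: 80 recordings (40 songs x parts a/b), 15000 frames each, 128-frame segments.
print(count_windows([15000] * 80, 128))  # 9360
seg = random_segment(list(range(15000)), 128, random.Random(0))
print(len(seg))  # 128
```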
The CSD originals might have been 44K, but I downsampled them to 22K for training (see mlp-singer/configs/preprocess.json, line 6, at commit 7f4621c).
Are the unseen songs you're running inference on at 44K? How have you preprocessed the MIDI files to produce model inputs? I think you should look at the inputs you are feeding into the model at inference time and try doubling them. It's not clear to me why this step would be necessary, though.
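One way to act on this is to check how note durations become frame counts: the number of frames is proportional to sample_rate / hop_length, so a factor-of-2 mismatch between preprocessing and inference halves the frames and the output plays about twice as fast. A minimal sketch, assuming durations are converted as seconds × sample_rate / hop_length; hop_length=256 (HiFi-GAN's default config) and the mistaken 512 are illustrative assumptions, and `stretch` is a hypothetical helper, not part of the repo.

```python
def seconds_to_frames(seconds, sample_rate, hop_length):
    """Convert a note duration in seconds to a number of mel frames."""
    return round(seconds * sample_rate / hop_length)

def stretch(frames, factor=2):
    """Repeat every frame `factor` times, i.e. 'doubling' a frame-level input sequence."""
    return [f for f in frames for _ in range(factor)]

# A 1-second note at 22050 Hz: hop 256 vs. a mistaken hop of 512.
print(seconds_to_frames(1.0, 22050, 256))  # 86
print(seconds_to_frames(1.0, 22050, 512))  # 43 -> half the frames, so playback is ~2x faster
print(stretch(["a", "b"]))                 # ['a', 'a', 'b', 'b']
```

Stretching only hides the symptom; if the frame counts turn out to be halved, fixing the sample rate or hop length used when converting the MIDI is the cleaner solution.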
I know this noise is because I didn't train the vocoder and used the provided one instead.
Also, this song has a normal tempo, the same speed as the MIDI file used for training.
However, when I run inference with the lyrics and MIDI of a song that was not used for training, the output is twice as fast.
Number of songs: 30
Epochs: 740
This model also reproduces the songs it was trained on well, but songs it was not trained on, such as CSD songs, come out as little more than noise. I think this is a different problem from the CSD-based model producing output at twice the speed.
I suspect the problem is the hyperparameters.
Could you share the train/val split ratio and the number of epochs you used? I know I have only a few songs, but I'd like to compare them against the number of epochs I used.
++
Also, I tried transfer learning to make up for how little data I have.
Training on the CSD data for 100 epochs and then training on my 30 songs for 60 epochs on top of that gave results similar to training on my own songs alone.
While waiting for your answer, I will try two things: training on a mix of CSD and my data, and starting from a CSD-only checkpoint and then training on the mixed data.
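The first of these experiments amounts to building a single shuffled training list from both datasets. A minimal sketch with hypothetical item names; it does not handle checkpoints, and tagging each item with its source is only there to make the CSD/own-data balance easy to inspect.

```python
import random
from collections import Counter

def mixed_training_list(csd_items, own_items, seed=0):
    """Tag each item with its source, concatenate, and shuffle into one training list."""
    items = [("csd", x) for x in csd_items] + [("own", x) for x in own_items]
    random.Random(seed).shuffle(items)  # seeded so the mix is reproducible
    return items

# Hypothetical item names: 40 CSD training songs and 30 of my own songs.
mixed = mixed_training_list([f"csd_{i}" for i in range(40)], [f"own_{i}" for i in range(30)])
print(Counter(src for src, _ in mixed))  # Counter({'csd': 40, 'own': 30})
```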