Do I have false expectations or did I mess up? #645
Replies: 3 comments 3 replies
-
Here are some errors that appear when I run the train command. Could they be the cause of this issue?
-
I had a similar experience when training on data that came from stem-separation software. (I assume you don't have direct recordings of Britney.) It seems to me that the model picks up some distortion from the separation software that we can't hear, and this messes up the high-frequency noises (in my case). I no longer consider singing data separated from songs to be high quality. This is not an answer, only my experience from training one model.
-
You can try increasing the "noise scale" in svcg (to, say, 0.9). It might help tone down the robotic high-frequency artefacts a bit.
-
I have now trained my Britney Spears model for about 4264 epochs. Depending on my current settings and number of files, it saves the model every 80-100 epochs.
I also have audio files from other singers, and I run the latest saved model on them to compare the progress.
The voice has sounded similar for many epochs now, but the higher frequencies still sound a bit robotic, and this has only improved slightly over the last 2000 epochs.
Is it normal that the voice model doesn't sound natural overall, or am I just too impatient?
[I searched YouTube for AI covers by Britney Spears, and most of them sound very good and clean, as if Britney had actually sung those songs (e.g. the Genie in a Bottle cover by the AIVerse channel).]
My current config file (the lines I changed, although some of these values may still be unchanged from the defaults):
My GPU:
NVIDIA GeForce GTX 1660
Current number of files (each under 10 seconds):
172 singing files (24 minutes 34 seconds) + 20 speaking files (2 minutes 10 seconds) = 192 files (26 minutes 44 seconds)
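A quick way to double-check the clip lengths and totals above is a short Python script using only the standard library. This is a minimal sketch, assuming the clips are uncompressed PCM `.wav` files in a single folder; the helper names `clip_seconds`, `summarize`, and `fmt_minutes` are hypothetical and not part of svcg or any training tool.

```python
import wave
from pathlib import Path

MAX_SECONDS = 10.0  # the post keeps every clip under 10 seconds


def clip_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    # wave.open expects a str or file object, so convert the Path
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def summarize(folder: Path, max_seconds: float = MAX_SECONDS):
    """Count the WAV clips in a folder, total their duration,
    and list the clips that are at or over the length limit."""
    durations = {p.name: clip_seconds(p) for p in sorted(folder.glob("*.wav"))}
    too_long = [name for name, secs in durations.items() if secs >= max_seconds]
    return len(durations), sum(durations.values()), too_long


def fmt_minutes(total_seconds: float) -> str:
    """Format a duration as 'M minutes S seconds'."""
    minutes, seconds = divmod(round(total_seconds), 60)
    return f"{minutes} minutes {seconds} seconds"
```

For example, `summarize(Path("dataset/singing"))` (a hypothetical folder) returns the clip count, the total length in seconds, and the names of any clips that break the 10-second limit. Passing the total to `fmt_minutes` gives the same format used in the post: 1474 s + 130 s = 1604 s, i.e. 26 minutes 44 seconds.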
None of the files currently contain reverb, delay, clicks, or noise, and all of them are lossless. Most of them include breathing.
I added new audio files shortly before epochs 808, 883, and 3474.
I found out how to do this on Reddit:
Shortly before epoch 1236, I cleaned some existing files (ones already present before epoch 808) of residual hall reverb and some noise. The hall was barely noticeable, but I want the cleanest possible dry vocals. After cleaning, the vocals in each file are still very high quality, without distortion.
I hope this is enough information for you to tell me whether I messed up or my expectations are too high.
Any suggestions for changing values, etc., are welcome.