Do I have false expectations or did I mess up? #645

KatieBelli · 2023-05-15T13:46:12Z

I have trained a Britney Spears model for about 4264 epochs so far. With my settings, the model is saved every 80-100 epochs, depending on the current settings and number of files.
I also have audio files from other singers, and I run the latest saved model on them to compare progress.

I noticed that the voice has sounded similar for many epochs now, but the higher frequencies still sound a bit robotic, and this has improved only slightly over the last 2000 epochs.

Is it normal that the voice model doesn't sound natural in general or am I just too impatient?

[I searched YouTube for AI covers using Britney Spears' voice, and most of them sound so good and clean, as if Britney had sung those songs herself (for example the Genie in a Bottle cover on the AIVerse channel).]

My current config file (only the lines I changed, although some of these values may still be the defaults):

"log_interval": 100,
"eval_interval": 3000,
"seed": 10965,
"epochs": 7001,
"learning_rate": 0.00015,

"batch_size": 5,
"num_workers": 5,

My GPU:
NVIDIA GeForce GTX 1660

Current number of files (each file is under 10 seconds):
172 singing files (24 minutes 34 seconds) + 20 speaking files (2 minutes 10 seconds) = 192 files (26 minutes 44 seconds)
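
For anyone trying to relate the save interval to the dataset size, here is a rough sketch of the arithmetic. It assumes that "eval_interval" is counted in training steps (batches) rather than epochs, that a checkpoint is written at each such interval, and that all 192 files are used for training with one step per batch. None of this is confirmed in this thread, and the function names are made up for illustration.

```python
import math


def steps_per_epoch(num_files: int, batch_size: int) -> int:
    """Number of batches needed to see every file once (last batch may be partial)."""
    return math.ceil(num_files / batch_size)


def epochs_between_saves(num_files: int, batch_size: int, eval_interval: int) -> float:
    """Epochs that pass between two saves if eval_interval is measured in steps."""
    return eval_interval / steps_per_epoch(num_files, batch_size)


# Values taken from the config and file counts above.
print(steps_per_epoch(192, 5))                        # 39
print(round(epochs_between_saves(192, 5, 3000), 1))   # 76.9
```

Under these assumptions, a smaller dataset means fewer steps per epoch and therefore more epochs between saves (a hypothetical 150 files would give exactly 100), which would be roughly in line with the 80-100 epoch range mentioned above.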

Currently, none of the files contains any reverb, delay, clicks or noise, and all of them are lossless. Breathing is included in most of them.

I added new audio files shortly before epochs 808, 883 and 3474.
I found instructions for doing this on Reddit:

-save the new files to your dataset_raw folder (optionally save a backup of your config file too)
-run the following commands: svc pre-resample, svc pre-config, svc pre-hubert
-edit the new config file in the configs folder to reflect your previous settings/learning rate
-you should now see python files in the dataset folders for your speakers, one for each .wav
-you can now safely run svc train -t
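
The steps above could be scripted roughly as follows. This is only a sketch: the command names are the ones quoted in the list, while the helper names, the ".bak" suffix and the dry-run default are my own choices for illustration. It has to be run from the project folder, and the config still needs to be edited by hand before training resumes.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Preprocessing commands exactly as listed in the steps above.
PREPROCESS_COMMANDS = [
    ["svc", "pre-resample"],
    ["svc", "pre-config"],
    ["svc", "pre-hubert"],
]


def backup_config(config_path: Path) -> Optional[Path]:
    """Copy the config next to itself so the old values can be restored later."""
    if not config_path.exists():
        return None
    backup = config_path.with_name(config_path.name + ".bak")
    shutil.copy2(config_path, backup)
    return backup


def preprocess(dry_run: bool = True) -> List[str]:
    """List the preprocessing commands in order; execute them unless dry_run is set."""
    issued = []
    for cmd in PREPROCESS_COMMANDS:
        issued.append(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if any step fails.
            subprocess.run(cmd, check=True)
    return issued


if __name__ == "__main__":
    for line in preprocess(dry_run=True):
        print(line)
```

Calling backup_config() on your config file first, then preprocess(dry_run=False), covers the first two steps; after restoring the previous settings in the regenerated config, training is resumed with svc train -t as described.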

Before epoch 1236 I cleaned some of the audio files that were already in the dataset before epoch 808, removing leftover reverb (it was not that noticeable, but I want the cleanest possible dry vocals) and some noise. The vocals in each file are still very high quality, without distortion, after cleaning.

I hope this is enough information for you to tell me whether I messed up or whether my expectations are too high.

Any suggestions for changing values etc. are welcome.

KatieBelli · 2023-05-15T14:09:24Z

Author

Here are some errors that appear when I run the train command, maybe they are the cause of this issue?

0 replies

hataori-p · 2023-05-15T15:21:47Z

I had a similar experience when training on data that came from stem separation software. (I assume you don't have direct recordings of Britney.) It seems to me that the model picks up some distortion from the separation software that we can't hear, and that this messes up the high-frequency sounds (in my case). I no longer consider singing data separated from songs to be high quality.

This is not an answer, only my experience from one model training.

3 replies

KatieBelli May 15, 2023
Author

I had a similar experience when training on data that came from stem separation software. (I assume you don't have direct recordings of Britney.) It seems to me that the model picks up some distortion from the separation software that we can't hear, and that this messes up the high-frequency sounds (in my case). I no longer consider singing data separated from songs to be high quality.

This is not an answer, only my experience from one model training.

I do have recordings of her: official stems (dry vocals: only her solo voice, no backing or layered vocals) and files of her recording songs. When I filtered out the remaining reverb, I made sure that the filtering didn't affect the quality of the vocals.

hataori-p May 15, 2023

I'm sorry for assuming things. And how come there was reverb in the recordings?

KatieBelli May 16, 2023
Author

I'm sorry for assuming things. And how come there was reverb in the recordings?

It's alright, I don't think it's problematic to assume.

I was surprised too, since vocals are recorded in an isolated studio room, so there shouldn't be any reverb or delay in the recordings.

The reverb is very faint and short, so it's not that noticeable unless you listen closely. It's more obvious when the singer sings a bit louder and less noticeable when she sings more quietly.

sbersier · 2023-05-16T09:58:05Z

You can try increasing the "noise scale" in svcg (to, say, 0.9). It might help tone down the high-frequency "robotic" artefacts a bit.
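
To illustrate what a setting like this typically does: in VITS-style models, the latent fed to the decoder is sampled as mean + exp(log_std) * eps * noise_scale, so a larger noise scale adds more random variation around the predicted mean. The toy sketch below uses plain Python lists rather than the real model, the function name is hypothetical, and the 0.4 value is only an illustrative lower setting to compare against.

```python
import math
import random
import statistics


def sample_latent(mean, log_std, noise_scale, seed=0):
    """Sample z = mean + exp(log_std) * eps * noise_scale with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    return [
        m + math.exp(s) * rng.gauss(0.0, 1.0) * noise_scale
        for m, s in zip(mean, log_std)
    ]


mean = [0.0] * 1000
log_std = [0.0] * 1000

low = sample_latent(mean, log_std, noise_scale=0.4)
high = sample_latent(mean, log_std, noise_scale=0.9)

# The same random draws are used for both, so only the scale differs.
print(statistics.pstdev(low) < statistics.pstdev(high))  # True
```

Whether the extra variation actually masks the robotic artefacts for a given model is something to judge by ear, for example by rendering the same clip at a few different values.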

0 replies
