
Can anyone share a 44k pretrained model, or give some guidance for training a 44k model from scratch on a tiny dataset? #704

Closed
ILG2021 opened this issue Jan 9, 2025 · 13 comments
Labels: question (Further information is requested)

ILG2021 commented Jan 9, 2025

Checks

  • This template is only for questions, not feature requests or bug reports.
  • I have thoroughly reviewed the project documentation and read the related paper(s).
  • I have searched existing issues, including closed ones, and found no similar questions.
  • I confirm that I am using English to submit this report in order to facilitate communication.

Question details

I want to train a 44k model to get better voice quality, but training failed. My dataset is about 10 hours. After about 300k updates, the learning rate had decreased to 1e-13 and the model seemed to stop updating; continuing to 400k updates still showed no improvement. The voice is clear, but the content is still a mess. I think the model cannot learn alignment with a tiny dataset. Does anyone have a successful example?

ILG2021 added the question label on Jan 9, 2025
SWivid (Owner) commented Jan 9, 2025

> the learning rate has decreased to 1e-13

Set a large epoch number and train longer.
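For context, here is a minimal sketch of why the LR collapses, assuming the trainer uses a linear warmup followed by a linear decay whose length is derived from the total planned updates (epochs × updates per epoch); the actual F5-TTS scheduler may differ in detail:

```python
# Hypothetical warmup + linear-decay schedule (an assumption for illustration;
# check the trainer's actual scheduler). total_updates comes from the epoch
# count, so a small epoch number makes the decay finish early and the LR
# collapse toward zero long before 300k updates.
def lr_at(update, total_updates, warmup=300, peak_lr=7.5e-5):
    if update < warmup:
        return peak_lr * update / warmup                         # linear warmup
    remaining = max(total_updates - warmup, 1)
    return peak_lr * max(total_updates - update, 0) / remaining  # linear decay

print(lr_at(300_000, total_updates=50_000))     # 0.0 -- schedule already exhausted
print(lr_at(300_000, total_updates=5_000_000))  # ~7.05e-05 -- still training
```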

ILG2021 (Author) commented Jan 9, 2025

> the learning rate has decreased to 1e-13
>
> Set a large epoch number and train longer.

Thanks. Should I use the small model instead of the base model? @SWivid

SWivid (Owner) commented Jan 9, 2025

> My dataset is about 10 hours

A smaller model size is fine.

ILG2021 closed this as completed on Jan 9, 2025
ILG2021 reopened this on Jan 10, 2025
ILG2021 (Author) commented Jan 10, 2025

I have set epochs to 100000 and used the F5 small model architecture.
After 710k updates, the content is still a mess and the sound has become noisy. Maybe F5 is not suitable for training from scratch on a tiny dataset.

SWivid (Owner) commented Jan 11, 2025

> I have set epochs to 100000 and used the F5 small model architecture.
> After 710k updates

Have you reset the epochs and restarted the training, or continued from the previous run? How does the learning rate curve look?

ILG2021 (Author) commented Jan 11, 2025

I created a new project rather than resuming training. The learning rate curve looks fine, because I set epochs to 100000.

SWivid (Owner) commented Jan 11, 2025

What is your batch size, e.g. batch_size_per_gpu, and how many GPUs?

For reference: we use the default settings in the yaml file for the small model to train on 24 hours of LJSpeech.

ILG2021 (Author) commented Jan 12, 2025

```json
{
  "exp_name": "F5TTS_Small",
  "learning_rate": 7.5e-05,
  "batch_size_per_gpu": 4800,
  "batch_size_type": "frame",
  "max_samples": 64,
  "grad_accumulation_steps": 1,
  "max_grad_norm": 1,
  "epochs": 100000,
  "num_warmup_updates": 300,
  "save_per_updates": 10000,
  "last_per_steps": 10000,
  "finetune": false,
  "file_checkpoint_train": "",
  "tokenizer_type": "char",
  "tokenizer_file": "",
  "mixed_precision": "fp16",
  "logger": "tensorboard",
  "bnb_optimizer": false
}
```

These are my settings; I have only one 4080 GPU.
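As a rough sanity check (not from the repo), the frame-based batch size implies roughly this many updates per epoch for a 10-hour dataset, assuming on the order of 100 mel frames per second (the exact value depends on the sample rate and hop length of the 44k config):

```python
# Back-of-the-envelope updates per epoch; frames_per_second is an assumption.
dataset_hours = 10
frames_per_second = 100            # hop-size dependent; adjust for the 44k config
batch_frames = 4800                # batch_size_per_gpu above, single GPU

total_frames = dataset_hours * 3600 * frames_per_second    # ~3.6M frames
updates_per_epoch = total_frames / batch_frames
print(f"~{updates_per_epoch:.0f} updates per epoch")       # ~750
```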

SWivid (Owner) commented Jan 12, 2025

> {
> "exp_name": "F5TTS_Small",
> "batch_size_per_gpu": 4800,
> "batch_size_type": "frame",
> "grad_accumulation_steps": 1,
> }
>
> These are my settings; I have only one 4080 GPU.

Your 710k updates at this batch size equal only about 10k updates under the default setting in the yaml file for the small model trained on 24 hours of LJSpeech.

For reference: under the batch_size_per_gpu: 38400 setting (8 GPUs, 8 × 38400 = 307200 frames), it takes 100k updates to get OK results and 200k updates for really good ones.

It certainly takes some time to train from scratch.

ILG2021 (Author) commented Jan 12, 2025

> Your 710k updates at this batch size equal only about 10k updates under the default setting in the yaml file for the small model trained on 24 hours of LJSpeech.
>
> For reference: under the batch_size_per_gpu: 38400 setting (8 GPUs, 8 × 38400 = 307200 frames), it takes 100k updates to get OK results and 200k updates for really good ones.
>
> It certainly takes some time to train from scratch.

You mean I need 100k × 307200 / 4800 = 6400k updates?
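That is the arithmetic behind the question: matching the total number of frames seen in the reference run. A sketch using only the numbers quoted above:

```python
# Scale the reference update count by the ratio of effective batch sizes
# (both in frames), so the model sees the same total number of frames.
reference_updates = 100_000        # "ok results" in the reference run
reference_batch   = 8 * 38_400     # 307200 frames across 8 GPUs
local_batch       = 4_800          # single 4080, config above

equivalent_updates = reference_updates * reference_batch // local_batch
print(f"{equivalent_updates:,} updates")   # 6,400,000 (i.e. 6400k)
```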

ILG2021 (Author) commented Jan 14, 2025

@SWivid

ILG2021 (Author) commented Jan 15, 2025

> What is your batch size, e.g. batch_size_per_gpu, and how many GPUs?
>
> For reference: we use the default settings in the yaml file for the small model to train on 24 hours of LJSpeech.

Hello, how many steps did you train for, and how many hours did it take?

SWivid (Owner) commented Jan 15, 2025

Check the previous response.
Each 100k updates takes 8 hours on 8× H100.
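At the quoted rate, the reference runs work out as follows (single-GPU throughput on a 4080 will of course be far lower; this only converts the numbers above):

```python
# Wall-clock time implied by "each 100k updates takes 8 hours on 8x H100".
hours_per_100k = 8
for label, updates in [("ok results", 100_000), ("really good", 200_000)]:
    print(label, updates // 100_000 * hours_per_100k, "hours")  # 8 h and 16 h
```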
