Long sequence pronunciation issue of pre-train model for custom language #675
Comments
The released base model was trained on a dataset of audio clips up to 30 seconds long. So if your dataset only contains samples of up to 20 seconds and you train from scratch, the model never sees samples as long as 30 seconds. Try modifying the code in these places:
F5-TTS/src/f5_tts/infer/utils_infer.py, lines 288 to 317 in 20aa6a1
F5-TTS/src/f5_tts/infer/utils_infer.py, line 61 in 20aa6a1
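To illustrate the general idea of keeping inference inputs within the length range the model saw during training, here is a minimal sketch of splitting long text into shorter chunks before synthesis. This is not the code at the referenced lines of `utils_infer.py`; the function name `split_text`, the `max_chars` limit, and the punctuation set are all assumptions, and a custom language may need different sentence delimiters.

```python
import re


def split_text(text: str, max_chars: int = 100) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after common Latin punctuation followed by whitespace (assumed delimiters).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?;,])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Start a new chunk if appending this sentence would exceed the limit.
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_text("Hello world. This is a test. Another one here.", max_chars=20):
        print(chunk)
```

Each chunk could then be synthesized separately and the audio concatenated, with `max_chars` chosen so that a chunk corresponds to a duration the training data actually covers.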
@SWivid Thank you for your response. I would like to know how many 30-second samples the custom dataset needs to contain in order to handle long sequences. Basically, what is the duration distribution of your dataset, for example how often 30-second samples appear, given that this is why long-sequence generation sounds clear and natural?
We use the Emilia dataset, which is open-source, so you can check everything you want at https://huggingface.co/datasets/amphion/Emilia-Dataset
@SWivid Another question about the long-sequence issue. You mentioned that the base model handles long sequences, so following your observation I am generating data for my custom dataset. But if I train a smaller variant, such as the F5-TTS small model, will it also handle long sequences, provided long samples appear in my dataset?
The point is to include audio samples of up to 30 seconds; the model size does not matter.
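One way to check whether a custom dataset actually covers the longer durations is to look at its duration distribution. The sketch below assumes you already have a list of clip durations in seconds (how you obtain them depends on your dataset format); the function names `duration_histogram` and `share_longer_than` and the 5-second bucket size are hypothetical choices for illustration.

```python
from collections import Counter


def duration_histogram(durations: list[float], bucket_size: float = 5.0) -> dict[float, int]:
    """Count clips per duration bucket; keys are the bucket start times in seconds."""
    counts = Counter(int(d // bucket_size) * bucket_size for d in durations)
    return dict(sorted(counts.items()))


def share_longer_than(durations: list[float], threshold: float) -> float:
    """Return the fraction of clips strictly longer than threshold seconds."""
    if not durations:
        return 0.0
    return sum(d > threshold for d in durations) / len(durations)


if __name__ == "__main__":
    # Example durations in seconds (made-up values, not from any real dataset).
    durations = [3.2, 7.5, 12.0, 22.0, 29.9]
    print(duration_histogram(durations))
    print(share_longer_than(durations, 20.0))
```

If the share of clips above 20 seconds is zero or close to it, the model has little or no exposure to long sequences, which would be consistent with the degraded long-text output described in this issue.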
Checks
Environment Details
Steps to Reproduce
✔️ Expected Behavior
I have trained a model for a custom language for around 500,000 steps with a vocabulary size of 134, but the inference audio is not clear on long sequences and does not match the input text. Short text is generated well, but the issue appears on long sequences of around 100 characters.
❌ Actual Behavior
What causes the issue with long sequences of more than about 150 characters in a sentence?