Include stresses prediction #13
Comments
Hi stasbel, the stresses are intentionally excluded because they are quite hard to predict and make the overall result worse (they are also commonly excluded from benchmarks in the literature). If you want to train a model with stresses, you can simply add them to the symbols and proceed with preprocessing and training. If I have time, I will try to train a model purely on stress prediction (phonemes in, phonemes + stress out), which I believe would make the overall performance quite good.
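As a rough illustration of "add them to the symbols", here is a minimal Python sketch. It assumes the stress marks in question are the IPA primary and secondary stress marks (U+02C8 and U+02CC). The symbol list and the helper names `with_stress` and `unknown_symbols` are hypothetical and not part of the library; the real symbol list lives in the training config.

```python
# Hypothetical phoneme inventory; a real config would list many more symbols.
base_phoneme_symbols = ["a", "b", "d", "e", "k", "l", "o", "t", "ə", "ɪ"]

# IPA primary (U+02C8) and secondary (U+02CC) stress marks (assumed here).
STRESS_MARKS = ["\u02c8", "\u02cc"]


def with_stress(symbols):
    """Return the symbol list extended by any stress marks it lacks."""
    return symbols + [m for m in STRESS_MARKS if m not in symbols]


def unknown_symbols(transcription, symbols):
    """Return the characters of a transcription that are not in the symbol list."""
    known = set(symbols)
    return sorted({ch for ch in transcription if ch not in known})


phoneme_symbols = with_stress(base_phoneme_symbols)

# A made-up transcription that starts with a primary stress mark.
word = "\u02c8bɪtə"
print(unknown_symbols(word, base_phoneme_symbols))  # the stress mark is reported
print(unknown_symbols(word, phoneme_symbols))       # nothing is reported
```

A check like this can be run over the training transcriptions before preprocessing to see whether any stress marks would be silently dropped.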
This is very interesting, as stresses are very important for a number of tasks.
Hi @cschaefer26,
Hi, did you preprocess the data with the updated config and train a new model? You could check whether the processed data looks correct in datasets/combined_dataset.txt |
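One way to do that check is sketched below. It assumes the file is plain UTF-8 text and that stress is written with the IPA marks U+02C8 and U+02CC. The line format of the file is not described in this thread, so the sketch only counts lines that contain a stress mark; `count_stressed_lines` and `report` are hypothetical helper names.

```python
from pathlib import Path

# IPA primary and secondary stress marks (assumed to be the ones in use).
STRESS_MARKS = ("\u02c8", "\u02cc")


def count_stressed_lines(lines):
    """Return (lines containing a stress mark, non-empty lines)."""
    stressed = 0
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        if any(mark in line for mark in STRESS_MARKS):
            stressed += 1
    return stressed, total


def report(path="datasets/combined_dataset.txt"):
    """Print how many lines of the processed dataset carry a stress mark."""
    p = Path(path)
    if not p.exists():
        print(f"{path} not found")
        return
    with p.open(encoding="utf-8") as f:
        stressed, total = count_stressed_lines(f)
    print(f"{stressed} of {total} non-empty lines contain a stress mark")


report()
```

If the count is zero after preprocessing with the updated config, the stress marks were most likely filtered out before training.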
Hi, @cschaefer26
Cool lib!
I was just wondering: is there any particular reason you don't include stress prediction in the pipeline?
Both "cmudict-ipa" and "wikipron" have stress labelling included.
Phoneme tokenizers from pretrained checkpoints lack the `'` and `,` symbols (this was probably done due to a collision with punctuation, but it's pretty easy to avoid).
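To illustrate why the collision is avoidable, here is a small Python sketch. It assumes the stress symbols meant above are the IPA stress marks, which are separate Unicode code points from the ASCII apostrophe and comma; `is_stress_mark` is a hypothetical helper, not a library function.

```python
import unicodedata

# IPA stress marks (assumed) versus the ASCII punctuation they resemble.
STRESS_MARKS = {"\u02c8", "\u02cc"}
PUNCTUATION = {"'", ","}


def is_stress_mark(ch):
    """True only for the IPA stress marks, never for ASCII punctuation."""
    return ch in STRESS_MARKS


# Print code point and official Unicode name for each character.
for ch in ["'", ",", "\u02c8", "\u02cc"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  stress={is_stress_mark(ch)}")
```

Because the two sets do not overlap, a tokenizer can keep the stress marks in its phoneme symbols and still treat the apostrophe and comma as punctuation.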