Can't reproduce the result by following your instructions #1

Open
mangostation opened this issue Dec 4, 2023 · 5 comments

@mangostation

Hi,
I tried to follow your instructions to reproduce the UMA results, but I ran into a problem. Apart from the ESPnet version, which differs because I have other experiments in progress, I followed every step as written.
The only settings I changed were batch_bins and accum_grad, which I set to 6250000 and 4 so that training fits on my RTX 2080 Ti.
The loss and CER curves are shown below. I trained twice and saw the same behavior both times.
If you have run into this problem before, I hope you can share your experience. Thanks a lot.

Replies in either Chinese or English are fine.
[Figure: CER curve]
[Figure: loss curve]
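Since the only reported changes are batch_bins and accum_grad, a quick sanity check is to compare how much data contributes to each optimizer update before and after the change. The sketch below assumes that gradient accumulation simply multiplies the per-iteration batch_bins; the reference values are hypothetical placeholders, not taken from the repository.

```python
def effective_bins(batch_bins: int, accum_grad: int) -> int:
    """Approximate number of bins contributing to one optimizer update.

    Assumes gradients from `accum_grad` mini-batches are accumulated
    before each optimizer step.
    """
    return batch_bins * accum_grad


# Values reported in this issue.
mine = effective_bins(batch_bins=6_250_000, accum_grad=4)

# Hypothetical reference values (placeholders, NOT from the repository).
reference = effective_bins(batch_bins=25_000_000, accum_grad=1)

print(f"mine={mine} reference={reference} ratio={mine / reference:.2f}")
```

If the ratio is far from 1, the learning rate and warmup length tuned for the reference setup may no longer be appropriate for the modified one.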

@FnoY0723 FnoY0723 closed this as not planned Dec 10, 2023
@FnoY0723
Collaborator

I'm sorry for not getting back to you sooner. Which dataset are you referring to? Based on the training curve, the issue may be related to setting the learning rate too high.

I have added the training process for the models mentioned in the article to each dataset folder in egs2. I hope this will be helpful to you.
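To make the learning-rate remark concrete, here is a minimal sketch of a Noam-style warmup schedule of the kind commonly used for Transformer and Conformer training. Whether this repository's recipe uses exactly this scheduler is an assumption, and base_lr and warmup_steps below are illustrative values only.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Noam-style schedule: linear ramp to base_lr, then inverse-sqrt decay.

    `step` counts optimizer updates starting from 1.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)


# Illustrative values only (assumptions, not the recipe's settings).
base_lr, warmup_steps = 1e-3, 1000
for step in (1, 500, 1000, 4000):
    print(step, warmup_lr(step, base_lr, warmup_steps))
```

The peak learning rate is reached at step == warmup_steps. With gradient accumulation there are fewer optimizer updates per epoch, so the same warmup_steps covers more epochs of data.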

@FnoY0723 FnoY0723 reopened this Dec 10, 2023
@mangostation
Author

Hi,
I found that some training settings in umaconf are not the same as the config.yaml in your aishell experiment with no condition.
The values of max_epoch, accum_grad, batch_size, batch_bins, and lr are different.
I have changed the config to match your experiment settings and am training again.
I will post the results when training completes.
Thanks.
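A comparison like the one described above can be automated once both configs are loaded as dictionaries. The following is a minimal sketch; the two example configs, their values, and the nesting of lr under optim_conf are hypothetical and only illustrate the shape of the check.

```python
def flatten(cfg: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. {'optim_conf.lr': 0.001}."""
    flat = {}
    for key, value in cfg.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def diff_configs(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    fa, fb = flatten(a), flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(set(fa) | set(fb))
        if fa.get(key) != fb.get(key)
    }


# Hypothetical configs (values are placeholders, not the repository's).
recipe = {"max_epoch": 70, "accum_grad": 1, "optim_conf": {"lr": 0.001}}
uploaded = {"max_epoch": 50, "accum_grad": 1, "optim_conf": {"lr": 0.0005}}

for key, (left, right) in diff_configs(recipe, uploaded).items():
    print(f"{key}: {left} != {right}")
```

Keys present in only one config show up with None on the other side, which makes missing settings as visible as changed ones.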

@FnoY0723
Collaborator

The AISHELL-1 uma_conformer experimental result I uploaded is from an earlier version. That experiment was run before we standardized the experimental settings (it uses a different batch size and differs in the other settings you mentioned).

Later, we ran experiments that conform to "train_asr_uma_conformer.yaml", and the final CER (character error rate) of the two experiments was consistent. You can therefore refer to either set of experimental settings.

@mangostation
Author

Hi,
Sorry for the late reply.
I retried on a V100 with all settings following this repository, and then I got the following.
[Figure: skip]
[Figure: loss curve]
[Figure: CER curve]

I suspect the problem may be in the CTC loss.
If you have any ideas, please let me know.
Thanks.
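One generic check when CTC loss is suspected is whether every utterance still has enough encoder frames to emit its label sequence: CTC needs at least one frame per label plus one extra frame between each pair of identical adjacent labels, otherwise the loss for that utterance is infinite. The sketch below shows that condition only; whether it is the cause here is an assumption, not a diagnosis.

```python
from typing import Sequence


def min_ctc_frames(target: Sequence[int]) -> int:
    """Minimum number of frames CTC needs to align `target`.

    One frame per label, plus a blank between each pair of
    identical adjacent labels.
    """
    repeats = sum(1 for prev, cur in zip(target, target[1:]) if prev == cur)
    return len(target) + repeats


def ctc_feasible(num_frames: int, target: Sequence[int]) -> bool:
    """True if a sequence of `num_frames` frames can be aligned to `target`."""
    return num_frames >= min_ctc_frames(target)


# Toy label sequences (token ids are arbitrary placeholders).
print(ctc_feasible(3, [5, 5, 7]))  # needs 4 frames because of the repeated 5
print(ctc_feasible(4, [5, 5, 7]))
```

Here num_frames should be the length of the sequence that is actually fed to the CTC layer, after any subsampling or aggregation in the encoder.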

@FnoY0723
Collaborator

I have not run into this problem myself, but I searched the ESPnet issues and this answer may be useful:
espnet/espnet#3170 (comment)
