Add warning for multi GPU finetuning for ASR #182

Open · wants to merge 1 commit into `main`
asr-finetune-conformer-ctc-nemo.ipynb (6 changes: 5 additions & 1 deletion)
@@ -336,7 +336,11 @@
"\n",
"NeMo uses `.yml` files to configure the training parameters. You may update them directly by editing the configuration file or from the command-line interface. For example, if the number of epochs needs to be modified, along with a change in the learning rate, you can add `trainer.max_epochs=100` and `optim.lr=0.02` and train the model. \n",
"\n",
"The following sample command uses the `speech_to_text_ctc_bpe.py` script in the `examples` folder to train/fine-tune a Conformer-CTC ASR model for 1 epoch. For other ASR models like Citrinet, you may find the appropriate config files in the NeMo GitHub repo under [examples/asr/conf/](https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf).\n"
"The following sample command uses the `speech_to_text_ctc_bpe.py` script in the `examples` folder to train/fine-tune a Conformer-CTC ASR model for 1 epoch. For other ASR models like Citrinet, you may find the appropriate config files in the NeMo GitHub repo under [examples/asr/conf/](https://github.com/NVIDIA/NeMo/tree/main/examples/asr/conf).\n",
"\n",
"<div class=\"alert alert-warning\">\n",
"If using multiple workers, the number of shards should be divisible by the world size to ensure an even split among workers. If it is not divisible, logging will give a warning but training will proceed, but likely hang at the last epoch. In addition, if using distributed processing, each shard must have the same number of entries after filtering is applied such that each worker ends up with the same number of files. We currently do not check for this in any dataloader, but the user’s program may hang if the shards are uneven.\n",
"</dev>"
]
},
{
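
For context on the override syntax the updated cell describes, a fine-tuning invocation composes the training script with Hydra-style overrides roughly as follows. This is a minimal sketch, not the notebook's actual command: the script path and config name are assumptions based on the NeMo examples layout, and the manifest paths are hypothetical placeholders.

```python
# Illustrative sketch only -- not the notebook's actual cell. The script path and
# config name are assumed from the NeMo examples layout; manifest paths are placeholders.
!python examples/asr/asr_ctc/speech_to_text_ctc_bpe.py \
    --config-path=../conf/conformer \
    --config-name=conformer_ctc_bpe \
    model.train_ds.manifest_filepath=/path/to/train_manifest.json \
    model.validation_ds.manifest_filepath=/path/to/val_manifest.json \
    trainer.max_epochs=1 \
    model.optim.lr=0.02
```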
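
The conditions in the new warning can also be verified before launching a run. Below is a minimal pre-flight sketch of such a check; it is not part of NeMo, and `entries_per_shard` (one post-filtering entry count per tarred shard) is a hypothetical input the caller would assemble from their dataset.

```python
# Minimal sketch of a pre-flight check for the conditions in the warning above.
# Not NeMo code: `entries_per_shard` is a hypothetical list with one entry count
# per tarred shard, computed by the caller after any filtering is applied.

def check_shards(entries_per_shard, world_size):
    num_shards = len(entries_per_shard)

    # Shards must split evenly across workers, or training may hang near the end.
    if num_shards % world_size != 0:
        raise ValueError(
            f"{num_shards} shards not divisible by world size {world_size}; "
            "workers would receive uneven shard counts."
        )

    # Every shard must contribute the same number of entries after filtering,
    # so that every worker sees the same number of files per epoch.
    if len(set(entries_per_shard)) > 1:
        raise ValueError(
            f"uneven shard sizes {sorted(set(entries_per_shard))}; "
            "distributed workers would fall out of step."
        )

# Example: 8 shards of 1000 entries each split evenly across 4 GPUs.
check_shards([1000] * 8, world_size=4)
```

Here `world_size` means the total number of distributed workers, e.g. GPUs across all nodes.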