How to convert the torch_dist ckpt to the nemo file? #11761
Kamizato-Ayaka asked this question in Q&A
Description:
I trained a MegatronGPTSFT model using NeMo. Training produced only a checkpoint directory in torch_dist format, with no .nemo file. I need the .nemo file in order to convert the model to the Hugging Face format, but I cannot figure out how to convert the torch_dist checkpoint into a .nemo file.
I have tried using the following scripts provided by NeMo:
NeMo/scripts/checkpoint_converters/convert_zarr_to_torch_dist.py
NeMo/examples/nlp/language_modeling/megatron_ckpt_to_nemo.py
Both methods fail at the load_checkpoints step. The error log is as follows:
Steps I Tried:
I verified the checkpoints in multiple Docker environments, but the issue persists.
I reviewed the NeMo documentation and examples but could not find a resolution.
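Since both scripts fail while loading, one thing worth checking first is whether the directory actually looks like a torch_dist checkpoint. The following is a minimal diagnostic sketch, not a conversion tool. It assumes the directory follows the Megatron-Core distributed checkpoint layout as I understand it: a metadata.json file with a "sharded_backend" field, a common.pt file, and shard files ending in .distcp. The function name describe_dist_checkpoint is hypothetical and not part of NeMo.

```python
import json
import sys
from pathlib import Path


def describe_dist_checkpoint(ckpt_dir):
    """Summarize what is present in a checkpoint directory.

    Assumed layout (not confirmed by the NeMo docs quoted here):
    metadata.json with a "sharded_backend" key, common.pt, and *.distcp shards.
    """
    ckpt_path = Path(ckpt_dir)
    meta_file = ckpt_path / "metadata.json"

    backend = None
    if meta_file.is_file():
        # Read the backend name recorded when the checkpoint was saved.
        backend = json.loads(meta_file.read_text()).get("sharded_backend")

    return {
        "has_metadata": meta_file.is_file(),
        "sharded_backend": backend,
        "has_common": (ckpt_path / "common.pt").is_file(),
        "shard_files": sorted(p.name for p in ckpt_path.glob("*.distcp")),
    }


if __name__ == "__main__":
    # Usage: python check_ckpt.py /path/to/checkpoint_dir
    if len(sys.argv) > 1:
        for key, value in describe_dist_checkpoint(sys.argv[1]).items():
            print(f"{key}: {value}")
```

If metadata.json is missing or reports a backend other than torch_dist, the path passed to the converter may be pointing at the wrong level of the checkpoint tree, which would be useful to rule out before digging into the load_checkpoints failure.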
Question:
Is there a better or more robust way to convert torch_dist checkpoints into a .nemo file? Any suggestions or best practices would be greatly appreciated!