Same problem!
I got a common.pt file that is only 4K in size.
Docker image: nvcr.io/nvidia/nemo:25.02.rc5
zirui changed the title from "import_ckpt hangs when converting DeepSeek-v3 Hugging Face model to Nomo format" to "import_ckpt hangs when converting DeepSeek-v3 Hugging Face model to Nemo format" on Mar 6, 2025.
1. The conversion takes around 72 minutes in my environment. How long did you wait?
2. Did you create your BF16 checkpoint with the steps outlined here?
3. If you put the command directly in a Python script, can you try wrapping it in an if __name__ == "__main__": block?
4. If the above doesn't help, can you post the output of py-spy dump -p <pid> so we can see where the process is hanging?
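For point 4, the py-spy call can be wrapped in a small Python helper. This is a minimal sketch that assumes py-spy is installed (pip install py-spy) and that you already know the PID of the stuck conversion process; py_spy_dump_cmd and dump_stacks are hypothetical helper names, not part of NeMo.

```python
from __future__ import annotations

import shutil
import subprocess


def py_spy_dump_cmd(pid: int) -> list[str]:
    """Build the command suggested in the thread: py-spy dump -p <pid>."""
    if pid <= 0:
        raise ValueError(f"invalid PID: {pid}")
    return ["py-spy", "dump", "-p", str(pid)]


def dump_stacks(pid: int, timeout: float = 60.0) -> str:
    """Run py-spy against a running process and return its stack dump as text."""
    if shutil.which("py-spy") is None:
        raise RuntimeError("py-spy not found on PATH (try: pip install py-spy)")
    result = subprocess.run(
        py_spy_dump_cmd(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raises CalledProcessError, e.g. if attaching is not permitted
    )
    return result.stdout
```

For example, print(dump_stacks(12345)) with the PID taken from ps. Inside Docker, py-spy generally needs the SYS_PTRACE capability (for instance docker run --cap-add SYS_PTRACE ...) before it can attach. If the conversion starts child processes, dumping each child's PID as well may be needed to find the one that is actually stuck.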
1. After a few hours, the weights directory contained 235GB of files. However, after more than ten hours the process was still in the same state and appeared to be hung.
2. Yes, I followed the instructions in this document to convert the checkpoint to BF16. One issue I noticed: the import_ckpt example in the documentation refers to llm.DeepSeekV3Model, which is not implemented in the code, so I used llm.DeepSeekModel instead.
3. My code already runs inside an if __name__ == "__main__": block.
4. I will try point 4 later and get back to you with an update.
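Because the weights directory sat at 235GB for hours, one way to tell a slow conversion from a real hang is to sample the output directory size periodically. A minimal sketch, assuming the checkpoint is written as ordinary files under a single output directory; dir_size_bytes, is_stalled, and the three-sample window are illustrative choices, not anything NeMo provides.

```python
from __future__ import annotations

import os


def dir_size_bytes(root: str | os.PathLike) -> int:
    """Total size of all regular files under root (similar to du -sb)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except FileNotFoundError:
                # A file can disappear mid-scan, e.g. a temp file being renamed.
                pass
    return total


def is_stalled(sizes: list[int], window: int = 3) -> bool:
    """True if the last `window` size samples show no change at all."""
    if len(sizes) < window:
        return False
    recent = sizes[-window:]
    return max(recent) == min(recent)
```

For example, call dir_size_bytes("/models/DeepSeek-V3-Base-bf16-nemo") every ten minutes, append each result to a list, and check is_stalled on that list. A flat size is only a hint: the process could still be doing work that does not touch the disk, so a py-spy stack dump is the more reliable confirmation.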
Describe the bug
When using import_ckpt to convert the DeepSeek-v3 model (Hugging Face format) to Nemo format, the process hangs indefinitely without errors.
Steps/Code to reproduce bug
The output directory (/models/DeepSeek-V3-Base-bf16-nemo) looks incomplete: it contains only 235GB of data, which seems smaller than expected.
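A back-of-the-envelope estimate supports the "smaller than expected" observation. Assuming DeepSeek-V3's roughly 671B total parameters are all stored in BF16 (2 bytes each), and ignoring metadata overhead and any layers the importer might skip, the complete checkpoint should be on the order of 1.3 TB:

```python
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def expected_bf16_bytes(num_params: float) -> float:
    """Approximate checkpoint size if every parameter is stored in BF16."""
    return num_params * BYTES_PER_BF16


DEEPSEEK_V3_PARAMS = 671e9  # total parameter count published for DeepSeek-V3
OBSERVED_BYTES = 235e9      # "235GB" from the report, read as decimal gigabytes

expected = expected_bf16_bytes(DEEPSEEK_V3_PARAMS)
fraction = OBSERVED_BYTES / expected
print(f"expected ~{expected / 1e12:.2f} TB; observed is {fraction:.0%} of that")
# prints: expected ~1.34 TB; observed is 18% of that
```

If the 235GB figure came from a tool that reports binary units (GiB), the ratio only rises to about 19%, so either way the output appears to hold less than a fifth of the weights.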
Expected behavior
The model should be successfully converted to Nemo format without hanging.
Environment details
nvcr.io/nvidia/nemo:25.02.rc4