MCore slower than NeMo native implementation #9524
Comments
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Please take a look.
Could you try using NeMo 2.0 + FSDP and comparing? We're not planning to support NeMo 1.0 + FSDP.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Describe the bug
I've benchmarked both settings of model.mcore_gpt in an FSDP setting on the two most recent NVIDIA GPU architectures and found model.mcore_gpt=False to be consistently faster (although only slightly on the H100). Note that the A100 vs. H100 numbers are not meant to be comparable; they are run on different systems using a different number of CPU workers, but they do use the same software environment.
Steps/Code to reproduce bug
Please list minimal steps or code snippet for us to be able to reproduce the bug.
Expected behavior
Since the model.mcore_gpt=False version has been deprecated, I would expect the model.mcore_gpt=True version to be at least on par in performance. The numbers on the A100 are substantially worse for the model.mcore_gpt=True version.
Environment overview (please complete the following information)
venv
outside the container]
git clone https://github.com/NVIDIA/NeMo.git && cd NeMo && git checkout dda92f00de2785de46983d7aa4ac77cbb1b353ec && python -m pip install .[all]
git clone https://github.com/NVIDIA/Megatron-LM.git && cd Megatron-LM && git checkout a645f89671be698612170539f2089dc15db66a80 && python -m pip install .
Additional context