Environment overview
Environment location: Docker
Method of NeMo install: Docker
The container image was imported with:
srun bash -c "enroot import --output ${STAGE_PATH}/nvidia+nemo+24.09.sqsh docker://nvcr.io#nvidia/nemo:24.09"
We also tried the latest image, nvcr.io/nvidia/nemo:25.02. The job was launched following the instructions:
srun \
  --container-image "$IMAGE" \
  --container-mounts "$RESULT_DIR,$INDEX_MAPPING_DIR,$STAGE_PATH/cfg:/cfg,$STAGE_PATH/configure.sh:/gsw/configure.sh" \
  --container-writable \
  --no-container-mount-home \
  bash -c "source /gsw/configure.sh && launch"
Additional context
GPU: 32 H100
Describe the bug
When following this benchmark's instructions with the fp8 option enabled, I get the following error:
Steps/Code to reproduce bug
Follow the instructions at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dgxc-benchmarking/resources/nemotron15b-dgxc-benchmarking-b and run training with fp8 enabled to reproduce the bug.
The code causing the problem is here:
470243c#diff-9d28dcb461bdb37dfaafbb51b827d6a9e51865afb7654d96417fe695fba22d0cR83
Expected behavior
Training with fp8 should run to completion. Instead, it fails with the NameError shown in the traceback above.