I am trying to fine-tune ColBERT v1.9 on my own dataset for retrieval, but training fails with the following error:
torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1702400431970/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1333, invalid usage (run with NCCL_DEBUG=WARN for details), NCCL version 2.18.6
ncclInvalidUsage: This usually reflects invalid usage of NCCL library.
Last error:
Duplicate GPU detected : rank 0 and rank 1 both on CUDA device ca000
I suspect the problem is in the torch.distributed settings. How can I resolve this?
My specifications are:
Single NVIDIA A40 GPU
Conda Package Manager
Python 3.8
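One possible reading of the error, given the single A40 listed above: "Duplicate GPU detected : rank 0 and rank 1" suggests two ranks were launched while only one CUDA device is visible, which NCCL rejects. The sketch below is only an illustration of capping the rank count at the number of visible GPUs. The helper names `clamp_nranks` and `count_visible_gpus` are hypothetical, and passing the result as `nranks` to ColBERT's `RunConfig` is an assumption about how the training script is configured, since that script is not shown here.

```python
def clamp_nranks(requested: int, visible_gpus: int) -> int:
    """Return a rank count that never exceeds the number of visible GPUs.

    Hypothetical helper: NCCL expects each rank to own a distinct GPU,
    so asking for 2 ranks on a 1-GPU machine should fall back to 1.
    """
    if requested < 1:
        raise ValueError("requested rank count must be at least 1")
    if visible_gpus < 1:
        raise RuntimeError("no CUDA device is visible to this process")
    return min(requested, visible_gpus)


def count_visible_gpus() -> int:
    """Count CUDA devices via PyTorch; report 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    gpus = count_visible_gpus()
    if gpus == 0:
        print("No visible GPU; check the CUDA_VISIBLE_DEVICES setting.")
    else:
        nranks = clamp_nranks(2, gpus)
        # Assumption: this value would be passed as RunConfig(nranks=nranks)
        # in the ColBERT training script.
        print(f"visible GPUs: {gpus}, ranks to launch: {nranks}")
```

On a single-GPU machine this would launch one rank even if two were requested. If the script is instead started through a launcher that spawns several processes per node, the process count there would need the same cap.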
I don't have a solution for the problem you are experiencing, but I wish you good luck. Could you share the code you used to fine-tune ColBERT v1.9 on your dataset?