ncclUnhandledCudaError: Call to CUDA function failed. #195

Open
bo-bobo opened this issue Jul 27, 2021 · 3 comments

Comments

bo-bobo commented Jul 27, 2021

Traceback (most recent call last):
  File "/home/psdz/anaconda3/envs/yolox/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/psdz/YOLOX/yolox/core/launch.py", line 91, in _distributed_worker
    comm.synchronize()
  File "/home/psdz/YOLOX/yolox/utils/dist.py", line 48, in synchronize
    dist.barrier()
  File "/home/psdz/anaconda3/envs/yolox/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 2524, in barrier
    work = default_pg.barrier(opts=opts)
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:38, unhandled cuda error, NCCL version 2.7.8

oliverwxg commented Jul 27, 2021

#147

Issue as above:

I ran into the same bug and am working on it. Can you help me out?

ladyxuxu commented Jul 5, 2022

I hit the same issue while training yolox_nano.

