RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. #1

Open
wys0907 opened this issue Dec 9, 2024 · 1 comment

Comments


wys0907 commented Dec 9, 2024

I have a server with four GPUs, but I can only use one of them. The command I ran is:
CUDA_VISIBLE_DEVICE=0 python main.py configs/mask_rcnn_greedyvig_s_fpn_1x_coco.py --greedyvig_model greedyvig_s --work-dir output --auto-resume --gpu-id 0 --seed 42 --deterministic --cfg-options batch_size=8 --launcher none --auto-scale-lr
The output is too long, so here is an excerpt:
loading annotations into memory...
Done (t=0.13s)
creating index...
index created!
2024-12-09 15:37:43,419 - mmdet - INFO - Start running, host: wangyusheng@mvr-System-Product-Name, work_dir: /home/wangyusheng/projects/pycharm/GreedyViG/detection/output
2024-12-09 15:37:43,419 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(NORMAL ) EvalHook
(VERY_LOW ) TextLoggerHook

before_train_epoch:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) EvalHook
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

before_train_iter:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) EvalHook
(LOW ) IterTimerHook

after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) EvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

after_train_epoch:
(NORMAL ) CheckpointHook
(NORMAL ) EvalHook
(VERY_LOW ) TextLoggerHook

before_val_epoch:
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

before_val_iter:
(LOW ) IterTimerHook

after_val_iter:
(LOW ) IterTimerHook

after_val_epoch:
(VERY_LOW ) TextLoggerHook

after_run:
(VERY_LOW ) TextLoggerHook

2024-12-09 15:37:43,419 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
2024-12-09 15:37:43,420 - mmdet - INFO - Checkpoints will be saved to /home/wangyusheng/projects/pycharm/GreedyViG/detection/output by HardDiskBackend.
Traceback (most recent call last):
  File "main.py", line 311, in <module>
    main()
  File "main.py", line 300, in main
    train_detector(
  File "/home/wangyusheng/projects/pycharm/GreedyViG/detection/mmdet_custom/apis/train.py", line 184, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 31, in run_iter
    outputs = self.model.train_step(data_batch, self.optimizer,
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmdet/models/detectors/two_stage.py", line 127, in forward_train
    x = self.extract_feat(img)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/mmdet/models/detectors/two_stage.py", line 67, in extract_feat
    x = self.backbone(img)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangyusheng/projects/pycharm/GreedyViG/detection/greedyvig_backbone.py", line 322, in forward
    x = self.stem(inputs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangyusheng/projects/pycharm/GreedyViG/detection/greedyvig_backbone.py", line 51, in forward
    return self.stem(x)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 731, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 867, in get_world_size
    return _get_group_size(group)
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 325, in _get_group_size
    default_pg = _get_default_group()
  File "/home/wangyusheng/.conda/envs/pytorch112/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 429, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
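The innermost frames point at the cause: a batch-norm layer in the backbone's stem runs a `forward` (torch/nn/modules/batchnorm.py, line 731) that calls `torch.distributed.get_world_size()`. That matches the training-time path of `torch.nn.SyncBatchNorm` in PyTorch 1.12, and with `--launcher none` nothing ever calls `init_process_group`, so looking up the default group fails. The author's eventual fix was to switch to a distributed launcher. One alternative for strictly single-GPU, non-distributed training is to swap SyncBN for ordinary `BatchNorm2d` before training starts. This is a minimal sketch, assuming every SyncBN layer in the model normalizes 4-D (N, C, H, W) feature maps; `revert_sync_batchnorm` is a hypothetical helper, not an API of PyTorch or of this repository.

```python
import torch
import torch.nn as nn


def revert_sync_batchnorm(module: nn.Module) -> nn.Module:
    """Recursively replace nn.SyncBatchNorm layers with nn.BatchNorm2d.

    Hypothetical helper; assumes each SyncBatchNorm sees 4-D inputs, since
    BatchNorm2d rejects any other rank.
    """
    converted = module
    if isinstance(module, nn.SyncBatchNorm):
        converted = nn.BatchNorm2d(
            module.num_features,
            eps=module.eps,
            momentum=module.momentum,
            affine=module.affine,
            track_running_stats=module.track_running_stats,
        )
        # Carry over the learned affine parameters unchanged.
        if module.affine:
            with torch.no_grad():
                converted.weight.copy_(module.weight)
                converted.bias.copy_(module.bias)
        # Reuse the running statistics buffers so no state is lost.
        if module.track_running_stats:
            converted.running_mean = module.running_mean
            converted.running_var = module.running_var
            converted.num_batches_tracked = module.num_batches_tracked
        converted.train(module.training)
    # Recurse into children and swap converted layers back in by name.
    for name, child in module.named_children():
        converted.add_module(name, revert_sync_batchnorm(child))
    return converted
```

Applied to the detector after it is built and before it is wrapped for training (the exact place in `main.py` is an assumption), e.g. `model = revert_sync_batchnorm(model)`, this keeps `--launcher none` usable. With a single GPU nothing is lost, because synchronizing batch statistics across one process gives the same result as plain batch norm.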


wys0907 commented Dec 9, 2024

Sorry, my launch arguments were set incorrectly, but I have fixed it now: I added --launcher pytorch at the end of the command, and it runs.
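Two things appear to make this fix work, sketched below under stated assumptions. First, `argparse` keeps the last value of a repeated `store` option, so appending `--launcher pytorch` overrides the earlier `--launcher none`; the `--launcher` definition here is modelled on MMDetection's `tools/train.py` and is assumed to match `main.py`. Second, a non-`none` launcher makes the script create the default process group that SyncBN needs. In MMCV 1.x the `pytorch` launcher reads `RANK` from the environment, which a launcher such as `torchrun` normally sets; `init_single_process_group` is a hypothetical helper that sets the equivalent variables by hand to build a one-process group.

```python
import argparse
import os
import socket

import torch.distributed as dist


def parse_args(argv):
    # Assumed to mirror the --launcher option of MMDetection's tools/train.py.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--launcher",
        choices=["none", "pytorch", "slurm", "mpi"],
        default="none",
    )
    # Ignore the script's other arguments in this sketch.
    return parser.parse_known_args(argv)[0]


def init_single_process_group(backend="gloo"):
    """Create a world_size=1 default group, like `torchrun --nproc_per_node=1`.

    Hypothetical helper. Use backend="nccl" for GPU training; "gloo" also
    works on CPU-only machines.
    """
    if dist.is_initialized():
        return
    # torchrun normally exports these; pick a free local port for rendezvous.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        port = s.getsockname()[1]
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", str(port))
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    dist.init_process_group(backend=backend, init_method="env://")


if __name__ == "__main__":
    # The fixed command ends with "--launcher none ... --launcher pytorch".
    args = parse_args(
        ["cfg.py", "--launcher", "none", "--auto-scale-lr", "--launcher", "pytorch"]
    )
    print(args.launcher)  # prints: pytorch (argparse keeps the last occurrence)
```

On a real single-GPU run, the usual route is to let `torchrun` set these variables, e.g. `CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 main.py` followed by the same arguments, ending in `--launcher pytorch` (the issue does not say how the author launched it). Note that CUDA only reads `CUDA_VISIBLE_DEVICES` (plural); the `CUDA_VISIBLE_DEVICE=0` prefix in the original command has no effect, so all four GPUs stay visible to the process.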
