RuntimeError: a leaf Variable that requires grad is being used in an in-place operation. #84
I also ran into this problem. Since I am on Windows, I use dist.init_process_group(backend="gloo", init_method="env://"), so the backend in the initialization needs to be changed first, otherwise you get an NCCL error. After that I hit the same error as you, but it runs after the following modification: add p = p + 0 before dist.broadcast.
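For reference, a minimal sketch of the backend change described above (the env:// init method assumes MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are set as environment variables):

```python
import torch.distributed as dist

# Windows builds of PyTorch ship without NCCL, so use the gloo backend
# instead of nccl. init_method="env://" reads MASTER_ADDR, MASTER_PORT,
# RANK, and WORLD_SIZE from environment variables.
dist.init_process_group(backend="gloo", init_method="env://")
```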
It has been solved according to your method. Thank you very much! @170744039
@170744039 Thank you for the solution! My knowledge of DDP is so limited that I have no clue how to fix it, even though I know it's probably a Windows/Linux problem. Could you share more about how p = p + 0 solves "a leaf Variable that requires grad is being used in an in-place operation"?
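For anyone landing here later, a minimal, standalone sketch of what is going on: autograd forbids in-place operations on leaf tensors that require grad, and dist.broadcast writes into its input tensor in place. p + 0 produces a new non-leaf tensor, which is exempt from that check:

```python
import torch

p = torch.zeros(3, requires_grad=True)  # a leaf tensor

try:
    p.add_(1)            # in-place write into a leaf that requires grad
except RuntimeError as e:
    print(e)             # the exact error from this issue

q = p + 0                # result of an op -> non-leaf tensor
q.add_(1)                # in-place writes are allowed on a non-leaf
print(p.is_leaf, q.is_leaf)  # True False
```

One caveat, as far as I can tell: after the rebind, dist.broadcast writes into the temporary copy rather than the original parameter. That is harmless in the usual single-process Windows run, where the broadcast is effectively a no-op, but it would skip the real synchronization in a genuine multi-process job.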
@170744039 Hi! Thank you for your awesome solution for Windows. Could you open a pull request with your modification? I do not have a Windows PC, so it would be great if someone could help with Windows support.
The modification is in dist_util.py, in sync_params (old vs. new below):
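A sketch of the change, assuming the sync_params from OpenAI's guided_diffusion dist_util.py that this repo builds on; the old body below is reconstructed from that source, and the new one adds the single line discussed above:

```python
import torch as th
import torch.distributed as dist

# old (as in guided_diffusion/dist_util.py)
def sync_params(params):
    """
    Synchronize a sequence of Tensors across ranks from rank 0.
    """
    for p in params:
        with th.no_grad():
            dist.broadcast(p, 0)

# new (Windows/gloo workaround; shadows the definition above)
def sync_params(params):
    """
    Synchronize a sequence of Tensors across ranks from rank 0.
    """
    for p in params:
        p = p + 0  # rebind p to a non-leaf copy so the in-place broadcast is legal
        with th.no_grad():
            dist.broadcast(p, 0)
```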
Traceback (most recent call last):
File "segmentation_train.py", line 117, in
main()
File "segmentation_train.py", line 69, in main
TrainLoop(
File "E:\MedSegDiff-master2.0\guided_diffusion\train_util.py", line 83, in init
self._load_and_sync_parameters()
File "E:\MedSegDiff-master2.0\guided_diffusion\train_util.py", line 139, in _load_and_sync_parameters
dist_util.sync_params(self.model.parameters())
File "E:\MedSegDiff-master2.0\guided_diffusion\dist_util.py", line 78, in sync_params
dist.broadcast(p, 0)
File "D:\anaconda3\envs\py38\lib\site-packages\torch\distributed\distributed_c10d.py", line 1438, in wrapper
return func(*args, **kwargs)
File "D:\anaconda3\envs\py38\lib\site-packages\torch\distributed\distributed_c10d.py", line 1561, in broadcast
work.wait()
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
Hello. Training on Linux works without any problem, but when I try to train on Windows, with all parameters set exactly as in the author's README, I get the error above. Why does this error occur? Any help would be appreciated.
If you want to chat with me: WeChat: DWBSIC