@jaketalyor32325 I was able to comment out dist_util.sync_params(self.model.parameters()) in train_util.py and get the training to run. I assume the parameters really only need to be synced across multiple GPUs.
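Rather than deleting the call outright, a gentler workaround is to skip the broadcast when there is only one process. This is a hedged sketch, not the repository's actual sync_params implementation; it assumes the helper just broadcasts each parameter tensor from rank 0 with dist.broadcast:

```python
import torch
import torch.distributed as dist


def sync_params(params):
    """Broadcast parameters from rank 0 to all ranks.

    Hypothetical single-GPU-safe variant: it becomes a no-op when the
    process group is not initialized or when there is only one process,
    so nothing is broadcast (and the in-place write never happens) in
    single-GPU runs.
    """
    if not dist.is_available() or not dist.is_initialized():
        return
    if dist.get_world_size() == 1:
        return
    for p in params:
        # dist.broadcast writes into p in place; suppress autograd
        # tracking so leaf tensors that require grad are not rejected.
        with torch.no_grad():
            dist.broadcast(p, 0)
```

With this guard in place, the same code path should work both for single-GPU runs and for genuine multi-process training.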
Settings:
Win10 Pro
python 3.7.9
PyTorch 1.8.1+cu111
1 GPU
GLOO backend
jupyter notebook
I can run the other scripts fine (classifier_sample.py, super_res_sample.py), but when I tried to run classifier_train.py I got a runtime error.
...\torch\distributed\distributed_c10d.py in broadcast(tensor, src, group, async_op)
1027 return work
1028 else:
-> 1029 work.wait()
1030
1031
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
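For context, this error class comes from autograd: PyTorch forbids in-place writes to leaf tensors that have requires_grad=True, and dist.broadcast writes into the tensor in place. A minimal standalone reproduction (not the repo's code) that triggers the same message and shows the usual fix:

```python
import torch

# A freshly created tensor with requires_grad=True is a "leaf Variable".
p = torch.zeros(3, requires_grad=True)

try:
    p.add_(1.0)  # in-place op on a leaf that requires grad -> RuntimeError
except RuntimeError as e:
    print(e)

# Wrapping the in-place write in no_grad() makes it legal, since the
# mutation is then hidden from autograd:
with torch.no_grad():
    p.add_(1.0)
print(p)
```

This suggests why guarding or no_grad-wrapping the broadcast in sync_params avoids the crash on a single GPU.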
These are the arguments and training commands I used:
TRAIN_FLAGS="--iterations 300000 --anneal_lr True --batch_size 256 --lr 3e-4 --save_interval 10000 --weight_decay 0.05"
CLASSIFIER_FLAGS="--image_size 128 --classifier_attention_resolutions 32,16,8 --classifier_depth 2 --classifier_width 128 --classifier_pool attention --classifier_resblock_updown True --classifier_use_scale_shift_norm True"
%run scripts/classifier_train.py --data_dir r"G:\data_set\imagenette2-160\train" $TRAIN_FLAGS $CLASSIFIER_FLAGS
Thanks for any comments and assistance in advance.