Hi, thanks for open-sourcing the code. I tried multi-GPU training with the command below
python train.py \
  --batchSize 8 \
  --nThreads 8 \
  --name "$exp_name" \
  --load_pretrained_g_ema "$pretrain_weight" \
  --train_image_dir "$dataset_root"/"img_512" \
  --train_image_list "$dataset_root"/"train_img_list.txt" \
  --train_image_postfix ".png" \
  --val_image_dir "$dataset_root"/"img_512" \
  --val_image_list "$dataset_root"/"val_mask_list.txt" \
  --val_mask_dir "$dataset_root"/"mask_512" \
  --val_image_postfix ".png" \
  --load_size 512 \
  --crop_size 512 \
  --z_dim 512 \
  --validation_freq 10000 \
  --niter 50 \
  --dataset_mode trainimage \
  --trainer stylegan2 \
  --dataset_mode_train trainimage \
  --dataset_mode_val valimage \
  --model comod \
  --netG comodgan \
  --netD comodgan \
  --no_l1_loss \
  --no_vgg_loss \
  --preprocess_mode scale_shortside_and_crop \
  --save_epoch_freq 10 \
  --gpu_id 0,1,2,3 $EXTRA
and received the following error (this problem did not occur with single-GPU training):
(epoch: 1, iters: 9904, time: 0.171) GAN: 1.7399 path: 0.0003 D_real: 0.4633 D_Fake: 0.6500 r1: 0.2954
(epoch: 1, iters: 10000, time: 0.215) GAN: 1.4925 path: 0.0003 D_real: 0.3935 D_Fake: 0.9652 r1: 0.2954
saving the latest model (epoch 1, total_steps 10000)
Saved current iteration count at ./checkpoints/comod-ffhq-512-4gpus/iter.txt.
doing validation
warnings.warn('Was asked to gather along dimension 0, but all '
Traceback (most recent call last):
  File "train.py", line 138, in
    generated, _ = model(data_ii, mode='inference')
  File "/binaries/anaconda3/envs/torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/binaries/anaconda3/envs/torch_py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 168, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/binaries/anaconda3/envs/torch_py36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 178, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/binaries/anaconda3/envs/torch_py36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/binaries/anaconda3/envs/torch_py36/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
TypeError: Caught TypeError in replica 3 on device 3.
Original Traceback (most recent call last):
  File "/binaries/anaconda3/envs/torch_py36/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/binaries/anaconda3/envs/torch_py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'data'
Do you know what could be the problem?
I had no problem the last time I tried training on multiple GPUs. I don't have access to multiple GPUs at the moment; I'll look into this issue later.
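For reference, this traceback pattern is consistent with how torch.nn.DataParallel scatters its inputs: the dict passed positionally to model(data_ii, mode='inference') is split along the batch dimension into at most batch-size chunks, while the non-tensor mode='inference' keyword is simply copied to every device. If the validation batch that reaches that call holds fewer samples than there are GPUs, one replica ends up being invoked with only the keyword and no positional argument, which raises exactly "forward() missing 1 required positional argument: 'data'". The snippet below is a minimal sketch that reproduces a similar failure under those assumptions; ToyModel, its forward signature, and the 3-sample batch are illustrative stand-ins rather than code from this repository, and it presumes a machine with four visible GPUs, matching --gpu_id 0,1,2,3 above.

# Minimal sketch (illustrative, not from this repo): shows how nn.DataParallel
# can drop the positional argument for one replica when a dict batch yields
# fewer scattered chunks than there are devices.
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    # Same call shape as in the issue: a dict of tensors plus a string kwarg.
    def forward(self, data, mode='inference'):
        return data['image'].mean()


# Hypothetical 4-GPU setup, mirroring --gpu_id 0,1,2,3.
model = nn.DataParallel(ToyModel(), device_ids=[0, 1, 2, 3]).cuda()

# Three samples on four GPUs: the tensors inside the dict are scattered into
# only three chunks, while the string kwarg 'mode' is replicated to all four
# devices, so replica 3 is called with no positional argument and raises
# "TypeError: forward() missing 1 required positional argument: 'data'".
small_batch = {'image': torch.randn(3, 3, 512, 512, device='cuda')}
out = model(small_batch, mode='inference')

If that is indeed the trigger here, the usual workarounds would be to make the validation set (or per-step validation batch) a multiple of the GPU count, drop the last incomplete validation batch, or run the validation pass on a single device.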