Changing the batch size and using multiple GPUs causes an "Incompatible shapes" issue #9
Your error occurs because of what you feed. What you edited must have been: `xmcgan_image_generation/xmcgan/configs/coco_xmc.py`, lines 49 to 50 (commit `edbff38`).
Thank you for your reply.
OK, so the error occurs at `xmcgan_image_generation/xmcgan/train_utils.py`, line 421 (commit `22a7ef2`),
after a call from `xmcgan_image_generation/xmcgan/main.py`, line 62 (commit `22a7ef2`).
Yes, right. And when I print `next(train_iter)` before line 421, it raises the error below.
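(A compact way to print such a batch is to show only the leaf shapes rather than the full tensors; a minimal sketch, assuming `train_iter` yields a pytree of arrays as in the thread:)

```python
import jax

# Peek at one batch and print the shape of every array in the pytree,
# which makes shape mismatches across devices easy to spot.
batch = next(train_iter)
print(jax.tree_util.tree_map(lambda x: x.shape, batch))
```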
Or are there more values I should change in the configuration file?
If I were you, I would actually try to change fewer parameters. Try to get the code running without changing the eval batch size. For instance:

```python
config.batch_size = 28
config.eval_batch_size = 7
```

or:

```python
config.batch_size = 14
config.eval_batch_size = 7
```

Indeed:
Moreover, I see in the README that:
I think you have 2 GPUs. Maybe try with a batch size of at least 16? Maybe 4 was too small. Finally, I see code like this which could be where the error arises, though I am not sure about that. I would like to know where the value 32 comes from in `xmcgan_image_generation/xmcgan/libml/input_pipeline.py`, lines 43 to 47 (commit `edbff38`), where:
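(Batch-size arithmetic in JAX input pipelines typically looks like the sketch below; this is hypothetical code, not the actual lines 43 to 47. If the 32 plays a role in arithmetic like this, it would only be correct on a setup with a matching device count.)

```python
import jax

def batch_sizes(global_batch_size: int):
    # Split the global batch across hosts (processes) first, then across
    # each host's local devices. A hard-coded constant such as 32 in this
    # arithmetic would only be valid on a 32-device setup.
    per_host = global_batch_size // jax.process_count()
    per_device = per_host // jax.local_device_count()
    return per_host, per_device
```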
As a side note, there is no enforcement of the assumption made at `xmcgan_image_generation/xmcgan/libml/input_pipeline.py`, lines 88 to 89 (commit `edbff38`).
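(If one wanted to enforce it, a single assertion would do; a minimal sketch, assuming the unenforced condition is that the batch size divides evenly across devices, with `config` being the configuration object discussed above:)

```python
import jax

# Fail fast if the global batch cannot be split evenly across devices,
# instead of failing later with an "Incompatible shapes" error.
assert config.batch_size % jax.device_count() == 0, (
    f"batch_size={config.batch_size} is not divisible by "
    f"{jax.device_count()} devices"
)
```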
@woctezuma When I use one GPU with the configuration below, it works :) Thank you for your kind reply. But when I use two GPUs and a batch size of 16, it raises the error below. The leading batch dimension changed from 1 to 2.
It is hard to say, but I believe the error that you see comes from a line like this one:
where you have a leading batch dimension that is the number of GPUs. In your error message, it seems the code is expecting to see data chunked for 2 GPUs but receives data for 1 GPU. I wonder if there is an option to toggle on the support for multiple GPUs.
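(For context, that leading device dimension usually comes from a reshape like the one below, in the spirit of `flax`'s `common_utils.shard`; this is a generic sketch, not a quote from this repository:)

```python
import jax

def shard(batch):
    # Reshape each array from [global_batch, ...] to
    # [num_local_devices, per_device_batch, ...] so jax.pmap can place
    # one chunk on each GPU; the leading dimension is the device count.
    n = jax.local_device_count()
    return jax.tree_util.tree_map(
        lambda x: x.reshape((n, -1) + x.shape[1:]), batch
    )
```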
@Hyeonjin1989 I believe multiple GPUs should be supported natively by JAX? I did not have to do anything when running it on more than 1 GPU. Can you print `jax.local_device_count()` to see what value is returned? As for your second question: we did find that performance is quite sensitive to batch size. I've never run the model on 2 GPUs, so I suggest that you run some quick hyperparameter sweeps, if possible, to find the best performance.
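(A quick check of what JAX actually sees, in plain JAX with nothing repository-specific:)

```python
import jax

# With two visible GPUs this should print 2; if it prints 1, JAX only
# sees one device and the sharded data will not match what pmap expects.
print(jax.local_device_count())
print(jax.devices())
```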
@woctezuma @kohjingyu Thank you for your kind help :) I printed `jax.local_device_count()`:
Can you paste your full error log?
@kohjingyu This is my full error log. I also printed:
Multiple GPUs cause an issue. Error:
To be clear, the StackOverflow answer comes from kuza55/keras-extras#7 (comment). Are you forced to edit that file? Also, is it normal that you have these lines in your log (reminiscent of #8)?
I have a memory issue, so would it be better to change the batch size?
I changed only two things in the configuration, but this causes a dimension error, shown below.
My devices are 2 × GTX 1080 Ti (11 GB).
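(Given the earlier remark that "Maybe 4 was too small", the two edits were presumably the batch sizes; a hypothetical reconstruction, since the actual values are not quoted in this thread:)

```python
# In xmcgan/configs/coco_xmc.py -- hypothetical values, reduced to fit
# two 11 GB GPUs.
config.batch_size = 4
config.eval_batch_size = 4
```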