multi_gpu error #6
Comments
+1, same problem here. @gattia did you solve it in the end?
@jakubLangr I actually didn't. My solution ended up being to use a p2.xlarge. They are a bit more expensive, but if you use spot instances you can get them for an average of about $0.14/hr, or at least that's been my experience for the last month or so.
Ah, thanks. If you take a look at this PR, it gives me lots of warnings, but it feels like it is a step closer:
Fixing data shape problem by beeva-enriqueotero · Pull Request #10 · kuza55/keras-extras (it fixes #8 and probably #2).
I then just struggle to get Keras to work properly. I think it allocates GPU memory correctly (though that could also have been due to other things I was running at the time), but it does not seem to use the GPUs, and it gives me about 100 of these warnings:
WARNING:tensorflow:Tried to colocate gradients/tower_0_4/sequential_5/batch_normalization_15/moments/sufficient_statistics/count_grad/Rank with an op tower_0_4/sequential_5/batch_normalization_15/moments/sufficient_statistics/count that had a different device: /device:CPU:0 vs /device:GPU:0. Ignoring colocation property.
I can post the whole error if that helps. Either way, it runs afterwards but still only uses 1 GPU.
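For anyone debugging the same thing: one way to confirm where TensorFlow is actually placing ops (and whether the batch-normalization moment ops really land on the CPU while their gradients go to a GPU) is to turn on device-placement logging before Keras builds its graph. A minimal sketch, assuming the TF 1.x / standalone-Keras APIs used in this thread:

import tensorflow as tf
from keras import backend as K

# Print every op's device assignment so the CPU/GPU split behind the
# colocation warnings becomes visible in the log.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
config.gpu_options.allow_growth = True  # don't grab all GPU memory up front
K.set_session(tf.Session(config=config))

allow_soft_placement=True lets TensorFlow fall back to another device when an op cannot run where it was requested, which is the usual setting when a graph mixes CPU-pinned and GPU-pinned ops like this.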
Oh, and when I remove the batch normalization (which seemed to be the problem), I still fail to use multiple GPUs judging by the nvidia-smi output, but it does seem that memory actually gets allocated on them (at least for some time).
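One way to separate "memory was allocated" from "the GPU is actually doing work" is to poll nvidia-smi from a side script while training runs. A rough sketch using nvidia-smi's query mode (the query fields are standard nvidia-smi options, but check them against nvidia-smi --help-query-gpu for your driver version):

import subprocess
import time

def gpu_stats():
    # Ask nvidia-smi for per-GPU utilization and memory in CSV form.
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,utilization.gpu,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ]).decode()
    return [line.split(", ") for line in out.strip().splitlines()]

for _ in range(10):  # sample ten times, five seconds apart
    for idx, util, used, total in gpu_stats():
        print("GPU %s: %s%% util, %s / %s MiB" % (idx, util, used, total))
    time.sleep(5)

If only GPU 0 ever shows non-zero utilization while the others just hold memory, the towers are being built but not actually executed in parallel.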
Sorry, I don't really have time to look into this and have not been using the multi-GPU setup at the moment. I think that once Keras is integrated into TensorFlow, distributed training will be relatively easy. I will report back if I do get around to looking into this. Anthony.
Any update on this? I am seeing similar behavior.
Hi,
I've tried to run the multi_gpu program to parallelize a convolutional neural network based on U-Net. I am trying to parallelize on a g2.8xlarge to take advantage of its 4 GPUs.
Anyway, when trying to run the code I got an error. Below are the full error and the function used to define the model and call the multi_gpu function (make_parallel). The part that calls make_parallel is pretty much at the very end of the script/this post.
This may be super simple, but I have no experience with TF and am just starting with Keras. Any suggestions would be greatly appreciated.
Thanks,
Anthony.
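For context, the wrapping pattern in question looks roughly like the sketch below. The layers, shapes, and data are placeholders standing in for the actual U-Net (not the code from this issue), and the call assumes the make_parallel(model, gpu_count) signature from this repo's multi_gpu.py:

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation
from multi_gpu import make_parallel  # this repo's multi_gpu.py

# Small placeholder convolutional model standing in for the real U-Net.
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=(128, 128, 1)))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Conv2D(1, (1, 1), padding='same', activation='sigmoid'))

# Replicate the model across the 4 GPUs of a g2.8xlarge, then compile the
# merged model as usual.
model = make_parallel(model, 4)
model.compile(optimizer='adam', loss='binary_crossentropy')

# Dummy data just to exercise the parallel graph; the batch size should
# divide evenly across the GPUs.
x = np.random.rand(32, 128, 128, 1).astype('float32')
y = np.random.rand(32, 128, 128, 1).astype('float32')
model.fit(x, y, batch_size=8, epochs=1)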