
Memory usage is too high and training never starts when using TensorFlow 1.4 #21

Open
sunume opened this issue Oct 31, 2017 · 4 comments


sunume commented Oct 31, 2017

When training on CIFAR, memory usage keeps increasing and training never starts. I have 2×16 GB of RAM, which should be enough.

1× GTX 1080
cuDNN 6
TensorFlow 1.4
Python 3.5


taufikxu commented Dec 7, 2017

I'm running into the same problem.


pesser commented Dec 10, 2017

See tensorflow/tensorflow#12598


kolesman commented Dec 11, 2017

As @pesser pointed out, the problem is caused by the broken data-dependent initialization mechanism.

A while ago I implemented an alternative, more intuitive way of doing data-dependent initialization, and I've now merged my mechanism into the current PixelCNN++ code; please see https://github.com/kolesman/pixel-cnn.

I haven't checked the code extensively, but it seems to work. Let me know whether it also works for you, and then I'll create a pull request.
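For context, the data-dependent initialization in PixelCNN++ follows the weight-normalization scheme of Salimans & Kingma: push one minibatch through each layer and pick the scale g and bias b so that the layer's outputs start with zero mean and unit variance. A minimal NumPy sketch of that idea (the function name and shapes here are illustrative, not taken from either repository):

```python
import numpy as np

def data_dependent_init(x, v, init_scale=1.0, eps=1e-8):
    """Sketch of weight-norm data-dependent init: given an init
    minibatch x and an unnormalized weight direction v, choose g and b
    so the layer output g * (x @ v_norm) + b is standardized."""
    # normalize the weight direction column-wise
    v_norm = v / np.sqrt(np.sum(v**2, axis=0, keepdims=True) + eps)
    t = x @ v_norm                   # pre-activations for the init batch
    mu = t.mean(axis=0)
    sigma = t.std(axis=0)
    g = init_scale / (sigma + eps)   # scale so outputs have unit std
    b = -mu * g                      # shift so outputs have zero mean
    return g, b, g * t + b           # params and the initialized output

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 10))        # one "init" minibatch
v = rng.normal(size=(10, 4))         # unnormalized weight direction
g, b, y = data_dependent_init(x, v)
print(np.allclose(y.mean(axis=0), 0, atol=1e-6),
      np.allclose(y.std(axis=0), 1, atol=1e-4))  # → True True
```

The appeal of doing this explicitly (rather than hiding it inside a template mechanism, which is what tensorflow/tensorflow#12598 broke) is that the init pass is just an ordinary forward pass with a different assignment at the end.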

@SammyGelman

Just how memory-intensive is PixelCNN++?

I've been fine training smaller models, but now that I've hit a wall I'd like to know exactly where and how the memory is being allocated.

I'm currently training on 512×512 images with batch size = 5 and num_filters = 32.

I received a number of different errors:

OP_REQUIRES failed at cwise_ops_common.h:120 : Resource exhausted: OOM when allocating tensor with shape[5,64,256,256]

OP_REQUIRES failed at random_op.cc:77 : Resource exhausted: OOM when allocating tensor with shape[5,512,512,64]

etc...

I don't fully understand the shapes of these tensors. I can see the batch size in there, and when I change the number of filters the 64 changes as well.

Any help would be much appreciated!
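For what it's worth, those OOM shapes can be multiplied out directly: TensorFlow reports the shape of the tensor it failed to allocate, and for float32 each element is 4 bytes. The 64 is plausibly 2 × num_filters, which would explain why it moves when you change num_filters. A back-of-envelope sketch (the helper name is made up for illustration):

```python
def tensor_megabytes(shape, bytes_per_element=4):
    """Size of one float32 tensor with the given shape, in MiB.
    (Illustrative helper; just multiplies out the shape from the
    OOM message.)"""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_element / 1024**2

# Shapes taken from the OOM messages above.
print(tensor_megabytes([5, 64, 256, 256]))   # → 80.0 MiB
print(tensor_megabytes([5, 512, 512, 64]))   # → 320.0 MiB
```

A single 320 MiB activation isn't fatal on its own, but an 8 GB card only fits a couple of dozen tensors that size, and training keeps many activations alive for the backward pass, so 512×512 inputs exhaust GPU memory quickly even at batch size 5.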
