recompute_grad Does Not Work #5
Comments
Could you try defining the class outside the function and only using the instance in the function? I.e. move the Conv2D class instantiation outside of … And if that doesn't work, set …
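(A minimal sketch of what this suggestion might look like, assuming TF 1.x and `tf.contrib.layers.recompute_grad`; the layer, shapes, and names are only illustrative, not the actual code from this thread:)

```python
import tensorflow as tf

# Instantiate the layer once, outside the recomputed function, so its
# weights are created a single time and only reused inside the wrapper.
conv = tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu')

@tf.contrib.layers.recompute_grad
def checkpointed_block(x):
    # Activations of this call are dropped after the forward pass and
    # recomputed during backprop, trading compute for memory.
    return conv(x)

inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])
outputs = checkpointed_block(inputs)
```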
@joeyearsley Thanks for your help. So you mean doing this, yes:
…
The first option (instantiating …) …
Keeping everything the same and adding the argument …
To my knowledge, albeit limited, I once read that the …
Can you share your script?
Yes, of course. Can I email it to you? It's rather large, and I'd prefer not to post it directly yet as it's for a class.
@joeyearsley I haven't heard back from you, so I'll assume you want me to post things here. I think part of my error was in trying to wrap the calls in …
I don't know what I'm doing wrong and how others are getting this to work. Can you help me?
@joeyearsley Do you have a minimal working example I can try to run to get it working? Also, what versions of tensorflow and keras did you use when testing your code? TF 1.9? Keras 2.0? I know I need to use the tensorflow implementations of the keras backend (i.e. …).
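(For reference, a hypothetical illustration of what "the tensorflow implementation of the keras backend" would look like, i.e. using the Keras API bundled with TensorFlow rather than the standalone `keras` package; the exact import path was elided above, so this is only an assumption:)

```python
import tensorflow as tf

# Use TensorFlow's bundled Keras so all ops live in the same TF graph.
K = tf.keras.backend
Conv2D = tf.keras.layers.Conv2D
Dense = tf.keras.layers.Dense
```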
@Sirius083 or @joeyearsley can you help me?
@Sirius083 @joeyearsley I used …
Is that correct?
CAN SOMEONE PLEASE HELP?!!!
I use another efficient densenet implementation at …
@Sirius083 That was the first one I tried, but it didn't work. What version of tensorflow and keras did you use?
@Sirius083 Did you see your GPU memory usage go down and your training time go up when you used yaroslav's memory_saving_gradients? Also, are you using Windows or Linux?
Note: I think tensorflow has an intrinsic implementation of the memory saving method, since it gives a CUDA memory allocation failed warning but can still train the model. However, if the model is too big, like densenet-bc-190-40, it cannot be trained with this method.
@Sirius083 I've tried downgrading from tf-1.8 to 1.5 and still can't get it to work. I'm on Windows 10, and my task manager doesn't show any less memory being utilized when I use …
Right now, I am on tensorflow 1.5 with keras 2.1.6 using python 3.5 x64-bit. I make sure to use the tensorflow implementation of the keras backend (…). I define my model, add gradient checkpointing for several convolutional and fully-connected layers, then compile the model in a function called ….
Here is all my code. I haven't put down a bunch of my …
@gitrdonator Sorry, I did not use keras; I use tensorflow (1.9.0 on windows) and python 3.6. I think the problem may be that you should first import tensorflow, then overwrite the gradients function with gradients_memory as below:
…
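(The elided snippet above is presumably the standard monkey-patch from yaroslavvb's memory_saving_gradients; a minimal sketch, assuming that module is on the Python path:)

```python
import tensorflow as tf
import memory_saving_gradients  # yaroslavvb's memory_saving_gradients.py

# Monkey-patch tf.gradients so everything built on top of it (including
# optimizer.minimize) uses the gradient-checkpointed version.
tf.__dict__["gradients"] = memory_saving_gradients.gradients_memory
```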
@Sirius083 I tried that before as well, but that didn't work. However, I tried just now using your method with …
In imports, ensure …
Then modify the end of my …
but this still didn't work. Using …
Do you by chance have any other ideas?
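(A hedged sketch of the kind of Keras-side override being described here, again assuming yaroslavvb's memory_saving_gradients and the TensorFlow-bundled Keras backend; the exact lines that were elided above are not shown, so this is only an approximation:)

```python
import tensorflow as tf
import memory_saving_gradients  # assumed importable

K = tf.keras.backend

# Route Keras' K.gradients through the checkpointed implementation so
# model.compile()/fit() pick it up when building the training graph.
K.__dict__["gradients"] = memory_saving_gradients.gradients_memory
```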
@joeyearsley Since this does not work, particularly with Keras as far as I can tell, can you please update your …
@Sirius083 Can you share your tensorflow code with me? I desperately need to get memory saving to work on Windows, and I can't get it to work using keras.
@gitrdonator I just added the few lines from before to this code in cifar10-densenet.py
@Sirius083 I've opened many. You're more than welcome to look.
@Sirius083 So, to be clear, what you're telling me is that you don't in fact have any working code?
@joeyearsley @Sirius083 Can you please help me to get …?
Would you mind please taking a look at Issue #42 I created on …?
@gitrdonator I said it works on tensorflow 1.9 on windows; I never tried it on tensorflow 1.5.
@Sirius083 Sorry, that was for @joeyearsley. He had said he had it "successfully working" in an issue post on …
@Sirius083 But while I have you here, would you mind taking a look at the issue and letting me know if you see anything that I could do?
@Sirius083 You did not even import …
The method you propose for using recompute_grad is not working, except for the simplest case where all layers in the model are recomputed except the input and output layers. All other cases (e.g. when every other layer is recomputed) cause the following error:
…
Can you please advise how to fix this error?
My current method is to (1) create a memory efficient layer, e.g.:
…
then (2) use this within a Lambda layer when defining my model:
…
I give unique names for each layer I use.
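(A hedged sketch of the two steps just described, assuming TF 1.x with `tf.contrib.layers.recompute_grad`; the `make_recomputed_conv` helper, layer names, and shapes are hypothetical stand-ins for the elided code, not the original snippets:)

```python
import tensorflow as tf

L = tf.keras.layers

def make_recomputed_conv(filters, name):
    """(1) A 'memory efficient' block: activations are recomputed on backprop."""
    # Create the layer outside the recomputed function so its weights are
    # built once and reused on the recomputation pass.
    conv = L.Conv2D(filters, 3, padding='same', activation='relu',
                    name=name + '_conv')

    @tf.contrib.layers.recompute_grad
    def forward(x):
        return conv(x)

    return forward

# (2) Call the recomputed block from a Lambda layer, with a unique name per use.
inputs = L.Input(shape=(32, 32, 3))
x = L.Lambda(make_recomputed_conv(64, name='recompute_1'),
             name='lambda_recompute_1')(inputs)
x = L.Flatten()(x)
outputs = L.Dense(10, activation='softmax')(x)

model = tf.keras.models.Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
```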