gradients_memory requires more memory than tf.Optimizer.minimize #11
Comments
RE: Why doesn't gradients_memory save any memory? The memory strategy heuristic works by selecting articulation points. This seems to be the wrong approach for U-net: the main part of the network doesn't have any articulation points, so the heuristic probably only finds articulation points on the edges of the network and picks those. A bad choice of checkpoints can result in a strategy that uses more memory than the original graph, so I wouldn't use gradients_memory for this network. As to why minimize uses less memory than tf.gradients + optimizer.apply_gradients(), I'm not sure.
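For intuition, here is a small illustration (not the library's actual selection code) of why skip connections eliminate articulation points, using networkx:

```python
import networkx as nx

# A plain chain of ops: every interior node is an articulation point,
# so the heuristic has plenty of checkpoint candidates.
chain = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 4)])
print(sorted(nx.articulation_points(chain)))  # [1, 2, 3]

# Add U-Net-style skip connections: the interior articulation points
# disappear, leaving no good checkpoint candidates in the main body.
unet_like = chain.copy()
unet_like.add_edges_from([(0, 4), (1, 3)])
print(sorted(nx.articulation_points(unet_like)))  # []
```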
I dug a bit into the code. I would be interested whether a manual selection of checkpoints in the U-net architecture would allow reducing the peak memory usage even further. How would you choose the checkpoints?
Nope, automatic selection depends on the layout of the computation graph, and batch size doesn't change the computation graph (it just changes the size of individual nodes).
So why doesn't OpenAI implement a similar strategy of swapping tensors out to CPU memory?
@netheril96 Swapping is slow; it's 7-10x faster to recompute on the GPU for most ops.
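As a rough back-of-envelope (every constant here is an illustrative assumption, not a measurement), swapping is bandwidth-bound across PCIe in both directions, while recomputing stays at GPU speed:

```python
# Swap: copy the activation out during the forward pass, back in during backward.
activation_bytes = 100e6                    # one 100 MB activation tensor (assumed)
pcie_bw = 12e9                              # ~12 GB/s effective PCIe 3.0 x16 (assumed)
swap_time = 2 * activation_bytes / pcie_bw  # out + in

# Recompute: one extra forward execution of the producing op on the GPU.
recompute_flops = 20e9                      # cost of the producing op (assumed)
gpu_flops = 10e12                           # ~10 TFLOP/s sustained (assumed)
recompute_time = recompute_flops / gpu_flops

print(f"swap:      {swap_time * 1e3:.1f} ms")       # ~16.7 ms
print(f"recompute: {recompute_time * 1e3:.1f} ms")  # ~2.0 ms, roughly 8x faster
```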
@gchlebus I am working with a VAE, which is roughly the same as the U-Net. I was wondering where you put the checkpoints? Thanks!
@yaroslavvb May I ask a tangential question? What tool did you use to create that U-Net graph? It looks awesome, so I want to learn to use that tool too.
@netheril96 That one I just screenshotted from the U-Net paper. Not sure what tool they used for it, but it could be done easily in OmniGraffle, which is what I used for the diagrams in the blog post.
@yaroslavvb Oh. I was hoping for an automatic tool to generate beautiful graphs from code. TensorBoard visualizations are too ugly. Thanks anyway.
As far as I remember, I put one checkpoint at the lowest U-net level. This made no difference in terms of speed or memory consumption compared to the default checkpoint locations.
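For anyone trying to reproduce this: the library accepts an explicit tensor list as checkpoints, so a single manual checkpoint would look something like the sketch below, where bottleneck is a hypothetical handle to the activation at the lowest U-net level and loss/optimizer come from the surrounding model code:

```python
import tensorflow as tf
import memory_saving_gradients

# bottleneck: hypothetical handle to the lowest-level U-net activation.
# Passing a list of tensors overrides the automatic checkpoint selection.
grads = memory_saving_gradients.gradients(
    loss, tf.trainable_variables(), checkpoints=[bottleneck])
train_op = optimizer.apply_gradients(zip(grads, tf.trainable_variables()))
```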
@gchlebus How did you add the checkpoints? I am trying to monkey-patch tf.gradients, but when I assign tf.__dict__["gradients"] = memory_gradients it does not find anything and raises an exception.
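For reference, the monkey-patch suggested in this repo's README looks like this (note the double underscores in __dict__, which markdown tends to eat):

```python
import tensorflow as tf
import memory_saving_gradients

# Monkey-patch tf.gradients to point to the memory-saving version,
# with automatic checkpoint selection.
tf.__dict__["gradients"] = memory_saving_gradients.gradients_memory
```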
I would like to use the memory saving gradients to train a U-net model with bigger patches and/or an increased batch size. I implemented a toy example to assess the memory usage when switching from tf.Optimizer.minimize to the memory saving gradients: https://github.com/gchlebus/gchlebus.github.io/blob/ca55f92d816ebe4659721b61e1a1f4f3b5c3e4f1/code/profiling-tf-models/u_net.py

What I surprisingly found out is that the memory gradients require more memory than tf.Optimizer.minimize, but less memory than tf.gradients. I queried the peak memory usage using mem_util.py.

Memory usage:
- tf.train.AdamOptimizer().minimize(loss): 75 MB
- tf.gradients(loss, tf.trainable_variables()) + optimizer.apply_gradients(): 107 MB
- gradients_memory(loss, tf.trainable_variables()) + optimizer.apply_gradients(): 96 MB

I would have two questions:
1. Why does gradients_memory require more memory than tf.train.AdamOptimizer.minimize? Am I using the memory saving gradients wrongly?
2. Why does minimize use less memory, given that to my understanding the minimize function does tf.gradients + optimizer.apply_gradients() under the hood?

I would greatly appreciate your feedback.
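For reference, the three variants being compared presumably look something like the sketch below; build_unet_loss is a hypothetical stand-in for the U-net construction in the linked toy example, and the peak-memory query assumes the mem_util.peak_memory(run_metadata) helper from this repo's test utilities:

```python
import tensorflow as tf
import mem_util  # helper from this repo's test utilities
from memory_saving_gradients import gradients_memory

loss = build_unet_loss()  # hypothetical helper standing in for the linked u_net.py
optimizer = tf.train.AdamOptimizer()
params = tf.trainable_variables()

# Variant 1: fused convenience method (75 MB above).
train_op = optimizer.minimize(loss)

# Variant 2: explicit gradients + apply (107 MB above).
# grads = tf.gradients(loss, params)
# train_op = optimizer.apply_gradients(zip(grads, params))

# Variant 3: memory-saving gradients + apply (96 MB above).
# grads = gradients_memory(loss, params)
# train_op = optimizer.apply_gradients(zip(grads, params))

# Peak memory is read from a full-trace RunMetadata after one training step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, options=run_options, run_metadata=run_metadata)
    print(mem_util.peak_memory(run_metadata))
```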