Determine pre-allocated storage #71

Open
helson73 opened this issue Nov 30, 2016 · 3 comments

Comments

@helson73

When pre-allocation is enabled, how can I determine which nodes' outputs are used to calculate gradients?
In the LSTM implementation, some nodes share both input and output storage between clones, while other nodes share only inputs.
I want to add a peephole connection to the current LSTM, and I am quite confused.
It seems that if I want to decide which nodes should share both and which should share only inputs when a new model is deployed, I have to fully understand how nodes are handled in gModule ...
Any idea?
Thanks.

@helson73
Author

I also have a question about "clones": since each clone's parameters point to the same storage, what is the difference compared to not using clones at all?
Parameters are shared either way, whether we use clones or not, right?
But with clones and pre-allocation enabled, some node buffers are also shared, so is that the main purpose?

@jsenellart

prealloc is an ugly but effective tweak.

All clones indeed share the parameters, and with preallocation we also share the internal buffers used to store gradInput and some outputs.

But in both cases we share only the intermediate buffers; we cannot share any buffer exposed outside of the nn graph, since, as you say, the main goal of using clones is to keep fully independent modules. So in other words, each clone can be represented as:

gradInput <- (CLONE) -> output
                ||
          SHARED PARAMETERS

The outermost gradInput and output cannot be shared at all (while all the parameters are shared), but whatever is inside the clones can be shared as long as we don't mess up the calculation path.

For outputs, we have an additional constraint: some modules use their output to calculate gradInput, so we cannot share those outputs at all.

I hope this helps. If you want to add a peephole connection to the current LSTM, the safest approach is to turn off preallocation.
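
For illustration, here is a minimal sketch (typical Torch usage, not the actual OpenNMT code) of how clones can share parameters via nn.Module:clone() while each clone keeps its own output and gradInput buffers:

```lua
require 'nn'

-- hypothetical example: one time-step prototype, plus clones that share
-- its parameter storages (weights, biases and their gradients)
local proto = nn.Linear(4, 4)

-- clone() with tensor names makes the clone point at the prototype's
-- storages for those tensors; output and gradInput stay separate per clone
local clone1 = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')
local clone2 = proto:clone('weight', 'bias', 'gradWeight', 'gradBias')

-- parameters are shared: an update to the prototype is visible in the clones
proto.weight:fill(0.5)
print(clone1.weight[1][1], clone2.weight[1][1])  -- both print 0.5

-- but each clone keeps its own output buffer, so the network can be
-- unrolled over time without one step overwriting another
clone1:forward(torch.randn(4))
clone2:forward(torch.randn(4))
print(torch.pointer(clone1.output) ~= torch.pointer(clone2.output))  -- true
```

Preallocation goes one step further and also shares the intermediate buffers inside the clones, which is exactly where the constraints above come from.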

@helson73
Author

helson73 commented Dec 1, 2016

@jsenellart-systran
Thank you for your help.
I am afraid it is hard to drop pre-allocation in my case.
I was previously working with Theano-based NMT systems, but after suffering from Theano's inflexible memory management, I decided to switch to Torch last week. (Recently we have been working on much more complex and large-scale NMT systems.)
The folks at Harvard and SYSTRAN and their awesome work presented here actually gave me a lot of motivation.

About choosing which nodes can share outputs: it seems that every node in Torch has its own overridden functions such as "updateGradInput" and "accGradParameters". If I am right, whenever either of these two functions uses "self.output", the output should not be shared between clones. As you said, a Sigmoid node should not share its output because its "updateGradInput" function actually uses "self.output". But neither of the Linear node's two functions uses "self.output" at all, which is why a Linear node in the LSTM can share both inputs and outputs.
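
To illustrate what I mean, here is a simplified sketch of the math behind the two cases (illustrative only, not the actual torch/nn sources, which dispatch to C kernels):

```lua
-- illustrative only: simplified versions of what updateGradInput computes,
-- not the real torch/nn implementations

-- Sigmoid: d(sigma(x))/dx = output * (1 - output), so gradInput depends on
-- self.output and the output buffer cannot be shared between clones
local function sigmoidUpdateGradInput(self, input, gradOutput)
   -- output * (1 - output), written as output - output^2
   local doutput = self.output - torch.cmul(self.output, self.output)
   self.gradInput = torch.cmul(gradOutput, doutput)
   return self.gradInput
end

-- Linear: gradInput = W^T * gradOutput uses only the weights and gradOutput,
-- never self.output, so the output buffer can safely be shared
local function linearUpdateGradInput(self, input, gradOutput)
   self.gradInput = torch.mv(self.weight:t(), gradOutput)
   return self.gradInput
end
```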
