cell output is not clearly distinguishable from the state #2548
Comments
I'm in favor of proposal 2 (implementation in #2551): provide the interface that makes sense now, tag v0.16, and hopefully be done with breaking changes in the recurrent layers forever. Having a nice [...]
I do think proposal 2 would make things more straightforward for downstream implementations. The PyTorch approach makes sense for them since they provide high level [...]
Yeah, I also realize that having the [...]
Closed with #2551, which implements proposal 2.
As of Flux v0.15, after the redesign in #2500, the forward pass of a recurrent cell (RNNCell, GRUCell, LSTMCell) returns only the updated state, and the output at each timestep is either y_t = state_t = h_t (RNNCell and GRUCell) or y_t = state_t[1] = h_t (LSTMCell).
In this, we follow the PyTorch style, but maybe we should have followed Flax and Lux instead and returned the output and the state as separate values.
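The difference between the two conventions can be sketched in plain Julia, without depending on Flux. ToyCell, step_state_only and step_output_and_state are hypothetical names used only for illustration, not Flux API, and the update rule is a bare tanh recurrence chosen for brevity.

```julia
# Hypothetical toy cell; not part of Flux.
struct ToyCell
    Wx::Matrix{Float64}   # input-to-hidden weights
    Wh::Matrix{Float64}   # hidden-to-hidden weights
end

# State-only convention: the forward pass returns just the new state,
# and the caller has to know that the output coincides with it.
step_state_only(c::ToyCell, x, h) = tanh.(c.Wx * x .+ c.Wh * h)

# Output-and-state convention: both are returned separately,
# even when they happen to be the same array.
function step_output_and_state(c::ToyCell, x, h)
    h_new = tanh.(c.Wx * x .+ c.Wh * h)
    return h_new, h_new
end

cell = ToyCell([0.5 0.0; 0.0 0.5], [0.1 0.0; 0.0 0.1])
x, h0 = [1.0, -1.0], zeros(2)

h1 = step_state_only(cell, x, h0)            # caller treats h1 as the output too
y1, s1 = step_output_and_state(cell, x, h0)  # roles are explicit
```

Both calls compute the same numbers; the only difference is whether the caller has to infer which part of the return value is the output.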
The Problem
The problem is that the current return value of the cell's forward pass doesn't clearly distinguish the output from the state. So if we want to expose a wrapper layer around a cell, let's call it Recurrent, that processes an entire input sequence at once and returns the stacked outputs (so what RNN, GRU and LSTM do, but now for arbitrary cells), it is not clear which interface we should ask the cell types to comply with in order to be Recurrent-compatible.
What would be the rule for a Flux-compliant cell definition? Something like: the cell returns its new state, and the output is taken to be the state itself, or its first element when the state is a tuple.
This seems a little odd though.
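A minimal sketch of what such a wrapper would have to do under a state-only interface, in plain Julia. The names output_from_state, recurrent_state_only, rnn_like and lstm_like are hypothetical, and the extraction rule (the state itself, or its first element for tuple states) is an assumption based on the y_t = state_t and y_t = state_t[1] cases described earlier.

```julia
# Assumed rule: the output is the state itself, or its first element
# when the state is a tuple (LSTM-like cells).
output_from_state(state) = state
output_from_state(state::Tuple) = first(state)

# Hypothetical wrapper: `step` maps (x_t, state) to the new state only.
function recurrent_state_only(step, xs, state)
    ys = Float64[]                      # scalar outputs in this toy example
    for x in xs
        state = step(x, state)
        push!(ys, output_from_state(state))
    end
    return ys, state
end

# Two toy scalar cells: one with a plain state, one with an (h, c) tuple.
rnn_like(x, h) = tanh(x + h)
function lstm_like(x, (h, c))
    c_new = c + x
    return (tanh(c_new), c_new)
end

ys1, _ = recurrent_state_only(rnn_like, [1.0, 2.0], 0.0)
ys2, last_state = recurrent_state_only(lstm_like, [1.0, 2.0], (0.0, 0.0))
```

The wrapper only works because of the dispatch on Tuple, which is exactly the kind of implicit rule a custom cell author would have to know about.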
Proposals
Proposal 1
We do nothing, since we have already churned the recurrent layers recently and we don't want to do it again. PyTorch survived fine with an interface similar to the one we currently have. It doesn't have a de facto interface that allows people to define a cell and immediately extend it with recurrence, as Flax does.
We can have this interface but with the slightly odd rules described above.
Proposal 2
We do a Flux v0.16 as soon as possible, changing the return of cells to the output together with the updated state, as separate values, so that we have a clear-cut interface that people can adopt to define custom cells.
I'm not happy about having another breaking release so soon, but since v0.15 has been around for only a few days, most people would probably skip it entirely and move directly to v0.16.
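As a rough illustration of why a separate output and state simplify things, here is a plain-Julia sketch of a generic wrapper. The names recurrent, rnn_step and lstm_step are hypothetical, and the ordering (output, new_state) of the returned pair is an assumption of this sketch rather than something stated above.

```julia
# Hypothetical wrapper: `step` maps (x_t, state) to (output, new_state),
# so no rule is needed to extract the output from the state.
function recurrent(step, xs, state)
    ys = Float64[]                      # scalar outputs in this toy example
    for x in xs
        y, state = step(x, state)
        push!(ys, y)
    end
    return ys, state
end

# Toy scalar cell with a plain state: output and state coincide.
function rnn_step(x, h)
    h_new = tanh(x + h)
    return h_new, h_new
end

# Toy scalar cell with an (h, c) tuple state: the output is only h.
function lstm_step(x, (h, c))
    c_new = c + x
    h_new = tanh(c_new)
    return h_new, (h_new, c_new)
end

ys_rnn, _ = recurrent(rnn_step, [1.0, 2.0], 0.0)
ys_lstm, final_state = recurrent(lstm_step, [1.0, 2.0], (0.0, 0.0))
```

The same wrapper handles both cells without inspecting the shape of the state, which is the kind of contract a custom cell could then follow.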
What Other Frameworks Do
As a reference, below I report the interface exposed by the different frameworks for cells and recurrent layers.
Flux v0.14
Flux v0.15
Lux
Flax
PyTorch