Rec automatic optimization, special behavior of layers
A list of layers with special behavior when inside a recurrent loop vs. outside the recurrent loop (i.e. optimized out of the loop).
- `RecLayer` / `RnnCellLayer`. Inside the loop, they have hidden state and do one step at a time. Outside the loop, they operate on the whole time sequence. Which case applies is determined by whether the input has a time dim.
- `TwoDLSTMLayer`
- `SelfAttentionLayer` (deprecated, in favor of `CumConcatLayer`, see #391)
- `EditDistanceTableLayer`. The outside-loop case is partly not implemented, although some efficient code already exists.
- `MaskedComputationLayer`
- `UnmaskLayer`
- `WindowLayer`. Inside the loop, it keeps the previous N (`window_size - 1`) frames as hidden state, such that you get `[B,window_size,...]`, assuming `window_right=0` and `window_left=window_size - 1` (also see #570). Outside the loop, it just adds the window axis (with an efficient implementation).
- `CumsumLayer`. For input `x`: inside the loop, it does `output = prev:output + x`; outside the loop, it wraps `tf.cumsum`. See the sketch after this list.
- `CumConcatLayer`. See #391 for a long discussion.
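To illustrate the `CumsumLayer` equivalence, here is a minimal sketch in plain TensorFlow (not the actual RETURNN implementation): the per-step recurrence `output = prev:output + x` yields the same values as `tf.cumsum` applied once over the whole time axis.

```python
import tensorflow as tf

x = tf.random.normal([3, 7, 5])  # [B,T,D]

# Outside-loop variant: one op over the whole time axis.
out_seq = tf.cumsum(x, axis=1)  # [B,T,D]

# Inside-loop variant: step-wise recurrence output = prev:output + x.
prev = tf.zeros([3, 5])  # initial prev:output
steps = []
for t in range(7):
    prev = prev + x[:, t]  # one step of the recurrence
    steps.append(prev)
out_loop = tf.stack(steps, axis=1)  # [B,T,D]

# Identical results, up to float rounding.
tf.debugging.assert_near(out_seq, out_loop)
```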
All other layers do not have special logic. So the implicit assumption is that the behavior is correct, i.e. when such a layer is optimized out of the loop, the behavior of the overall model/computation will not change.
This is obviously correct for layers such as `LinearLayer` and most other layers where extra axes do not matter and the same operation would be calculated in every time frame, basically all layers with `recurrent=False`. A minimal config sketch follows below.
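As a hedged illustration (a minimal net-dict sketch; the layer names `loop` and `ff` and the dimensions are made up), a `LinearLayer` inside a rec unit which only depends on the current frame can be computed once outside the loop over the whole sequence, while a layer with a true `prev:` dependency must stay inside:

```python
network = {
    "loop": {"class": "rec", "from": "data", "unit": {
        # Depends only on the current input frame, no recurrent dependency:
        # can be moved out of the loop and computed over [B,T,...] at once.
        "ff": {"class": "linear", "activation": "relu", "n_out": 128,
               "from": "data:source"},
        # Depends on prev:output, a true recurrence: must stay inside the loop.
        "output": {"class": "combine", "kind": "add",
                   "from": ["ff", "prev:output"]},
    }},
}
```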
(Layers with special behavior on dynamic spatial axes are related, but not quite the same thing.)
There are some layers which would/could get confused, for various reasons:
- `KenLmStateLayer`. Assumes it is always inside the loop.
- `DotLayer`. The `var` option needs `T?`. See #569. But otherwise it operates fine, and it is important that this optimization is done correctly for efficiency.
- All layers operating on the time-dim axis (implicitly or explicitly via some `axis=T`), e.g.:
  - `ConvLayer` (implicitly). The number of dims determines whether it is a 1D, 2D or 3D conv.
  - `ReduceLayer` with `axis=T`
  - `RecLayer` operating on another spatial dim (e.g. inside the loop we have `[B,W,D]`)
- `RecStepInfoLayer`. Kind of a special case. It provides `:i` inside the loop. Code could be added to perform `tf.range` outside the loop, which would make it related to `RangeInAxisLayer`. However, it is not clear whether this special logic is worth it. A sketch of the equivalence follows below.
The corresponding issue about this is #573.
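For illustration, a minimal plain-TensorFlow sketch of this equivalence (not RETURNN code): inside the loop, `:i` is the scalar step counter; the optimized-out counterpart would be a `tf.range` over the time axis, which is what `RangeInAxisLayer` provides.

```python
import tensorflow as tf

x = tf.random.normal([3, 7, 5])  # [B,T,D]
n_time = x.shape[1]

# Inside-loop view: at step i, ":i" is the scalar loop index,
# here e.g. used to scale the current frame.
steps = []
for i in range(n_time):
    steps.append(tf.cast(i, tf.float32) * x[:, i])
out_loop = tf.stack(steps, axis=1)  # [B,T,D]

# Outside-loop view: the same computation via tf.range over the time axis
# (cf. RangeInAxisLayer), broadcast against [B,T,D].
i_all = tf.cast(tf.range(n_time), tf.float32)  # [T]
out_seq = i_all[None, :, None] * x

tf.debugging.assert_near(out_loop, out_seq)  # identical results
```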
We could automatically add a `tf.while_loop` (or an anonymous `RecLayer`) around layers which would behave incorrectly when optimized out of the loop, e.g. `KenLmStateLayer` or `RecStepInfoLayer`.
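A hedged sketch of what such a wrapping could look like (the helper `run_stepwise` is hypothetical, not existing RETURNN code): `tf.scan` builds the `tf.while_loop` internally, and the step function is executed once per frame, just as it would be inside the recurrent loop.

```python
import tensorflow as tf

def run_stepwise(step_fn, xs, init_state):
    # Hypothetical helper: force a step-wise computation into a loop,
    # like wrapping a layer in an anonymous RecLayer / tf.while_loop.
    # xs: [T,B,D] (time-major); step_fn: (state, x_t) -> (new_state, y_t),
    # where y_t has the same shape/dtype as x_t in this simple sketch.
    def scan_fn(carry, x_t):
        state, _ = carry
        return step_fn(state, x_t)
    # tf.scan builds the tf.while_loop for us and stacks all y_t.
    _, ys = tf.scan(scan_fn, xs, initializer=(init_state, tf.zeros_like(xs[0])))
    return ys  # [T,B,D], one output per step

# Example: a cumulative-sum "layer" expressed as a per-step function.
xs = tf.random.normal([7, 3, 5])  # [T,B,D]
step = lambda state, x_t: (state + x_t, state + x_t)
ys = run_stepwise(step, xs, init_state=tf.zeros([3, 5]))
```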