How to define whether search (or train flag) is enabled? #18
If I understand this correctly, each module has an internal flag which defines whether this (and only this) module is in train or search mode. I like this more than a global flag: it makes checking within the module easy, to apply different versions of the network. That being said, some questions come to mind right now: …
On PyTorch: there is a per-module train flag (module.training), which the user sets explicitly via model.train() or model.eval(). So, that's PyTorch. RETURNN has more flags, currently: the train flag, the eval flag, and the search flag. The question is, what do we do here in returnn-common? Also, I think there is some discussion on the PyTorch side whether this train/eval flag handling is a good design.
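For reference, a minimal sketch of the PyTorch mechanism referred to above (standard torch.nn API; the toy model is just an example):

```python
import torch
from torch import nn

# Each module carries its own `training` flag; the user sets it explicitly.
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))

model.train()                      # sets .training = True on the module and all submodules
assert model[1].training is True

model.eval()                       # sets .training = False recursively
assert model[1].training is False

# Some modules behave differently depending on the flag, e.g. Dropout:
x = torch.ones(2, 8)
y_train = model.train()(x)         # dropout active (random masking)
y_eval = model.eval()(x)           # dropout is a no-op
```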
So you would actually argue that it would either be really complicated in usage (so not straightforward, as we want returnn_common to be) or too simple in logic (not catching all possibilities) to use such a flag, and in both cases this will most likely make users default to writing some simpler logic for their use case? I can actually see that. But maybe it makes sense to give a simple, straightforward logic for the base cases, and then leave more complex cases to the user?
I'm unsure. We could adopt the PyTorch-style handling for the train flag, and then not introduce other flags in the base modules. Remember, our goal is that it is straightforward for the user, with priority on understanding and reading existing code, but writing new code should be straightforward as well. It's hard to say which way would be the most straightforward. We maybe should do a bit of research and look for common usage patterns in other frameworks. How do they deal with the difference between training, search, forwarding, maybe other cases? E.g. look at Fairseq, ESPNet, others.
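As a generic illustration of the pattern common in PyTorch-based toolkits (a sketch only, not the actual Fairseq or ESPNet APIs): the training/eval forward pass carries no search logic at all, and search/decoding is a separate function the user calls explicitly, so no implicit search flag is needed:

```python
import torch
from torch import nn


class Seq2SeqSketch(nn.Module):
  """Illustrative only; names and signatures are made up for this sketch."""

  def __init__(self, vocab_size: int = 100, dim: int = 32):
    super().__init__()
    self.embed = nn.Embedding(vocab_size, dim)
    self.proj = nn.Linear(dim, vocab_size)

  def forward(self, src: torch.Tensor, prev_target: torch.Tensor) -> torch.Tensor:
    # Training / scoring path: teacher forcing, no search involved.
    return self.proj(self.embed(prev_target))

  @torch.no_grad()
  def generate(self, src: torch.Tensor, max_len: int = 10) -> torch.Tensor:
    # Decoding path: search only happens because the user explicitly calls generate().
    self.eval()
    ys = torch.zeros(src.shape[0], 1, dtype=torch.long)  # start symbol 0; greedy decoding as stand-in for search
    for _ in range(max_len):
      logits = self.forward(src, ys[:, -1:])
      ys = torch.cat([ys, logits.argmax(dim=-1)], dim=1)
    return ys
```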
Note that the RETURNN flags still exist on the RETURNN side. If we make everything completely independent from the RETURNN flags, all of that behavior has to be defined here in returnn-common. If we want to keep the RETURNN flags, we have to define how they interact with what is written here.
Note, we have now …
In case we need it at some later point, I guess we can add it then. Functions like dropout already behave depending on the train flag. So, the current solution for the train flag is probably fine, and can be extended when needed.
I tend to keep it like that, i.e. only having the train flag implicit, and all other aspects explicit. There would not be any special handling for the eval flag. And the search flag would always be an explicit argument, e.g. to nn.choice.
Use the net search flag only as a default fallback when the search option is not explicitly provided for ChoiceLayer. DecideLayer and co will always apply. ChoiceLayer: new add_to_beam_scores option to control whether the score should be added to the beam when not doing search. Fix #946. Related: rwth-i6/returnn_common#18
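To make the ChoiceLayer part of that change concrete, here is a hedged RETURNN net-dict sketch; the layer names and surrounding structure are made up, only the explicit "search" option and the add_to_beam_scores option named in the commit message are the point:

```python
# Hedged sketch of a RETURNN network-dict fragment (illustrative layer names).
network = {
  "output": {
    "class": "rec", "from": "encoder", "target": "classes",
    "unit": {
      "output_prob": {"class": "softmax", "from": "prev:target_embed", "target": "classes"},
      "output": {
        "class": "choice", "from": "output_prob", "target": "classes",
        "beam_size": 12,
        "search": True,               # explicit, instead of relying on the net-level search flag
        "add_to_beam_scores": False,  # new option from the referenced change
        "initial_output": 0,
      },
      "target_embed": {"class": "linear", "activation": None, "n_out": 128, "from": "output"},
    },
  },
}
```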
Now with rwth-i6/returnn#947, I think it is easy to make the search flag always an explicit option for nn.choice.
The Transformer __call__ currently looks like this:

```python
@nn.scoped
def __call__(self, source: nn.Tensor, *,
             source_spatial_axis: nn.Dim,
             target: Optional[nn.Tensor] = None,
             initial_state: Optional[nn.LayerState] = None,
             search: bool,
             beam_size: Optional[int] = None,
             max_seq_len: Optional[Union[nn.Tensor, int]] = None,
             ) -> Tuple[nn.Tensor, nn.LayerState]:
  """
  Forward step of Transformer
  """
  memory = self.encoder(source, axis=source_spatial_axis)
  loop = nn.Loop(max_seq_len=max_seq_len)
  loop.state = initial_state if initial_state else self.default_initial_state()
  with loop:
    prev_target_embed = self.target_embedding(loop.state.target)
    output, loop.state.decoder = self.decoder(
      prev_target_embed, axis=nn.single_step_dim,
      memory=memory, memory_spatial_axis=source_spatial_axis, state=loop.state.decoder)
    logits = self.output_projection(output)
    target = loop.unstack(target) if target is not None else None
    if search:
      loop.state.target = nn.choice(logits, input_type="logits", target=target,
                                    search=True, beam_size=beam_size)
      loop.end(loop.state.target == self.target_eos_symbol, include_eos=False)
    else:
      assert target is not None
      loop.state.target = target
    outputs = loop.stack(loop.state.target)
  return outputs, loop.state
```

See specifically the logic regarding the search argument. I'm still not sure if this is the best way. It would look bad though to add all the search-related options explicitly to every module.
Maybe we could bundle the search-related options into a single object which is passed to the module call. Then the user could write this to enable search: … Or instead of …
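A sketch of what such a call could look like, assuming the nn.SearchFunc / nn.SearchFuncInterface API that appears later in this thread (the transformer, data and time_dim names are taken from the later example; targets is a hypothetical ground-truth tensor):

```python
# Sketch only, based on the nn.SearchFunc API shown further down in this thread.
transformer = nn.Transformer(...)

# Training / teacher forcing: pass the ground-truth targets, no search involved.
out_train, _ = transformer(data, source_spatial_axis=time_dim, target=targets)

# Search: pass a search object instead of the targets, which enables search explicitly.
out_search, _ = transformer(
  data, source_spatial_axis=time_dim,
  target=nn.SearchFunc(beam_size=12))
```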
I implemented an abstract nn.SearchFuncInterface. The Transformer __call__ signature now looks like:

```python
@nn.scoped
def __call__(self, source: nn.Tensor, *,
             source_spatial_axis: nn.Dim,
             target: Optional[Union[nn.Tensor, nn.SearchFuncInterface]] = None,
             initial_state: Optional[nn.LayerState] = None,
             ) -> Tuple[nn.Tensor, nn.LayerState]:
  ...
```

And an example call looks like:

```python
transformer = nn.Transformer(...)
out, _ = transformer(
  data, source_spatial_axis=time_dim,
  target=nn.SearchFunc(
    beam_size=3,
    max_seq_len=nn.reduce(nn.length(data, axis=time_dim), mode="max", axis=nn.batch_dim)))
```
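For orientation, a rough sketch of what such an abstract interface might look like; this is hypothetical and not the actual returnn-common code (only get_beam() is evidenced by a snippet further below, the other method names here are assumptions):

```python
# Hypothetical sketch; the real nn.SearchFuncInterface may look different.
class SearchFuncInterface:
  """Decides how the next output label is chosen inside the decoder loop."""

  def choice(self, *, probs, axis):
    """Choose the next label given (log-)probs over the given axis."""
    raise NotImplementedError

  def get_beam(self):
    """Return the search beam that the chosen labels live in, if any."""
    raise NotImplementedError


class SearchFunc(SearchFuncInterface):
  """Beam search with a fixed beam size and an optional max sequence length."""

  def __init__(self, beam_size, max_seq_len=None):
    # Implementations of choice()/get_beam() are omitted in this sketch.
    self.beam_size = beam_size
    self.max_seq_len = max_seq_len
```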
I think we have everything ready now. Whether the current solutions are good, we can only tell after a bit of usage. I think we can close this for now.
While coming back to the search (currently only …), …
I reopened this issue because I think we should improve and maybe redesign this.
In the current implementation, there is logic like this:

```python
if search:
  beam = search.get_beam()
  beam.name = f"{nn.NameCtx.current_ctx().get_abs_name()}/target"
  beam.dependency = beam.copy_as_prev_frame()
  for x in loop.state.deep_tensors():
    x.data.beam = beam.dependency
```

There are multiple problems: …
The current …
I now just removed the beam logic (…).
Note that this is not just about what is enough to be able to define all networks. Or I'm not sure how you mean it.
It's about being straightforward and clear. I.e., at no point should it be unclear to the user when search is used.
We do not have to follow exactly the behavior of RETURNN. There are also multiple ways in RETURNN. We can restrict it to one clean way. We can also change it, or introduce a simpler variant here.
I tend to make it explicit. But I'm not sure.
PyTorch also has a similar concept for the train flag (as we also have in RETURNN), i.e. some PyTorch modules behave differently depending on whether they are in train or eval mode (e.g. Dropout). We have exactly the same in RETURNN. And search is a flag like train. The difference is how these flags are set:
In RETURNN, this is all global, and for the search flag there are some additional (maybe unintuitive) ways to overwrite it. The flags are also implied automatically in RETURNN, depending e.g. on the task, and the user does not have much control over it. It is quite hidden.
In PyTorch, there are no implicit, automatically implied global flags. Every module has its own flag, and it is set explicitly (and easily recursively for all submodules). Every module has the train flag set initially, and you can disable it explicitly. So to the user, it is always clear how the flags are set, because the user sets them, and there is no automatic behavior. The user explicitly writes model.train() or model.eval(). Maybe again, here in returnn-common, we can follow the PyTorch style a bit for this, and also copy it for the search flag? Not sure...
Originally posted by @albertz in #16 (comment)
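If the PyTorch style were copied for the search flag, a hypothetical sketch of it could look like this (purely illustrative; neither PyTorch nor returnn-common provides a model.search() like this):

```python
from torch import nn


class SearchAwareModule(nn.Module):
  """Hypothetical: a per-module search flag, set recursively, mirroring train()/eval()."""

  def __init__(self):
    super().__init__()
    self.searching = False

  def search(self, mode: bool = True):
    # Mirrors nn.Module.train(): set the flag on self and all submodules.
    for m in self.modules():
      if isinstance(m, SearchAwareModule):
        m.searching = mode
    return self


class Decoder(SearchAwareModule):
  def forward(self, x):
    if self.searching:
      ...  # beam-search branch
    else:
      ...  # teacher-forced branch
```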