hybrid setup remove adapt_returnn_config_for_recog #95
Conversation
Peter and I discussed this and both agree that this should not be something which happens automatically in the background. This should be controlled by the user at the config level. Also possible in a personal decoder class...
Agreed.
Then we would have two inputs.
So the base setup would get an abstract method that raises NotImplementedError and the user has to implement it? Anyway, I would make absolutely sure that the user has to do something and cannot assume this would somehow work automatically. Another idea would be to have a …
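The abstract-method idea above could look roughly like this. This is a hypothetical sketch: the class and method names (`BaseHybridSystem`, `get_recog_config`) are illustrative, not the actual names in the setup.

```python
# Sketch of the idea discussed above: the base setup declares an
# abstract hook that every user setup must override explicitly, so
# nothing happens automatically in the background.
from abc import ABC, abstractmethod


class BaseHybridSystem(ABC):  # name is illustrative, not the real class
    @abstractmethod
    def get_recog_config(self, train_config: dict) -> dict:
        """Return the RETURNN config to use for recognition.

        Deliberately not implemented here: the user must state
        explicitly how the training config maps to a recog config.
        """
        raise NotImplementedError


class MyHybridSystem(BaseHybridSystem):
    def get_recog_config(self, train_config: dict) -> dict:
        # User-controlled adaptation, e.g. dropping training-only options.
        recog = dict(train_config)
        recog.pop("learning_rate", None)
        return recog
```

Trying to instantiate the base class directly then fails immediately with a `TypeError`, which is exactly the "user cannot assume this works automatically" behavior.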
Yes. Exactly. The setup already supports this. Might still be a bit untested...
So the idea is to have different decoder classes, which can be used standalone or within the system class. For example,
I like this. Did not think about that solution yet. @michelwi see my email
This is basically rwth-i6/returnn_common#49.
On a high level, this is basically how most other frameworks or research projects have it, i.e. training and recog clearly separated. Of course, there is still some code sharing between both, but the entry points are separate, and what code is shared exactly is somewhat arbitrary, up to the user. (But the devil is in the details... How much code do you actually want to share? On what level? This is not obvious to answer.) This is very much how my current returnn-common-based setup looks as well. See here (work in progress): i6_experiments/users/zeyer/experiments/exp2022_07_21_transducer/pipeline_swb_2020.py, line 283 at commit 54b742d.
This is the training function. The recognition function is below, but still WIP, but I will just wrap this function:
Really?? At first glance I would disagree... The decoders I mention are all setups of Sisyphus jobs which would then call RASR or RETURNN for decoding?!
Or do you mean the structure of training and search?
Somewhere/somehow you need to define the interface for each case. This is what rwth-i6/returnn_common#49 is about. So that in your Sisyphus job, you would not just get a random raw config / net dict and make hard-coded assumptions about what to expect in the net dict (layer names or so), but you have a well-defined interface. rwth-i6/returnn_common#49 goes a bit further. When you have such a well-defined interface, I think this can also allow implementing ILM and related things in a nice way.
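To make the "well-defined interface instead of hard-coded net-dict assumptions" point concrete, here is a minimal hypothetical sketch; the class and field names are invented for illustration and are not part of returnn_common or i6_experiments.

```python
# Sketch: instead of a Sisyphus job digging into a raw net dict and
# assuming a layer called "output" exists, the job receives an explicit
# description of the model's recognition-relevant parts.
from dataclasses import dataclass


@dataclass
class RecogModelInterface:
    net_dict: dict     # the raw RETURNN network definition
    output_layer: str  # which layer produces the scores
    output_type: str   # e.g. "log-softmax" or "softmax"

    def validate(self) -> None:
        # Fail early with a clear error instead of a hard-coded lookup
        # failing somewhere deep inside a decoding job.
        if self.output_layer not in self.net_dict:
            raise ValueError(
                f"declared output layer {self.output_layer!r} "
                "is not present in the network"
            )
```

A job consuming this object never needs to guess layer names; the declaration either matches the network or fails loudly at graph-construction time.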
I am still not sure how to use your comments... The decoder classes which this PR is referencing are all about RASR decoding, so I am not exactly sure how this is influenced by … Or am I understanding something wrong? Are the decoder interfaces in …
Yes sure. returnn-common is for anything and everything you do with RETURNN, which includes RASR decoding using RETURNN models. And you need to have such an interface, or not? I'm a bit confused that this is not clear.
I disagree, or you are not explaining it clearly enough. For me this is two things... You use RETURNN to prepare everything so that RASR gets a
I am not arguing that you need some interface... IMHO you still have not explained:
Why? And still:
Just an arbitrary … You need to have an interface at the level where you are building the net dict, which defines what outputs are expected, and then you build a model compatible with that interface. rwth-i6/returnn_common#49 is exactly about such a kind of interface.
ATM this relies on the user knowing how to set up the training and decoding in correspondence, so that RASR decoding matches the training. I think we should keep supporting this in the future, because AFAIK this is how everybody does RASR decoding. Of course you can always make this better and improve!
I think user experience and knowledge is what makes this non-arbitrary, but I agree you have design freedom to make errors where this might not be necessary.
OK. But this should all happen outside of the … Is your goal to change how RASR gets the TF model and graph? So move away from …
At the moment, the hybrid RASR decoding (as part of
Yes, the interface of defining the model. Each of the cases (hybrid, factored hybrid, transducer, whatever) would get a different interface. But then,
No, just extend it by such an interface, which is needed here in …
So to verify: basically, on a very simple level, the … To be honest, I am not sure how I should have understood this only from your comment:
I mean, just adding a few sentences for explanation would not have hurt... ^^ Now I have an idea what you mean by the comment. Was this only for information purposes? Or do you plan to implement something? Because the …
I think this depends on how extensive the changes should be... If something in RASR is already hard-coded, then changing that is a bit more effort; if it is only hard-coded in a Sisyphus job, this is easier to change; and so on.
Not yet. The issue is about that. I have also often talked in the past about having that. But so far we don't have it. The issue just says that we need this. I actually already discussed with you and also with @mmz33, with @JackTemaki and others about this.
Well, this is all described in the issue, and its discussion, or not? Also, we already talked about it in the past.
Yes sure, I made this issue with the intention to also implement that, but I made the issue such that we would all discuss it together, and we agree on a good interface. I don't want to just do it alone without you being involved. That's why I also discussed it with you already some time before. That's also why I referenced it here again, because it seems to me that you forgot? And in any case, this is very relevant for exactly the PR here.
I'm not exactly sure what you mean. How is this relevant? In any case, we need such an interface. It's totally irrelevant in what state other parts of
Yea, this is all totally up for discussion. But in any case, we need rwth-i6/returnn_common#49 first, to define the interface. Or, we say we ignore that, and just stay with hardcoded assumptions on layer names and whatever in the net dict. But I thought that you want to get away from that? Once you want to get away from that, you need such an interface. Btw, when you look at rwth-i6/returnn_common#49:
@albertz I think you are still confusing the handling of RETURNN and RASR. This PR was to delete code that influences the RETURNN side, which does not belong here, as you correctly say. But again, we are talking only about RASR helpers here (code under …
I'm not confusing it. I know what this PR is about. I'm speaking specifically about RASR helpers. Those RASR helpers do have assumptions on the net dict, like for hybrid models that there is an … I think you are confusing it. If you say you do not want to have such an interface, that it should also work for any random graph file, then you say you want to keep sticking to the implicit hardcoded assumptions. Maybe that's also ok, if you define and document them well. I thought we already agreed that we do not want that. But if you now say you want that, you don't want a well-defined interface, then ok. Or maybe we can also have both, i.e. implicit hardcoded assumptions such that you can pass any random graph file, and also a well-defined interface. In any case, this is something we need to decide, and this is related to this PR here (this is why I mentioned it here), and related to how we implement the helpers in …
This.
Well, this is a strict requirement of the nn-precomputed-hybrid decoder in RASR; you cannot avoid expecting exactly this. But the layer name, and whether it is a log-softmax or real softmax, can be chosen freely. See rwth-i6/i6_core#307.
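The "layer name and softmax kind are free, only the shape of the output is fixed" point can be illustrated with a small helper that builds the final layer of a hybrid net dict. This is a sketch under assumptions: the predecessor layer name (`encoder`), the label count, and the exact RETURNN option spelling are examples and may differ per RETURNN version.

```python
# Illustrative net-dict fragment: a hybrid model whose final layer is a
# (log-)softmax with a user-chosen name, matching what RASR's
# nn-precomputed-hybrid feature scorer expects semantically.
def make_output_layer(name: str = "log_output", log_space: bool = True) -> dict:
    """Return a one-layer RETURNN net-dict fragment for the final output.

    The decoder side then only needs to be told `name` and whether the
    scores are already in log space.
    """
    if log_space:
        # Linear layer with log_softmax activation; exact option
        # spelling may differ per RETURNN version.
        layer = {"class": "linear", "activation": "log_softmax",
                 "from": "encoder", "n_out": 9001}
    else:
        layer = {"class": "softmax", "from": "encoder", "n_out": 9001}
    return {name: layer}
```

The interface discussed in this thread would then just carry `name` and the `log_space` flag over to the decoder configuration.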
Yes sure, this is not the discussion point. There should be an interface that will define what network structure is expected.
No, accepting graph files has nothing to do with hardcoded assumptions. I don't see why you think there is any restriction to what we can do. We can have decoder classes that define the RASR parameters, and we can have returnn_common interfaces that say what a compatible network should look like. Those two things can and should be strictly separated. The decoder class does not need to know anything about the network, and the interface can just pass to the (suitable) decoder what it defines, which in case of hybrid decoding would be the layer name and the output type. In short:
FYI in PR #81 the decoder classes can be seen. Still WIP. But I think the general idea and pipeline is understandable. IMHO this should not depend on
Sure you can avoid this. You can make the interface part of RASR. E.g. like allowing to define the output tensor name, what kind it accepts, etc. But you can also hardcode it on that level but still have a well defined interface on Python side, which maps to the internal hardcoded assumptions of RASR.
Yes, so you proposed an interface for hybrid models in that PR. With rwth-i6/returnn_common#49, the PR in rwth-i6/i6_core#307 could be simplified. It would not need …:

```python
class HybridModel:
    features: nn.Tensor
    output_type: str
    output: nn.Tensor
```

(…) The PR in rwth-i6/i6_core#307 is now only for hybrid models, where the interface is anyway somewhat simple (only …). You can also have multiple interfaces at multiple levels, e.g. rwth-i6/i6_core#307 and what is discussed in rwth-i6/returnn_common#49. The issue rwth-i6/returnn_common#49 was intended to start the discussion on having any kind of interface for all the relevant models, because there was no such discussion before. So your PR in rwth-i6/i6_core#307 is very related to that. Because in your PR, you are now defining an interface for hybrid models.
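To show how little a RASR decoding helper would then need to know, here is a self-contained sketch of consuming such a three-field interface. Assumptions: `nn.Tensor`-like objects expose a `.name` attribute (as in returnn_common); `FakeTensor` and `rasr_flow_params` are stand-ins invented for this sketch.

```python
# Sketch: a RASR helper derives everything it needs from the interface
# object instead of inspecting the raw net dict itself.
from dataclasses import dataclass


@dataclass
class FakeTensor:  # stand-in for nn.Tensor in this sketch
    name: str


@dataclass
class HybridModel:
    features: FakeTensor
    output_type: str  # "softmax" or "log-softmax"
    output: FakeTensor


def rasr_flow_params(model: HybridModel) -> dict:
    # Everything the tf-fwd flow node needs, derived from the interface
    # rather than from hard-coded tensor names.
    return {
        "input_tensor_name": model.features.name,
        "output_tensor_name": model.output.name,
        "scores_are_log": model.output_type == "log-softmax",
    }
```

The decoder class stays ignorant of the network structure; only the interface object knows both sides.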
Yes, and this is exactly what the issue rwth-i6/returnn_common#49 is about.
It does. It needs to know for example
You also need to have such an interface there. You have it also there:

```python
forward_output_layer: str = "log_output",
```

And then you also have hardcoded assumptions in there, e.g.:

```python
tf_flow.config[tf_fwd].input_map.info_0.param_name = "input"
tf_flow.config[
    tf_fwd
].input_map.info_0.tensor_name = "extern_data/placeholders/data/data"
tf_flow.config[
    tf_fwd
].input_map.info_0.seq_length_tensor_name = (
    "extern_data/placeholders/data/data_dim0_size"
)
```

I'm not really sure what you are arguing here. All I was saying is: we probably want to have such an interface, and rwth-i6/returnn_common#49 is exactly about such an interface, and so rwth-i6/returnn_common#49 is related here. So, you can have multiple interfaces at multiple levels. You can have hardcoded assumptions, again at multiple levels. Everything is possible. rwth-i6/returnn_common#49 is just an issue to discuss such an interface. If you don't like the issue there, you can also have another discussion elsewhere. But when discussing any kind of interface, I think rwth-i6/returnn_common#49 should not be ignored. Also, remember, for hybrid models, this is all pretty trivial anyway. Maybe you don't think of having these 3 arguments (…
The discussion we are now having is on how to design a RETURNN model interface. But this PR is only about removing code which we already decided we do not want in these decoder classes, for a very specific reason: it alters the output of a model in a way the user did not explicitly specify, and also only if the model was defined in a very specific way. IMHO this was a problem, which we addressed in this PR. I think nobody wants to go back to this kind of hard-coded assumptions. In contrast, this:
Sure. In theory, of course. But practically nobody has committed to helping on the RASR side. And at the moment Nick and I do not have the time to do this. So for us this is a strict requirement. For example, using RETURNN as a toolkit is also a strict requirement. Using … An initial specification list for the …
To ease this we have … Initial specification for …
Here, we can make certain assumptions, since this is a more specialized case and we can consider different trade-offs, which we can certainly discuss. IMHO, this should go here: #81
I don't exactly understand this argument. Nobody wants to work on RASR (just because?) so it means we cannot do the interface on that level? Also, this should be a quite trivial minor change to RASR, to introduce options for input/output tensor names. But anyway, you are introducing some interface here, e.g. #81, rwth-i6/i6_core#307. And rwth-i6/returnn_common#49 was intended as a starting point for a discussion on how exactly such interface should look like.
Not necessarily. You can still define an interface where inputs/outputs are clearly defined. This also works with the raw … For the …
This is contradictory, or not? Anyway, it sounds like you want to support both ways: option 1, relying on hard-coded assumptions (the user should define the model correctly); option 2, using an interface (defining output_type, feature_tensor_name, output_tensor_name in a flexible way). For option 1, you don't really need an interface on the Python level. Once you need an interface, rwth-i6/returnn_common#49 becomes relevant.
From Nick's and my side, time constraints; and for the rest, just judging from the speed of other PRs...
We could, but it involves more code bases, more people, more PRs -> ergo more time -> and Nick and I lack this.
Are you volunteering?! xP
But then this would involve
I agree. There should already be no hard-coded assumptions from a RETURNN or
But there are two types of hard-coded assumptions:
I think 2. is worse and should be directly controlled by the user. This we don't want to allow.
I would integrate 1. and 2. depending on the part, e.g. hard-code TDP settings, but give more flexibility via function args, for example the output layer name. This interface could then be extended to accept an interface from …
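The "hard-code some parts, expose others as args" split could look like the following sketch. All names and values are illustrative (`make_recog_crp_args` is not a real helper, and the TDP value is an example), chosen only to show type-1 assumptions living visibly in one place while user-facing choices stay flexible.

```python
# Sketch: TDP settings stay hard-coded inside the helper (a type-1
# assumption, visible in one place and easy to audit), while the output
# layer name is a plain function argument under user control.
def make_recog_crp_args(output_layer_name: str = "log_output") -> dict:
    tdp_scale = 1.0  # deliberately fixed: part of this helper's contract
    return {
        "tdp_scale": tdp_scale,
        "forward_output_layer": output_layer_name,
    }
```

A later extension could accept a model-interface object and read `output_layer_name` from it instead of a string argument, without changing the hard-coded part.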
Or …
I think we all want to keep this as simple as possible, i.e. reduce the complexity (and future effort to maintain it), such that we can get this to a workable state soon. I was just saying, we should not rule out changes on RASR. If we can do sth on RASR side, which would make everything much simpler and cleaner, we should do it. The specific RASR thing I mentioned is basically a one-line-change. But it doesn't mean we should do it. We should just keep that in mind when thinking about these things, and not rule out changes on some specific side. If you don't feel comfortable doing this change, I can do this, yes.
Depends where you put the interface. If the interface is in
I was only speaking about point 1 here. But they are also almost the same situation. If you have 1, then doing 2 is not really much worse anymore. Actually, doing 2 can be safe when you have 1. E.g. when you say the model must have an output layer, that it must be a softmax layer, and that no other layer must depend on it, then doing changes in some automatic way would be safe. The problem is already in the requirement that the model must be defined in a specific way. Of course, there is not just black and white, but many variants in between. If you have point 1, i.e. specific requirements on the model definition, you can try to at least minimize the number of requirements. Also, in the end, having requirements on the model definition is just another way of defining an interface. The difference is basically just semantics and tool support. E.g. an interface on the Python level will make development in IDEs easier, warn you much earlier about problems, make debugging easier, be more flexible, etc.
adapt_returnn_config_for_recog() is a leftover from an older setup. From my point of view, we should remove it completely. If a RETURNN config needs to be adapted for graph compilation, this should be done by the user before passing it to HybridArgs via returnn_recognition_configs.
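The user-side replacement for the removed automatic adaptation could be as simple as the sketch below. `HybridArgs` and `returnn_recognition_configs` are taken from the comment above; the helper name, the dropped keys, and the `forward_output_layer` key are hypothetical examples, not the removed function's actual behavior.

```python
# Sketch: the user adapts the RETURNN config explicitly, then hands the
# result to HybridArgs via returnn_recognition_configs. Nothing happens
# behind the user's back.
def adapt_for_recog(train_config: dict, output_layer: str = "output") -> dict:
    recog = dict(train_config)
    # Drop training-only options explicitly, under user control.
    for key in ("learning_rate", "chunking", "batch_size"):
        recog.pop(key, None)
    recog["forward_output_layer"] = output_layer  # hypothetical key
    return recog
```

Usage would then be something like `HybridArgs(returnn_recognition_configs={"baseline": adapt_for_recog(train_cfg)})`, making the training-to-recog mapping visible in the user's config file.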