hybrid setup remove adapt_returnn_config_for_recog #95

Merged 1 commit into main on Sep 15, 2022

Conversation

@vieting (Contributor) commented Sep 8, 2022

adapt_returnn_config_for_recog() is a leftover from an older setup. From my point of view, we should remove it completely. If a RETURNN config needs to be adapted for graph compilation, this should be done by the user before passing it to HybridArgs via returnn_recognition_configs.
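
For illustration, such a user-side adaptation could look roughly like this; a minimal sketch, where everything except returnn_recognition_configs (named above) is an assumption:

    import copy

    train_returnn_config = ...  # an i6_core ReturnnConfig used for training (placeholder)
    recog_config = copy.deepcopy(train_returnn_config)
    # e.g. swap the training output layer for a log-space output before graph
    # compilation; layer names and settings are illustrative only
    recog_config.config["network"]["output"] = {
        "class": "linear",
        "activation": "log_softmax",
        "from": "encoder",
        "n_out": 9001,
    }
    hybrid_args = HybridArgs(
        returnn_training_configs={"blstm": train_returnn_config},  # assumed field name
        returnn_recognition_configs={"blstm": recog_config},
    )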

@christophmluscher (Contributor) left a comment

Peter and I discussed this and both agree that this should not be something which happens automatically in the background. This should be controlled by the user on a config level. Also possible in a personal decoder class...

@michelwi (Contributor) commented Sep 9, 2022

> Peter and I discussed this and both agree that this should not be something which happens automatically in the background.

Agreed.

> This should be controlled by the user on a config level.

Then we would have two inputs, train_config and recog_config?

> Also possible in a personal decoder class...

So the base setup would get an abstract method that throws NotImplementedError and the user has to implement it?

Anyway, I would make absolutely sure that the user has to do something and cannot assume this would somehow work automatically.

Another idea would be to have a Network object that has a get_config_train() and get_config_recog() method and then internally could give different dicts or call different returnn_common code or do whatever.
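
A minimal sketch of such a Network object (only the two method names come from the comment above; the rest is illustrative):

    from typing import Any, Dict

    class Network:
        """Sketch only: a user (or returnn_common helper) would subclass this
        and override both methods."""

        def get_config_train(self) -> Dict[str, Any]:
            """Return the net dict used for training."""
            raise NotImplementedError

        def get_config_recog(self) -> Dict[str, Any]:
            """Return the net dict adapted for recognition / graph compilation."""
            raise NotImplementedError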

@christophmluscher (Contributor) commented

> Then we would have two inputs, train_config and recog_config?

Yes, exactly. The setup already supports this. Might still be a bit untested...

> So the base setup would get an abstract method that throws NotImplementedError and the user has to implement it?
>
> Anyway, I would make absolutely sure that the user has to do something and cannot assume this would somehow work automatically.

So the idea is to have different decoder classes, which can be used stand-alone or within the system class. For example, BaseDecoder, HybridDecoder, FactoredHybridDecoder, CtcDecoder, ... and users could also write their own decoder class where they can do whatever they want.

> Another idea would be to have a Network object that has a get_config_train() and get_config_recog() method and then internally could give different dicts or call different returnn_common code or do whatever.

I like this. Did not think about that solution yet. @michelwi see my email.

@vieting merged commit b398f92 into main Sep 15, 2022
@vieting deleted the peter_hybrid branch September 15, 2022 14:45
@albertz (Member) commented Sep 16, 2022

> So the idea is to have different decoder classes, which can be used stand-alone or within the system class. For example, BaseDecoder, HybridDecoder, FactoredHybridDecoder, CtcDecoder, ... and users could also write their own decoder class where they can do whatever they want.

This is basically rwth-i6/returnn_common#49.

@albertz (Member) commented Sep 16, 2022

> Another idea would be to have a Network object that has a get_config_train() and get_config_recog() method and then internally could give different dicts or call different returnn_common code or do whatever.

On a high level, this is basically how most other frameworks or research projects have it, i.e. training and recog clearly separated. Of course, there is still some code sharing between both, but the entry points are separate, and what code is shared exactly is somewhat arbitrary, up to the user. (But the devil is in the details... How much code do you actually want to share? On what level? This is not obvious to answer.)

This is very much how my current returnn-common-based setup looks as well (work in progress). The linked code is the training function; the recognition function is below it, still WIP, but I will just wrap this function:

    def model_search(decoder, *, beam_size: int = 12) -> nn.Tensor:

@christophmluscher (Contributor) commented

> This is basically rwth-i6/returnn_common#49.

Really?? At first glance I would disagree...

The decoders I mention are all setups of Sisyphus jobs which would then call RASR or RETURNN for decoding?!

@christophmluscher (Contributor) commented

Or do you mean the structure of training and search?

@albertz (Member) commented Sep 16, 2022

> > > So the idea is to have different decoder classes, which can be used stand-alone or within the system class. For example, BaseDecoder, HybridDecoder, FactoredHybridDecoder, CtcDecoder, ... and users could also write their own decoder class where they can do whatever they want.
> >
> > This is basically rwth-i6/returnn_common#49.
>
> Really?? At first glance I would disagree...
>
> The decoders I mention are all setups of Sisyphus jobs which would then call RASR or RETURNN for decoding?!

Somewhere/somehow you need to define the interface for each case. This is what rwth-i6/returnn_common#49 is about: in your Sisyphus job, you would not just get a random raw config / net dict and make hard-coded assumptions about what to expect in the net dict (layer names or so), but you would have a well-defined interface.

rwth-i6/returnn_common#49 goes a bit further. When you have such a well-defined interface, I think this can also allow implementing ILM and related things in a nice way.

@christophmluscher (Contributor) commented

I am still not sure how to use your comments...

The decoder classes which this PR references are all about RASR decoding, so I am not exactly sure how this is influenced by returnn_common decoding...

Or am I understanding something wrong? Are the decoder interfaces in returnn_common also for RASR decoding??

@albertz (Member) commented Sep 16, 2022

> Are the decoder interfaces in returnn_common also for RASR decoding??

Yes, sure. returnn-common is for anything and everything you do with RETURNN, which includes RASR decoding using RETURNN models.

And you need to have such an interface, or not? I'm a bit confused that this is not clear.

@christophmluscher (Contributor) commented

> Yes, sure. returnn-common is for anything and everything you do with RETURNN, which includes RASR decoding using RETURNN models.

I disagree, or you are not explaining it clearly enough. For me these are two things: you use RETURNN to prepare everything so that RASR gets a ReturnnModel (aka TF model) and a ReturnnConfig (aka recognition graph).

  1. Prepare everything with RETURNN -> returnn_common or something else
  2. Do RASR decoding with the given inputs

I am not arguing that you need some interface...

IMHO you still have not explained:

> This is basically rwth-i6/returnn_common#49.

Why?
IMHO the decoder class here should be independent of how you construct your ReturnnModel or ReturnnConfig. It can use the returnn_common interface, but it does not have to??

And still:

> I am still not sure how to use your comments...

@albertz (Member) commented Sep 16, 2022

Just an arbitrary ReturnnModel/ReturnnConfig is not good as an interface for the RASR decoding. How do you know that this is compatible with your factored hybrid, or whatever else you might expect?

You need to have an interface on the level where you are building the net dict, which defines what outputs are expected, and then you build a model compatible with that interface. rwth-i6/returnn_common#49 is exactly about this kind of interface.

@christophmluscher (Contributor) commented

> Just an arbitrary ReturnnModel/ReturnnConfig is not good as an interface for the RASR decoding. How do you know that this is compatible with your factored hybrid, or whatever else you might expect?

ATM this relies on the user knowing how to set up training and decoding in correspondence, so that RASR decoding matches the training. I think we should keep supporting this in the future, because AFAIK this is how everybody does RASR decoding. Of course you can always make this better and improve!

> arbitrary ReturnnModel/ReturnnConfig

I think user experience and knowledge is what makes this non-arbitrary... but I agree you have design freedom to make errors where this might not be necessary.

> You need to have an interface on the level where you are building the net dict, which defines what outputs are expected, and then you build a model compatible with that interface. rwth-i6/returnn_common#49 is exactly about this kind of interface.

OK. But this should all happen outside of the System and the Decoder classes here.

Is your goal to change how RASR gets the TF model and graph? So, move away from ReturnnModel/ReturnnConfig!?

@albertz (Member) commented Sep 16, 2022

At the moment, the hybrid RASR decoding (as part of System, Decoder, etc. here) has hardcoded assumptions, in that it expects some output layer with a specific name, and that layer is supposed to return log probs or so. So when I say we need an interface, I mean we should go away from such implicit hardcoded assumptions, towards a well defined interface: an interface in the technical form of a base class, which has attributes or methods that return whatever the interface needs to return. And this is exactly what rwth-i6/returnn_common#49 is about.

> OK. But this should all happen outside of the System and the Decoder classes here.

Yes, the interface for defining the model. Each of the cases (hybrid, factored hybrid, transducer, whatever) would get a different interface. But then, the System/Decoder/etc. classes here would be tightly coupled to this interface.

> Is your goal to change how RASR gets the TF model and graph? So, move away from ReturnnModel/ReturnnConfig!?

No, just extend it with such an interface, which is needed here in System/Decoder/etc., as I explained. Also, there are some open technical questions, e.g. where and how to apply the interface, how to use it exactly, or how to define the interface exactly. Would it return custom layer names? Would you pass those custom layer names on to RASR? Or map those custom layer names to a predefined (again hardcoded) layer name? Etc.

@christophmluscher (Contributor) commented

So to verify: basically, on a very simple level, the returnn_common interface offers a template for various models on how to structure the input and output, and based on this interface you could then automatically get the correct output for decoding. Yes?

To be honest, I am not sure how I should have understood this only from your comment:

> This is basically rwth-i6/returnn_common#49.

I mean, just adding a few sentences for explanation would not have hurt... ^^

Now I have an idea what you mean with the comment. Was this only for information purposes? Or do you plan to implement something? Because the returnn_common interface is still very much work in progress... maybe it would be better to have something halfway stable before actually adapting pipelines...

> where and how to apply the interface

I think this depends on how extensive the changes should be... if something in RASR is already hard-coded, then changing that is a bit more effort... if it is only hard-coded in a Sisyphus job, this is easier to change... and so on.

@albertz (Member) commented Sep 18, 2022

> So to verify: basically, on a very simple level, the returnn_common interface offers a template for various models on how to structure the input and output, and based on this interface you could then automatically get the correct output for decoding. Yes?

Not yet. The issue is about that. I have also often talked in the past about having that. But so far we don't have it. The issue just says that we need this. I actually already discussed this with you, and also with @mmz33, @JackTemaki and others.

> To be honest, I am not sure how I should have understood this only from your comment:
>
> > This is basically rwth-i6/returnn_common#49.
>
> I mean, just adding a few sentences for explanation would not have hurt... ^^

Well, this is all described in the issue and its discussion, or not? Also, we already talked about it in the past.

> Now I have an idea what you mean with the comment. Was this only for information purposes? Or do you plan to implement something?

Yes, sure, I made this issue with the intention to also implement it, but I made the issue such that we would all discuss it together and agree on a good interface. I don't want to just do it alone without you being involved. That's why I also discussed it with you already some time before. That's also why I referenced it here again, because it seems to me that you forgot? And in any case, this is very relevant for exactly this PR.

> Because the returnn_common interface is still very much work in progress... maybe it would be better to have something halfway stable before actually adapting pipelines...

I'm not exactly sure what you mean. How is this relevant? In any case, we need such an interface. It's totally irrelevant what state other parts of returnn_common are in. Maybe you are referring to returnn_common.nn now?

returnn_common.nn is also certainly in a better state than this interface, as this interface so far is really non-existent (apart from some proofs of concept, which are all somewhat incomplete, already outdated, or turned out not to work well).

> > where and how to apply the interface
>
> I think this depends on how extensive the changes should be... if something in RASR is already hard-coded, then changing that is a bit more effort... if it is only hard-coded in a Sisyphus job, this is easier to change... and so on.

Yeah, this is all totally up for discussion. But in any case, we need rwth-i6/returnn_common#49 first, to define the interface.

Or we say we ignore that, and just stay with hardcoded assumptions on layer names and whatever in the net dict. But I thought that you want to get away from that? Once you want to get away from that, you need such an interface.

Btw, when you look at rwth-i6/returnn_common#49: LayerRef is a ref to a layer. Maybe that makes it clearer? (In more recent versions of returnn_common.nn, LayerRef was renamed to Tensor, but that's not really relevant for the discussion.)

@JackTemaki (Contributor) commented

> But in any case, we need rwth-i6/returnn_common#49 first, to define the interface.

@albertz I think you are still confusing the handling of RETURNN and RASR. This PR was to delete code that influences the RETURNN side, which does not belong here, as you correctly say. But again, we are talking only about RASR helpers here (code under setups/rasr). This has nothing to do directly with any network assumptions; it is about, as Chris already said, managing RASR configs and the respective Sisyphus jobs. This code has to make some assumptions about the passed network, as RASR supports only specific network formats. But the naming can of course be freely chosen; whether the name is passed manually hardcoded or given by a returnn_common interface is irrelevant here. The setups/rasr code should work no matter whether you use a fixed graph file, create it via dictionaries, via the returnn_common interface, or anything else.

@albertz (Member) commented Sep 19, 2022

I'm not confusing it. I know what this PR is about. I'm speaking specifically about the RASR helpers. Those RASR helpers do have assumptions on the net dict: for hybrid models, that there is an output layer and that it contains log softmax outputs (or whatever); for factored hybrid models maybe something else; for transducer models again something else. I'm just saying that we probably want to have a well defined interface for this, instead of relying on some hardcoded implicit assumptions.

I think you are confusing it. returnn_common is not just returnn_common.nn. It is for anything you do with RETURNN. It is the place to define such an interface. And rwth-i6/returnn_common#49 is exactly about such an interface. rwth-i6/returnn_common#49 is actually nothing too specific at this point; it is just the place where I opened the discussion for such an interface. No decisions on anything specific about the interface have been made yet.

If you say you do not want to have such an interface, that it should also work for any random graph file, then you are saying you want to keep sticking to the implicit hardcoded assumptions. Maybe that's also OK, if you define and document them well. I thought we already agreed that we do not want that. But if you now say you want that, you don't want a well defined interface, then OK. Or maybe we can also have both, i.e. implicit hardcoded assumptions such that you can pass any random graph file, and also a well defined interface. In any case, this is something we need to decide, and it is related to this PR (this is why I mentioned it here), and related to how we implement the helpers in setups/rasr.

@JackTemaki (Contributor) commented Sep 19, 2022

> Or maybe we can also have both, i.e. implicit hardcoded assumptions such that you can pass any random graph file, and also a well defined interface.

This.

> for hybrid models, that there is an output layer and that it contains log softmax outputs (or whatever)

Well, this is a strict requirement of the nn-precomputed-hybrid decoder in RASR; you cannot avoid expecting exactly this. But the layer name, and whether it is log softmax or real softmax, can be chosen freely. See rwth-i6/i6_core#307.

> It is for anything you do with RETURNN. It is the place to define such an interface.

Yes, sure, this is not the discussion point. There should be an interface that defines what network structure is expected.

> it should also work for any random graph file, then you are saying you want to keep sticking to the implicit hardcoded assumptions

No, accepting graph files has nothing to do with hardcoded assumptions. I don't see why you think there is any restriction on what we can do. We can have decoder classes that define the RASR parameters, and we can have returnn_common interfaces that say what a compatible network should look like. Those two things can and should be strictly separated. The decoder class does not need to know anything about the network, and the interface can just pass to the (suitable) decoder what it defines, which in the case of hybrid decoding would be the layer name and the output type.

In short (a rough sketch follows the list):

  • a HybridDecoder in i6_experiments/setups/rasr should manage the Sisyphus jobs and RASR config to do hybrid recognition, independent of the network.
  • a HybridInterface in returnn_common should define how a compatible network has to look, and define the layer name as well as the output type.
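
For example (class layout, names, and method signatures are illustrative assumptions, not existing code):

    # Illustrative sketch only: neither class exists in this form.
    class HybridInterface:
        """Would live in returnn_common: declares what a compatible network exposes."""
        output_layer_name: str  # e.g. "output"
        output_type: str        # e.g. "log_softmax" or "softmax"

    class HybridDecoder:
        """Would live in i6_experiments/setups/rasr: manages the Sisyphus jobs and RASR configs."""
        def recognize(self, graph_path: str, checkpoint_path: str, interface: HybridInterface):
            # consume only what the interface declares; make no further
            # assumptions about the network structure
            ...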

@christophmluscher (Contributor) commented

FYI, in PR #81 the decoder classes can be seen. Still WIP, but I think the general idea and pipeline are understandable. IMHO this should not depend on returnn_common. I think this would otherwise go against how the setup should work:

  1. Set up everything correctly (with or without returnn_common)
  2. Pass everything to the decoder class
  3. Run the pipeline

@albertz (Member) commented Sep 19, 2022

> Well, this is a strict requirement of the nn-precomputed-hybrid decoder in RASR; you cannot avoid expecting exactly this.

Sure you can avoid this. You can make the interface part of RASR, e.g. by allowing the user to define the output tensor name, what kind of output it accepts, etc.

But you can also hardcode it on that level and still have a well defined interface on the Python side, which maps to the internal hardcoded assumptions of RASR.

> But the layer name, and whether it is log softmax or real softmax, can be chosen freely. See rwth-i6/i6_core#307.

Yes, so you proposed an interface for hybrid models in that PR.

With rwth-i6/returnn_common#49, the PR in rwth-i6/i6_core#307 could be simplified. It would not need output_type, feature_tensor_name, output_tensor_name; the interface (rwth-i6/returnn_common#49) would define exactly this. E.g. one example hybrid interface could look like:

class HybridModel:
  features: nn.Tensor  # the input feature tensor (layer)
  output_type: str     # e.g. "softmax" or "log_softmax"
  output: nn.Tensor    # the output scores tensor (layer)

(nn.Tensor can just be a layer (tensor) name.)

The PR in rwth-i6/i6_core#307 is now only for hybrid models, where the interface is anyway somewhat simple (only output_type, feature_tensor_name, output_tensor_name). The discussion in rwth-i6/returnn_common#49 is also about more complex cases, like transducer models or attention models.

You can also have multiple interfaces at multiple levels, e.g. rwth-i6/i6_core#307 and what is discussed in rwth-i6/returnn_common#49. The issue rwth-i6/returnn_common#49 was intended to start the discussion on having any kind of interface for all the relevant models, because there was no such discussion before. So your PR in rwth-i6/i6_core#307 is very much related, because in it you are now defining an interface for hybrid models.

> There should be an interface that defines what network structure is expected.

Yes, and this is exactly what the issue rwth-i6/returnn_common#49 is about.

> The decoder class does not need to know anything about the network

It does. It needs to know, for example, output_type, feature_tensor_name, and output_tensor_name in the case of hybrid models. For other models, the interface becomes more complex. And this is exactly what rwth-i6/returnn_common#49 is about.

> FYI, in PR #81 the decoder classes can be seen.

You also need to have such an interface there, and you do have one:

        forward_output_layer: str = "log_output",

And then you also have hardcoded assumptions in there, e.g.:

    tf_flow.config[tf_fwd].input_map.info_0.param_name = "input"
    tf_flow.config[tf_fwd].input_map.info_0.tensor_name = "extern_data/placeholders/data/data"
    tf_flow.config[tf_fwd].input_map.info_0.seq_length_tensor_name = "extern_data/placeholders/data/data_dim0_size"

I'm not really sure what you are arguing here. All I was saying is, we probably want to have such an interface, and rwth-i6/returnn_common#49 is exactly about such an interface, so it is related here.

So, you can have multiple interfaces at multiple levels. You can have hardcoded assumptions, again at multiple levels. Everything is possible. rwth-i6/returnn_common#49 is just an issue to discuss such an interface. If you don't like the issue there, you can also have the discussion elsewhere. But when discussing any kind of interface, I think rwth-i6/returnn_common#49 should not be ignored.

Also, remember, for hybrid models, this is all pretty trivial anyway. Maybe you don't think of having these 3 arguments (output_type, feature_tensor_name, output_tensor_name) as an interface. It becomes more complex for other model types.

@christophmluscher (Contributor) commented

The discussion we are now having is about how to design a RETURNN model interface. But this PR is only about removing code which we already decided we do not want in these decoder classes, for a very specific reason: it alters the output of a model in a way the user did not explicitly specify, and only if the model was defined in a very specific way. IMHO, this was a problem, which we addressed in this PR.

I think nobody wants to go back to this kind of hard-coded assumption. In contrast, this:

> > Well, this is a strict requirement of the nn-precomputed-hybrid decoder in RASR; you cannot avoid expecting exactly this.
>
> Sure you can avoid this. You can make the interface part of RASR, e.g. by allowing the user to define the output tensor name, what kind of output it accepts, etc.

Sure. In theory, of course. But practically nobody has committed to help on the RASR side. And at the moment Nick and I do not have the time to do this. So for us this is a strict requirement. For example, using RETURNN as a toolkit is also a strict requirement. Using returnn_common is not.

An initial specification list for the BaseDecoder would look something like this:

  1. Sisyphus pipeline to support the RASR workflow: recognition, lattice creation, scoring.
  2. Recognition and scoring should support different Sisyphus jobs.
  3. RASR at current implementation state (Hybrid, FactoredHybrid, Wei's Transducer).
  4. RASR interface provided via different feature scorers.
  5. Support of i6_core at current implementation state.
  6. Support of different LMs.
  7. No call to RETURNN or returnn_common, since the creation of the FeatureScorer and feature_flow objects needs to happen before.

To ease this we have HybridDecoder at the moment. FactoredHybrid and so on should follow at some point.

Initial specification for HybridDecoder:

  1. extend BaseDecoder
  2. prepare nn-precomputed-scorer
  3. prepare feature_flow, which means integrate tf flow node after the base feature extraction, here we could add a call to returnn_common
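
A skeletal sketch of how the two classes could relate (method names and bodies are assumptions, not the PR #81 code):

    class BaseDecoder:
        """Drives the RASR workflow: recognition, lattice creation, scoring."""

        def recognition(self):
            raise NotImplementedError  # filled in by the specialized decoders

        def scoring(self):
            ...  # e.g. create the scoring job for the recognition output

    class HybridDecoder(BaseDecoder):
        def recognition(self):
            # prepare the nn-precomputed feature scorer, extend the base feature
            # flow with the TF forward node, then create the RASR recognition jobs
            ...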

Here, we can make certain assumptions, since this is a more specialized case, and we can consider different trade-offs, which we can certainly discuss. IMHO, this should go here: #81

IMHO, as a first step, HybridDecoder needs to support the current workflow which people are using, aka raw ReturnnConfig and ReturnnModel. This means people have to create their models in line with these hard-coded assumptions in the decoder class, or manipulate their model in some way before passing it into the decoder class, and not rely on some hidden function to manipulate their model for them. This is what we mean when we say we want the decoder class here and RETURNN/returnn_common to be separate: the decoder class should not manipulate the model "to make it work"; the user should define the model correctly.

I think, as a second step, an interface definition from returnn_common could of course be passed to a decoder class here, in order to make the decoder more flexible and accept a greater variety of models without the user manipulating the model by hand.

For backward compatibility, the hard-coded assumptions should be kept, IMHO.

@albertz (Member) commented Sep 19, 2022

> But practically nobody has committed to help on the RASR side.

I don't exactly understand this argument. Nobody wants to work on RASR (just because?), so it means we cannot do the interface on that level? Also, this should be a quite trivial, minor change to RASR: introducing options for the input/output tensor names.

But anyway, you are introducing some interface here, e.g. #81, rwth-i6/i6_core#307. And rwth-i6/returnn_common#49 was intended as a starting point for a discussion on how exactly such an interface should look.

> IMHO, as a first step, HybridDecoder needs to support the current workflow which people are using, aka raw ReturnnConfig and ReturnnModel. This means people have to create their models in line with these hard-coded assumptions in the decoder class, or manipulate their model in some way before passing it into the decoder class.

Not necessarily. You can still define an interface where inputs/outputs are clearly defined. This also works with the raw ReturnnConfig and ReturnnModel.

For the BaseDecoder, I think we should go away from having to rely on the hardcoded assumptions, towards having a well defined interface.

> the user should define the model correctly.
>
> I think nobody wants to go back to this kind of hard-coded assumption.

This is contradictory, or not? Defining the model correctly means to rely on hard-coded assumptions, or not?

Anyway, it sounds like you want to support both ways. Option 1: relying on hard-coded assumptions (the user should define the model correctly). Option 2: using an interface (defining output_type, feature_tensor_name, output_tensor_name in a flexible way).

For option 1, you don't really need an interface on the Python level. Once you need an interface, rwth-i6/returnn_common#49 becomes relevant.

@christophmluscher (Contributor) commented Sep 19, 2022

> Nobody wants to work on RASR (just because?)

From Nick's and my side, time constraints; and for the rest, I am just judging from the speed of other PRs...

> so it means we cannot do the interface on that level?

We could, but it involves more code bases, more people, more PRs -> ergo more time -> Nick and I lack this.

> Also, this should be a quite trivial, minor change to RASR: introducing options for the input/output tensor names.

Are you volunteering?! xP

> You can still define an interface where inputs/outputs are clearly defined. This also works with the raw ReturnnConfig and ReturnnModel.

But then this would involve the returnn_common interface or some function args, or not?

> For the BaseDecoder, I think we should go away from having to rely on the hardcoded assumptions, towards having a well defined interface.

I agree. There should already be no hard-coded assumptions from a RETURNN or returnn_common point of view. This statement does not extend to the jobs used for decoding, for example.

> This is contradictory, or not?

But there are two types of hard-coded assumptions:

  1. you have to define your model in a specific way
  2. the class changes something about the model in the background

I think 2. is worse and should be directly controlled by the user. This we don't want to allow.

> support both ways

I would integrate 1. and 2. depending on the part... for example, hard-code TDP settings, but give more flexibility via function args, for example the output layer name. This interface could then be extended to accept an interface from returnn_common. But maybe it would be cleaner just to add a ReturnnCommonDecoder, so that it is clear that you need the returnn_common interface and that you are defining your model in this way...

@christophmluscher (Contributor) commented

> For the BaseDecoder, I think we should go away from having to rely on the hardcoded assumptions, towards having a well defined interface.

Or BaseDecoder could use the returnn_common interface, and for HybridDecoder you add an extra layer so that the returnn_common interface does not get exposed to the user?!

@albertz (Member) commented Sep 19, 2022

I think we all want to keep this as simple as possible, i.e. reduce the complexity (and the future effort to maintain it), such that we can get this to a workable state soon. I was just saying, we should not rule out changes on RASR. If we can do something on the RASR side which would make everything much simpler and cleaner, we should do it. The specific RASR thing I mentioned is basically a one-line change. But it doesn't mean we should do it. We should just keep that in mind when thinking about these things, and not rule out changes on some specific side. If you don't feel comfortable doing this change, I can do it, yes.

> > You can still define an interface where inputs/outputs are clearly defined. This also works with the raw ReturnnConfig and ReturnnModel.
>
> But then this would involve the returnn_common interface or some function args, or not?

Depends where you put the interface. If the interface is in returnn_common, then yes. If the interface is somewhere else, then you need that other thing. I think returnn_common makes the most sense for such an interface, as that is exactly the main purpose of returnn_common.

> But there are two types of hard-coded assumptions:
>
> 1. you have to define your model in a specific way
> 2. the class changes something about the model in the background
>
> I think 2. is worse and should be directly controlled by the user. This we don't want to allow.

I was only speaking about point 1 here. But they are also almost the same situation. If you have 1, then doing 2 is not really much worse anymore. Actually, doing 2 can be safe when you have 1: e.g. when you say the model must have an output layer, that it must be a softmax layer, and that no other layer must depend on it, then doing changes in some automatic way would be safe. The problem is already in the requirement that the model must be defined in a specific way.

Of course, this is not just black and white; there are many variants in between. If you have point 1, i.e. specific requirements on the model definition, you can try to at least minimize the number of requirements.

Also, in the end, having requirements on the model definition is just another way of defining an interface. The difference is basically just semantics and tool support. E.g. an interface on the Python level will make development in IDEs easier, warn you much earlier about problems, make debugging easier, be more flexible, etc.

> But maybe it would be cleaner just to add a ReturnnCommonDecoder, so that it is clear that you need the returnn_common interface and that you are defining your model in this way...

returnn_common is intended to provide helpers, interfaces, building blocks, etc. for any kind of model: hybrid, factored hybrid, CTC, transducer, attention, etc. I don't think you can cover that well in a single ReturnnCommonDecoder class. I think it makes more sense to have separate HybridDecoder, FactoredHybridDecoder, CtcDecoder, etc.
