Conformer frontend should fix dimensions, be more standard #219
Or maybe something in between. I think our pooling is fine? Note that we should change the ... Maybe we need some experiments first?
I noticed that in ESPnet, `self.xscale = math.sqrt(self.d_model)` is set, and later the input is scaled with `x = x * self.xscale`.
This is probably because such a scale is also applied to the word embeddings in the original Transformer. I tried to find out the motivation for this and found this CV question. One answer states that it is to get the embeddings into a similar range as the positional encoding, or actually larger than the positional encoding, which is important when you add them together. However, in our case we do not actually add them, so I wonder whether the scale is really necessary or helpful, or maybe even hurtful. I guess we need to do an experiment.
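For context, here is a minimal self-contained sketch of the scaling in question, written as a plain PyTorch module (the class name `ScaledInput` is made up here; in ESPnet the scaling sits inside the positional-encoding module):

```python
import math
import torch


class ScaledInput(torch.nn.Module):
    """Minimal sketch of the input scaling discussed above.

    The class name is illustrative; in ESPnet the same logic lives inside
    the positional-encoding module.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.xscale = math.sqrt(self.d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model); scale by sqrt(d_model), analogous to the
        # embedding scaling in the original Transformer.
        return x * self.xscale
```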
As we discussed, it should be configurable to support both the RETURNN standard cases and also the ESPnet case (at least mostly, maybe except for the xscale). For now, we would not put defaults, as it's not clear yet which are the best.
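To make that concrete, here is a purely hypothetical sketch of what such a set of options could look like. The name `ConformerFrontendOpts` and all fields are made up for illustration and are not the actual API; no defaults are set, matching the point above:

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Purely illustrative option container, not the actual API of the repo.
# The idea: expose everything needed to reproduce both the RETURNN-standard
# frontend (conv + pooling) and the ESPnet-style frontend (strided convs),
# without committing to defaults yet.


@dataclass
class ConformerFrontendOpts:
    out_dims: Sequence[int]                           # output channels per conv layer
    filter_sizes: Sequence[Tuple[int, int]]           # kernel size per conv layer
    strides: Sequence[Tuple[int, int]]                # e.g. (2, 2) per layer for the ESPnet case
    pool_sizes: Optional[Sequence[Tuple[int, int]]]   # pooling per layer for the RETURNN-standard case
    activation: str                                   # e.g. "relu" or "swish"
```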
So what else is missing after #219? Striding is one thing. What else?
OK, I added striding. But I'm thinking about refactoring it a bit more. Specifically, current problems:
I changed ...
I renamed the things as discussed, and changed the option to a single ...
In ESPnet, the default ...
The defaults we use for `ConformerConvSubsample` are wrong. Maybe also the structure of the layer is wrong.
We should follow more standard code, e.g.:
https://github.com/espnet/espnet/blob/4138010fb66ad27a43e8bee48a4932829a0847ae/espnet/nets/pytorch_backend/transformer/subsampling.py#L162
https://github.com/espnet/espnet/blob/4138010fb66ad27a43e8bee48a4932829a0847ae/espnet2/asr/encoder/conformer_encoder.py#L164
Also see relative positional encoding, #132.
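For reference, a rough, simplified paraphrase of ESPnet's `Conv2dSubsampling` from the first link above (written from memory, so check the linked code for exact details; masking and the positional encoding applied on the output are omitted):

```python
import torch


class Conv2dSubsampling(torch.nn.Module):
    """Simplified paraphrase of the ESPnet Conv2dSubsampling structure:
    two 3x3 convs with stride 2 over (time, feature), i.e. time subsampling
    by a total factor of 4, followed by a linear projection to the model dim.
    """

    def __init__(self, idim: int, odim: int):
        super().__init__()
        self.conv = torch.nn.Sequential(
            torch.nn.Conv2d(1, odim, kernel_size=3, stride=2),
            torch.nn.ReLU(),
            torch.nn.Conv2d(odim, odim, kernel_size=3, stride=2),
            torch.nn.ReLU(),
        )
        # Each stride-2 conv without padding maps a dim of size n to (n - 1) // 2.
        self.out = torch.nn.Linear(odim * (((idim - 1) // 2 - 1) // 2), odim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, idim)
        x = x.unsqueeze(1)  # (batch, 1, time, idim), conv over (time, feature)
        x = self.conv(x)
        b, c, t, f = x.size()
        return self.out(x.transpose(1, 2).contiguous().view(b, t, c * f))
```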