
Question about coverage mechanism implementation #157

Open

iamxpy opened this issue Apr 14, 2020 · 0 comments

iamxpy commented Apr 14, 2020

I am trying to figure out the implementation of the coverage mechanism, and after debugging for a while I still cannot understand why the procedure for producing the coverage vector in decode mode is NOT the same as in training/eval mode.

Related code is here: this line

> Note that this attention decoder passes each decoder input through a linear layer with the previous step's context vector to get a modified version of the input. If initial_state_attention is False, on the first decoder step the "previous context vector" is just a zero vector. If initial_state_attention is True, we use initial_state to (re)calculate the previous step's context vector. We set this to False for train/eval mode (because we call attention_decoder once for all decoder steps) and True for decode mode (because we call attention_decoder once for each decoder step).
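
For concreteness, here is a toy sketch of how I read that flag. Everything here is illustrative: toy_attention, first_step_setup, and the score function are stand-ins I made up, not the repository's actual code.

```python
import numpy as np

def toy_attention(state, H, coverage=None):
    """Stand-in for the real attention network (illustrative only).
    H: (T, d) encoder states; state: (d,) decoder state; coverage: (T,) or None.
    Returns (context, new_coverage, attn_dist)."""
    if coverage is None:
        coverage = np.zeros(H.shape[0])
    scores = H @ state + coverage             # coverage feeds into the scores
    attn_dist = np.exp(scores - scores.max())
    attn_dist /= attn_dist.sum()
    context = attn_dist @ H                   # weighted sum of encoder states
    # coverage is the running sum of past attention distributions
    return context, coverage + attn_dist, attn_dist

def first_step_setup(initial_state, H, initial_state_attention):
    if initial_state_attention:
        # Decode mode: re-run attention on initial_state to reconstruct the
        # previous step's context vector; note this call also produces a
        # coverage vector, which is the crux of my question below.
        prev_context, coverage, _ = toy_attention(initial_state, H)
    else:
        # Train/eval mode: there is no previous step, so the "previous
        # context vector" is just zeros and there is no coverage yet.
        prev_context, coverage = np.zeros(H.shape[1]), None
    return prev_context, coverage
```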

IMHO, the training and decode procedures would mismatch to some extent with such an implementation (please correct me if I am wrong).

For example:

Let H be all the encoder hidden states (a list of tensors). Then:

In training/eval mode, each decode step uses the attention network only once (see the sketch after this list):

Input: H, current_decoder_hidden_state, previous_coverage (None for the first decode step)

Output: next coverage, next context, and attention weights (i.e. attn_dist in the code)
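
Reusing toy_attention from the sketch above, the train/eval flow as I understand it (again just an illustration of the control flow, not the real code):

```python
# Train/eval mode: attention_decoder is called once for the whole sequence,
# so there is exactly one attention call per step and the coverage vector
# is threaded straight through the loop.
def train_eval_attention(decoder_states, H):
    coverage = None                      # None on the first decode step
    outputs = []
    for state in decoder_states:         # current_decoder_hidden_state
        context, coverage, attn_dist = toy_attention(state, H, coverage)
        outputs.append((context, attn_dist))
    return outputs, coverage
```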

In decode mode, each step applies the attention mechanism twice (sketched after the two sub-steps below):

(1) The first time:

Input: H, previous_decoder_hidden_state, previous_coverage (0s for the first decode step)

Output: modified previous context and next coverage (the attention weights from this call are discarded)

(2) The second time:

Input: H, current_decoder_hidden_state, next coverage

Output: next context and attention weights (next coverage is NOT updated by this call)
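
Again reusing toy_attention, here is the decode-mode flow following the two sub-steps above (illustrative only):

```python
# Decode mode: attention_decoder is called once per step with
# initial_state_attention=True, so every step applies attention twice.
def decode_attention_step(prev_state, cur_state, H, prev_coverage):
    # (1) First application: recompute the previous step's context vector
    #     and advance the coverage; the attention weights are discarded.
    _prev_context, next_coverage, _ = toy_attention(prev_state, H, prev_coverage)
    # (2) Second application: compute this step's context and attention
    #     weights; the coverage this call would produce is NOT kept.
    context, _discarded, attn_dist = toy_attention(cur_state, H, next_coverage)
    return context, attn_dist, next_coverage
```

If this reading is right, the coverage seen by the second application in decode mode always carries one extra attention distribution (the one produced by the recomputation), which never happens in train/eval mode; that is the mismatch I mean.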
