I think the forward pass of the TransformerEncoder should use a padding mask for the attention.
The padding tokens need to be excluded when the attention weights are calculated. This is related to Chapter 12.2.1.
See cell 33 here. See also the PyTorch docs for reference.
It should be changed into something like this (the src_key_padding_mask needs to be True for the values that should be masked out):
```python
def forward(self, input):
    if self.padding_idx is not None:
        mask = input != self.padding_idx
        src_key_padding_mask = torch.logical_not(mask)
    else:
        mask = input == input
        src_key_padding_mask = None
    x = self.embd(input)      # (B, T, D)
    x = self.position(x)      # (B, T, D)
    # Because the result of our code is (B, T, D), but transformers
    # take input as (T, B, D), we will have to permute the order
    # of the dimensions before and after
    x = self.transformer(x.permute(1, 0, 2),
                         src_key_padding_mask=src_key_padding_mask)  # (T, B, D)
    x = x.permute(1, 0, 2)    # (B, T, D)
    # average over time
    context = x.sum(dim=1) / mask.sum(dim=1).unsqueeze(1)
    return self.pred(self.attn(x, context, mask=mask))
```
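For reference, here is a minimal standalone sketch (assuming a toy vocabulary and `padding_idx = 0`, both made up for illustration) showing how `src_key_padding_mask` is built from a padded batch and passed to `nn.TransformerEncoder`. Entries that are `True` mark positions excluded from attention:

```python
import torch
import torch.nn as nn

# Toy batch of token ids, padded with padding_idx = 0 (assumed value).
padding_idx = 0
tokens = torch.tensor([[5, 3, 9, 0, 0],
                       [7, 2, 4, 8, 1]])  # (B, T)

# True where a position holds a real token, False where it is padding.
mask = tokens != padding_idx                    # (B, T)
# nn.TransformerEncoder uses the opposite convention:
# True marks positions to mask out of the attention.
src_key_padding_mask = torch.logical_not(mask)  # (B, T)

embd = nn.Embedding(10, 16, padding_idx=padding_idx)
encoder_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = embd(tokens)                                # (B, T, D)
# The default nn.TransformerEncoder expects (T, B, D), so permute first.
out = encoder(x.permute(1, 0, 2),
              src_key_padding_mask=src_key_padding_mask)  # (T, B, D)
out = out.permute(1, 0, 2)                      # (B, T, D)
print(out.shape)  # torch.Size([2, 5, 16])
```

This mirrors the fix above: the Boolean `mask` is kept around for the time-average and the later attention call, while its negation is what the encoder wants as `src_key_padding_mask`.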