In the `forward()` of the `MultiHeadAttention` class in `assignment3/cs231n/transformer_layers.py`, the argument list provided by the setup code says:

`attn_mask: Array of shape (T, S) where mask[i,j] == 0`

but it should be:

`attn_mask: Array of shape (S, T) where mask[i,j] == 0`

If `attn_mask` has shape `(T, S)`, it needs to be transposed, because the product of the query and key matrices has shape `(batch_size, num_heads, S, T)`. The masking code would then be

`query_key_product.masked_fill(torch.transpose(attn_mask, 0, 1) == 0, -np.inf)`

which does not produce the value given by `expected_masked_self_attn_output`. The output only matches the provided output if `attn_mask` is not transposed, which is wrong.
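To make the shape mismatch concrete, here is a minimal sketch of the masking step. The dimension names (`N` = batch, `H` = heads, `S` = source length, `T` = target length, `D` = per-head dimension) and the variable names are assumptions for illustration only, not the assignment's actual test values.

```python
import torch

# Illustrative sizes; chosen so S != T makes the shape mismatch visible.
N, H, S, T, D = 2, 3, 4, 5, 8

query = torch.randn(N, H, S, D)  # projected queries, split into heads
key = torch.randn(N, H, T, D)    # projected keys, split into heads

# The query-key product (scaled dot-product scores) has shape (N, H, S, T).
query_key_product = query @ key.transpose(-2, -1) / (D ** 0.5)

# If attn_mask is documented as shape (T, S), it must be transposed to
# (S, T) before it can broadcast against the (N, H, S, T) scores.
attn_mask_TS = torch.randint(0, 2, (T, S))
masked = query_key_product.masked_fill(
    attn_mask_TS.transpose(0, 1) == 0, float("-inf")
)

# If attn_mask is already shape (S, T), no transpose is needed.
attn_mask_ST = torch.randint(0, 2, (S, T))
masked_no_transpose = query_key_product.masked_fill(
    attn_mask_ST == 0, float("-inf")
)
```

Under these assumed shapes, only a mask that is (or has been transposed to) `(S, T)` broadcasts against the scores, which is why the docstring and the provided expected output disagree when the mask is described as `(T, S)`.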