
2021 assignment 3 Q2: the specified dimensions for the attn_mask argument are wrong  #279

Open
@manuka2

Description

In the forward() method of the MultiHeadAttention class in assignment3/cs231n/transformer_layers.py, the docstring provided by the setup code says:

attn_mask: Array of shape (T, S) where mask[i,j] == 0

but it should say:

attn_mask: Array of shape (S, T) where mask[i,j] == 0

If attn_mask really had shape (T, S), it would have to be transposed before masking, because the product of the query and key matrices has shape (batch_size, num_heads, S, T). The masking code would then need to be

query_key_product.masked_fill(torch.transpose(attn_mask, 0, 1) == 0, -np.inf)

which does not reproduce the value given in expected_masked_self_attn_output. The output only matches the provided reference if the mask is applied without transposing it, which contradicts the documented (T, S) shape.
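For reference, here is a minimal PyTorch sketch of why the score tensor forces an (S, T) mask; the sizes N, H, D, S, T below are made up for illustration, with S != T so the mask orientation is unambiguous:

```python
import torch

# Illustrative sizes only: batch N, heads H, per-head dim D,
# query (target) length S, key (source) length T, chosen so S != T.
N, H, D = 2, 4, 8
S, T = 3, 5

query = torch.randn(N, H, S, D)
key = torch.randn(N, H, T, D)

# One attention score per (query position, key position):
# the last two dims of the product are (S, T), not (T, S).
query_key_product = query @ key.transpose(-2, -1) / D ** 0.5
print(query_key_product.shape)  # torch.Size([2, 4, 3, 5])

# A mask of shape (S, T) broadcasts against the scores as-is;
# a (T, S) mask would first need the transpose described above.
attn_mask = torch.tril(torch.ones(S, T))
masked = query_key_product.masked_fill(attn_mask == 0, float('-inf'))
```

Since the expected output only matches when the mask is used without a transpose, the mask the notebook actually passes must already be (S, T), and the docstring is what needs the fix.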
