In the `forward()` method of the `MultiHeadAttention` class in `assignment3/cs231n/transformer_layers.py`, the argument list provided by the setup code reads:

`attn_mask: Array of shape (T, S) where mask[i,j] == 0`

It should instead read:

`attn_mask: Array of shape (S, T) where mask[i,j] == 0`

If `attn_mask` really had shape (T, S), it would need to be transposed before masking, because the product of the query and key matrices has shape `(batch_size, num_heads, S, T)`. The masking code would then have to be

`query_key_product.masked_fill(torch.transpose(attn_mask, 0, 1) == 0, -np.inf)`

which does not reproduce the value given by `expected_masked_self_attn_output`. The output only matches the provided value if `attn_mask` is applied without the transpose, which contradicts the documented shape.
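To make the shape argument concrete, here is a minimal, self-contained sketch (illustrative tensor sizes, not the assignment code) of the two mask orientations against a `(batch_size, num_heads, S, T)` score tensor:

```python
import torch

N, H, S, T, D = 2, 3, 4, 5, 8   # batch, heads, source len, target len, head dim

q = torch.randn(N, H, S, D)     # one query per source position
k = torch.randn(N, H, T, D)     # one key per target position

# The query-key product has shape (N, H, S, T).
query_key_product = torch.matmul(q, k.transpose(-2, -1)) / D ** 0.5
assert query_key_product.shape == (N, H, S, T)

# A mask of shape (S, T) -- the shape this issue proposes -- broadcasts
# directly against the scores:
mask_st = torch.randint(0, 2, (S, T))
masked = query_key_product.masked_fill(mask_st == 0, float("-inf"))

# A mask of shape (T, S) -- the shape the docstring states -- must be
# transposed first; applying it as-is fails whenever S != T:
mask_ts = torch.randint(0, 2, (T, S))
masked = query_key_product.masked_fill(mask_ts.t() == 0, float("-inf"))
```

Note that `expected_masked_self_attn_output` comes from a self-attention check, where S and T are presumably equal, so both orientations have the same shape there and only the expected values reveal which orientation the setup code actually assumes.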