Self Attention Implementation issue #382

pverma3 · 2024-10-04T06:25:23Z

pverma3
Oct 4, 2024

Is there a bug in the implementation of SelfAttention_v1. (Ch3, section 3.4.2, Page 71)
in the line where keys, queries, values are computed by multiplying x and W_key etc.?

def forward(self, x):
        keys = x @ self.W_key
        queries = x @ self.W_query
        values = x @ self.W_value
        
        attn_scores = queries @ keys.T # omega
        attn_weights = torch.softmax(
            attn_scores / keys.shape[-1]**0.5, dim=-1
        )

        context_vec = attn_weights @ values
        return context_vec

I am getting the error
RuntimeError: mat1 and mat2 shapes cannot be multiplied (6x3 and 6x3)
on the line where it is "keys = x @ self.W_key"

I am able to get the code working by transposing the W_key matrix
i.e.
keys = x @ self.W_key.T

Am I doing something wrong or is there a typo in the original code and W_key should indeed be transposed?

PS: I am running the above code on Windows laptop (not sure if it matters).

Answered by rasbt

Oct 4, 2024

Thanks for sharing, I can see the issue now 😊

In your code,

d_in = inputs.shape[0]

should be

d_in = inputs.shape[1]

View full answer

rasbt · 2024-10-04T11:08:09Z

rasbt
Oct 4, 2024
Maintainer

Hi there,

thanks for the comment. Based on trying it once more on my computer, I think there might be a typo somewhere in your code. Could you share the full code you are trying to run for Chapter 3 (perhaps the easiest way would be via a Google Colab notebook) so I can take a look.

I suspect there is maybe something that went wrong in the __init__ part:

import torch.nn as nn

class SelfAttention_v1(nn.Module):

    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_query = nn.Parameter(torch.rand(d_in, d_out))
        self.W_key   = nn.Parameter(torch.rand(d_in, d_out))
        self.W_value = nn.Parameter(torch.rand(d_in, d_out))

Because if you had to do keys = x @ self.W_key.T then the following:

        queries = x @ self.W_query
        values = x @ self.W_value

also shouldn't have worked.

In any case, could you share your class SelfAttention_v1(nn.Module) code so I can take a look?

4 replies

pverma3 Oct 4, 2024
Author

Thanks for the response @rasbt! You are right, I need to make modifications to query and value matrices too in order to make it work

queries = x @ self.W_query  --> x @ self.W_query.T
values = x @ self.W_value     --> x @self.W_value.T

Here is the Google colab notebook link with the original implementation showing the error:
https://colab.research.google.com/drive/1tVcSFzwPRCPZYedUB-kd4fgoF-t9xK8o?usp=sharing

rasbt Oct 4, 2024
Maintainer

Thanks for sharing, I can see the issue now 😊

In your code,

d_in = inputs.shape[0]

should be

d_in = inputs.shape[1]

Answer selected by rasbt

pverma3 Oct 4, 2024
Author

thanks a lot @rasbt for your quick response, really appreciate you taking time to look into my silly issue!

Btw, I am loving the book! The explanations provided and code implementations has brought so much more clarity into understanding the inner workings of attention and transformer. Thank you once again for this excellent book :)

rasbt Oct 4, 2024
Maintainer

@pverma3 No worries, and I hope you enjoy the rest of the book (I think it becomes progressively more fun with each chapter -- at least I felt like this when writing it, haha)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Self Attention Implementation issue #382

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 4 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Self Attention Implementation issue #382

pverma3 Oct 4, 2024

Replies: 1 comment · 4 replies

rasbt Oct 4, 2024 Maintainer

pverma3 Oct 4, 2024 Author

rasbt Oct 4, 2024 Maintainer

pverma3 Oct 4, 2024 Author

rasbt Oct 4, 2024 Maintainer

pverma3
Oct 4, 2024

Replies: 1 comment 4 replies

rasbt
Oct 4, 2024
Maintainer

pverma3 Oct 4, 2024
Author

rasbt Oct 4, 2024
Maintainer

pverma3 Oct 4, 2024
Author

rasbt Oct 4, 2024
Maintainer