-
Is there a bug in the implementation of SelfAttention_v1? (Ch3, section 3.4.2, Page 71)
I am getting an error when running the code as printed. I am able to get the code working by transposing the W_key matrix (see the snippet below). Am I doing something wrong, or is there a typo in the original code and W_key should indeed be transposed? PS: I am running the above code on a Windows laptop (not sure if it matters).
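To make this concrete, this is roughly the change that makes it run on my end (paraphrasing a fragment from my notebook; `x` and `self.W_key` refer to the chapter's inputs tensor and weight parameter):

```python
# Line as printed in the book, which raises a shape-mismatch RuntimeError for me:
keys = x @ self.W_key
# Line after the change that makes it run:
keys = x @ self.W_key.T
```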
-
Hi there, thanks for the comment. Based on trying it once more on my computer, I think there might be a typo somewhere in your code. Could you share the full code you are trying to run for Chapter 3 (perhaps the easiest way would be via a Google Colab notebook) so I can take a look? I suspect that something may have gone wrong in the following part:

```python
import torch
import torch.nn as nn

class SelfAttention_v1(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_query = nn.Parameter(torch.rand(d_in, d_out))
        self.W_key = nn.Parameter(torch.rand(d_in, d_out))
        self.W_value = nn.Parameter(torch.rand(d_in, d_out))
```

Because if you had to transpose `W_key`, the analogous multiplications with `W_query` and `W_value` also shouldn't have worked, since all three weight matrices have the same shape. In any case, could you share your code?
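For reference, with the `(d_in, d_out)` initialization above, the forward pass from the chapter should run without transposing any of the weight matrices. A self-contained sketch (reconstructed for this thread, so minor details such as variable names may differ from the printed listing):

```python
import torch
import torch.nn as nn

class SelfAttention_v1(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W_query = nn.Parameter(torch.rand(d_in, d_out))
        self.W_key = nn.Parameter(torch.rand(d_in, d_out))
        self.W_value = nn.Parameter(torch.rand(d_in, d_out))

    def forward(self, x):
        # x: (num_tokens, d_in); each weight: (d_in, d_out)
        keys = x @ self.W_key        # (num_tokens, d_out), no transpose needed
        queries = x @ self.W_query
        values = x @ self.W_value
        attn_scores = queries @ keys.T                  # (num_tokens, num_tokens)
        attn_weights = torch.softmax(
            attn_scores / keys.shape[-1] ** 0.5, dim=-1
        )
        return attn_weights @ values                    # (num_tokens, d_out)

torch.manual_seed(123)
x = torch.rand(6, 3)                     # 6 tokens, d_in = 3
sa_v1 = SelfAttention_v1(d_in=3, d_out=2)
print(sa_v1(x).shape)                    # torch.Size([6, 2])
```

Note that the only transpose in there is on `keys` when computing the attention scores, not on `W_key` itself.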
Thanks for sharing, I can see the issue now 😊
In your code, […] should be […]
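In general, the kind of mix-up that forces the extra transpose looks like the following. (This is a made-up illustration, not the code from this thread, since the exact before/after lines aren't preserved in this copy of the discussion.)

```python
import torch

torch.manual_seed(123)
d_in, d_out = 3, 2
x = torch.rand(6, d_in)                  # 6 tokens, each with d_in features

# Hypothetical mistake: dimensions swapped when creating the weight.
W_key_swapped = torch.rand(d_out, d_in)
# x @ W_key_swapped                      # would raise a shape-mismatch RuntimeError
keys = x @ W_key_swapped.T               # "works", but only because of the extra .T

# Intended (d_in, d_out) initialization from the chapter; no transpose needed:
W_key = torch.rand(d_in, d_out)
keys = x @ W_key                         # shape (6, 2)
```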