About multi-head attention in Attention Is All You Need, thanks. #19
Comments
As far as I understand, your doubt is about why Q, K, V are not going through … You can see a similar implementation in the PyTorch source code as well. [2] Anyone else reading this, please correct me if I am wrong or if there are other benefits/reasons for using this implementation.
EDIT: For the most faithful implementations of research papers, you should also check out the labml.ai annotated PyTorch implementations repository. [3]
[1] https://d2l.ai/chapter_attention-mechanisms-and-transformers/multihead-attention.html
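To make the pattern described in this comment concrete, here is a minimal, hypothetical sketch of multi-head attention in PyTorch. The names (`MultiHeadAttention`, `d_model`, `num_heads`) and sizes are my own illustration, not code from this repository: each of the query, key, and value inputs goes through a single linear layer, and for self-attention the caller simply passes the same embedding matrix X for all three.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Sketch: one linear projection per input (Q, K, V), then split into heads."""
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # These three layers play the role of W_q, W_k, W_v for ALL heads at once;
        # slicing their output into num_heads chunks gives the per-head projections.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        # query/key/value: (batch, seq_len, d_model)
        batch, seq_len, _ = query.shape

        def split_heads(x):
            return x.view(batch, -1, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.w_q(query))   # (batch, heads, seq_len, d_head)
        k = split_heads(self.w_k(key))
        v = split_heads(self.w_v(value))

        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        out = attn @ v                      # (batch, heads, seq_len, d_head)
        out = out.transpose(1, 2).reshape(batch, seq_len, -1)
        return self.w_o(out)

# For self-attention, the "Q, K, V" arrows in the paper's figure are all the same X:
x = torch.randn(2, 10, 512)                 # (batch, seq_len, d_model)
mha = MultiHeadAttention(d_model=512, num_heads=8)
y = mha(x, x, x)                            # query = key = value = embedding matrix X
print(y.shape)                              # torch.Size([2, 10, 512])
```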
Thank you for your comment, but it doesn't address my question. For instance, consider a sequence for which we produce an embedding matrix, call it X. X is then sent to every head and multiplied by W_q, W_k, and W_v respectively, so each head generates its own Q, K, and V. However, at the point before the linear layers inside each head, the paper's multi-head attention illustration labels the inputs Q, K, and V rather than X, X, and X.
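For reference, the per-head projections this comment describes can be restated in the paper's own notation. As I read the paper, for (encoder) self-attention the three inputs in the figure coincide, Q = K = V = X, so each head applies a single linear transformation to X rather than two:

```latex
% Per-head projections; for self-attention the figure's Q, K, V inputs are all X.
\begin{aligned}
Q_i &= X W_i^{Q}, \qquad K_i = X W_i^{K}, \qquad V_i = X W_i^{V} \\
\mathrm{head}_i &= \operatorname{softmax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d_k}}\right) V_i \\
\operatorname{MultiHead}(X, X, X) &= \operatorname{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}
\end{aligned}
```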
Hello, author. I sincerely hope you can answer me when you see this.
I urgently want to understand why Q, K, and V appear as the inputs to multi-head attention and are then fed into the three linear layers of each head respectively. Do the three linear layers represent the w_q, w_k, and w_v of each head? If so, the embedding matrix would first need to be converted to Q, K, and V, and then converted to Q_i, K_i, and V_i by the w_q, w_k, and w_v of a given head, so the embedding matrix would go through two transformations.
I have seen several implementations, including yours, and they all feed the embedding matrix directly into the three linear layers of each head.
How is this achieved? Thanks for your help.
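Regarding how this is achieved in practice: below is a small, hypothetical usage sketch with PyTorch's built-in `nn.MultiheadAttention` (shapes and sizes chosen arbitrarily for illustration). The Q, K, V arguments are simply whatever tensors are passed in; for self-attention they are all the same embedding matrix, so there is no extra transformation before the per-head linear projections.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration.
batch, seq_len, d_model, num_heads = 2, 10, 512, 8
x = torch.randn(batch, seq_len, d_model)   # the embedding matrix X

mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)

# Self-attention: the figure's Q, K, V inputs are all X, so X is projected exactly
# once per head (by the module's internal q/k/v projection weights).
out, attn_weights = mha(query=x, key=x, value=x)
print(out.shape)   # torch.Size([2, 10, 512])
```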