From d2549d942187e3da82a0cdfd5582ac68ba6879a8 Mon Sep 17 00:00:00 2001
From: Jonah Ramponi
Date: Wed, 3 Apr 2024 21:10:05 +0100
Subject: [PATCH] Update intro_to_attention.md

---
 content/posts/intro_to_attention.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/content/posts/intro_to_attention.md b/content/posts/intro_to_attention.md
index bf01018..b7c6102 100644
--- a/content/posts/intro_to_attention.md
+++ b/content/posts/intro_to_attention.md
@@ -198,7 +198,12 @@ The overall output of the process is then simply
 Concat() simply concatenates our output matrices. The output matrix of size $(n \times d_v)$ for each head is simply our matrices stacked on top of one another like so
 
 \begin{equation*}
-    \text{Concat}(\text{head}_1, \dots, \text{head}_h) =
+    \text{Concat}(\text{head}_1, \dots, \text{head}_h) = \begin{pmatrix}
+head_{1_{11}} & \dots & head_{1_{1d_v}} & \dots & head_{H_{11}} & \dots & head_{H_{1d_v}} \\
+head_{1_{21}} & \dots & head_{1_{2d_v}} & \dots & head_{H_{21}} & \dots & head_{H_{2d_v}} \\
+\vdots & \ddots & \vdots & \dots & \vdots & \ddots & \vdots \\
+head_{1_{n1}} & \dots & head_{1_{nd_v}} & \dots & head_{H_{n1}} & \dots & head_{H_{nd_v}} \\
+    \end{pmatrix}
 \end{equation*}
 
 This output has dimension $(n \times H d_v)$. We still have $n$ rows, however now we have $h$ different representations of $d_v$. Our output, $W^O$, is another trainable weight matrix which has dimensions $W^O = (Hd_v \times d_{\text{model}})$. Therefore, the multiplication of Concat $ (head_1, \dots, head_H)$ and $W^O$ results in a matrix with dimension $(n \times d_{\text{model}})$.
\ No newline at end of file
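
As a shape check on the concatenation and output projection described in the patched paragraph, here is a minimal NumPy sketch. The sizes `n`, `H`, `d_v` and `d_model` are illustrative values chosen for the example, not taken from the post.

```python
import numpy as np

# Illustrative sizes (assumed for this sketch): n tokens, H heads,
# d_v values per head, d_model model width.
n, H, d_v, d_model = 4, 8, 16, 128

# Stand-in per-head attention outputs, each of shape (n, d_v).
heads = [np.random.randn(n, d_v) for _ in range(H)]

# Concat(head_1, ..., head_H): place the heads side by side -> (n, H * d_v).
concat = np.concatenate(heads, axis=1)
assert concat.shape == (n, H * d_v)

# Trainable output projection W^O with shape (H * d_v, d_model).
W_O = np.random.randn(H * d_v, d_model)

# Multiplying the concatenated heads by W^O gives the final (n, d_model) output.
out = concat @ W_O
assert out.shape == (n, d_model)
```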