Commit eb78063

send it

jonah-ramponi committed Mar 30, 2024
1 parent 8e3b716 commit eb78063
Showing 4 changed files with 5 additions and 4 deletions.
2 changes: 1 addition & 1 deletion content/posts/flash_attention.md
@@ -1,7 +1,7 @@
 ---
 title: Flash Attention
 description: Reduce the memory required to compute exact attention.
-date: 2024-03-26
+date: 2024-03-23
 tldr: Reduce the memory required to compute exact attention.
 draft: false
 tags: [attention, inference]
3 changes: 2 additions & 1 deletion content/posts/mqa_gqa.md
@@ -33,7 +33,8 @@ For each head in a given group, we calculate attention outputs as
 
 Each head retains its own query matrix, while a single key matrix and value matrix are shared across all attention calculations within a given group.
 
-**Conversions from Multi Head Attention.** A natural question might be how one could take a model which uses multi-head attention and convert it to a model using multi-query attention or grouped-query attention. To convert to multi-query attention, we want to find a single representative matrix for both $K$ and $V$ from our set of $H$ different heads. We achieve this via mean pooling. For instance, for $K$,
+#### Conversions from Multi Head Attention
+A natural question might be how one could take a model which uses multi-head attention and convert it to a model using multi-query attention or grouped-query attention. To convert to multi-query attention, we want to find a single representative matrix for both $K$ and $V$ from our set of $H$ different heads. We achieve this via mean pooling. For instance, for $K$,
 
 \begin{equation}
 \text{mean pooling}(K_1,\dots,K_H) \rightarrow K'.
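To make the mean-pooling conversion concrete, here is a minimal NumPy sketch. The head count `H` and the shapes `d_model`, `d_head` are illustrative assumptions, not values from the post.

```python
import numpy as np

H, d_model, d_head = 8, 512, 64  # illustrative sizes, not from the post

# Hypothetical per-head key projection matrices K_1, ..., K_H from a
# multi-head attention model.
K_heads = [np.random.randn(d_model, d_head) for _ in range(H)]

# mean pooling(K_1, ..., K_H) -> K': average the H matrices elementwise
# to obtain the single shared key projection for multi-query attention.
K_prime = np.stack(K_heads, axis=0).mean(axis=0)

assert K_prime.shape == (d_model, d_head)
```

The same pooling would be applied to the value matrices; for grouped-query attention, one would instead pool only over the heads within each group.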
2 changes: 1 addition & 1 deletion content/posts/sliding_window_attention.md
@@ -1,7 +1,7 @@
 ---
 title: Sliding Window Attention
 description: Altering the tokens to which a token in the input sequence attends.
-date: 2024-03-22
+date: 2024-03-27
 tldr: Altering the tokens to which a token in the input sequence attends.
 draft: false
 tags: [attention]
2 changes: 1 addition & 1 deletion content/posts/sparse_attention.md
@@ -1,7 +1,7 @@
 ---
 title: Sparse Attention
 description: Reducing the number of calculations needed to compute attention.
-date: 2024-03-22
+date: 2024-03-25
 tldr: Reducing the number of calculations needed to compute attention.
 draft: false
 tags: [attention]
