Commit eb78063

send it

jonah-ramponi committed Mar 30, 2024
1 parent 8e3b716 commit eb78063
Showing 4 changed files with 5 additions and 4 deletions.
2 changes: 1 addition & 1 deletion content/posts/flash_attention.md
@@ -1,7 +1,7 @@
 ---
 title: Flash Attention
 description: Reduce the memory required to compute exact attention.
-date: 2024-03-26
+date: 2024-03-23
 tldr: Reduce the memory required to compute exact attention.
 draft: false
 tags: [attention, inference]
3 changes: 2 additions & 1 deletion content/posts/mqa_gqa.md
@@ -33,7 +33,8 @@ For each head in a given group, we calculate attention outputs as
 
 Each head retains its own query matrix, while a single key matrix and value matrix are shared across all attention calculations within a given group.
 
-**Conversions from Multi Head Attention.** A natural question might be how one could take a model which uses multi-head attention and convert it to a model using multi-query attention or grouped-query attention. To convert to multi-query attention, we want to find a single representative matrix for both $K$ and $V$ from our set of $H$ different heads. We achieve this via mean pooling. For instance, for $K$,
+#### Conversions from Multi Head Attention
+A natural question might be how one could take a model which uses multi-head attention and convert it to a model using multi-query attention or grouped-query attention. To convert to multi-query attention, we want to find a single representative matrix for both $K$ and $V$ from our set of $H$ different heads. We achieve this via mean pooling. For instance, for $K$,
 
 \begin{equation}
 \text{mean pooling}(K_1,\dots,K_H) \rightarrow K'.
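To make the mean-pooling conversion concrete, here is a minimal NumPy sketch. The head count `H` and the shapes `d_model`, `d_head` are illustrative assumptions, not values from the post.

```python
import numpy as np

H, d_model, d_head = 8, 512, 64  # illustrative sizes, not from the post

# Hypothetical per-head key projection matrices K_1, ..., K_H from a
# multi-head attention model.
K_heads = [np.random.randn(d_model, d_head) for _ in range(H)]

# mean pooling(K_1, ..., K_H) -> K': average the H matrices elementwise
# to obtain the single shared key projection for multi-query attention.
K_prime = np.stack(K_heads, axis=0).mean(axis=0)

assert K_prime.shape == (d_model, d_head)
```

The same pooling would be applied to the value matrices; for grouped-query attention, one would instead pool only over the heads within each group.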
2 changes: 1 addition & 1 deletion content/posts/sliding_window_attention.md
@@ -1,7 +1,7 @@
 ---
 title: Sliding Window Attention
 description: Altering the tokens to which a token in the input sequence attends.
-date: 2024-03-22
+date: 2024-03-27
 tldr: Altering the tokens to which a token in the input sequence attends.
 draft: false
 tags: [attention]
2 changes: 1 addition & 1 deletion content/posts/sparse_attention.md
@@ -1,7 +1,7 @@
 ---
 title: Sparse Attention
 description: Reducing the number of calculations needed to compute attention.
-date: 2024-03-22
+date: 2024-03-25
 tldr: Reducing the number of calculations needed to compute attention.
 draft: false
 tags: [attention]
