cleaner home screen
jonah-ramponi committed Mar 30, 2024
1 parent 80700bd commit c3f0029
Showing 2 changed files with 26 additions and 11 deletions.
14 changes: 13 additions & 1 deletion config.yml
@@ -6,6 +6,18 @@ pygmentsstyle: "monokai"
pygmentscodefences: true
pygmentscodefencesguesssyntax: true
params:
mode: "toggle"
math: true
mathjax: true
katex: true
paginate: 3 # articles per page

menu:
  main:
    - name: "Home"
      url: "/"
      weight: 1
    - name: "About"
      url: "/about"
      weight: 3
23 changes: 13 additions & 10 deletions content/posts/intro_to_attention.md
@@ -1,7 +1,7 @@
---
title: Intro to Attention
description: A brief introduction to attention in the transformer architecture.
-date: 2024-03-22
+date: 2024-03-30
tldr: A brief introduction to attention in the transformer architecture.
draft: false
tags: [attention]
@@ -53,18 +53,20 @@ This was tokenized
Then embedded

$$
-\begin{pmatrix} -0.415 \\\\ -0.514 \\\\ 0.569 \\\\ \vdots \\\\ -0.257 \\\\ 0.571 \\\\ \end{pmatrix}
-\begin{pmatrix} -0.130 \\\\ -0.464 \\\\ 0.23 \\\\ \vdots \\\\ -0.154 \\\\ 0.192 \\\\ \end{pmatrix}
+\begin{pmatrix} -0.415 \\\\ \vdots \\\\ 0.571 \\\\ \end{pmatrix}
+\begin{pmatrix} -0.130 \\\\ \vdots \\\\ 0.192 \\\\ \end{pmatrix}
, \dots ,
-\begin{pmatrix} 0.127 \\\\ 0.453 \\\\ 0.110 \\\\ \vdots \\\\ -0.155 \\\\ 0.484 \\\\ \end{pmatrix}
+\begin{pmatrix} 0.127 \\\\ \vdots \\\\ 0.484 \\\\ \end{pmatrix}
$$
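
For concreteness, a toy sketch of this embedding step (hypothetical token ids and a randomly initialised embedding table, not the post's actual tokenizer or values):

```python
import numpy as np

d_model = 512      # embedding dimension, as in the original Transformer paper
vocab_size = 1000  # toy vocabulary size, purely for illustration

# Toy embedding table: one d_model-dimensional vector per token id.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = [12, 47, 311]                # pretend output of the tokenizer
embeddings = embedding_table[token_ids]  # shape (3, d_model), one row per token
```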

and finally positionally encoded

-\begin{equation}
-\text{positional encoder}\Bigg(\begin{pmatrix} -0.415 \\\\ -0.514 \\\\ \vdots \\\\ -0.257 \\\\ 0.571 \\\\ \end{pmatrix}\Bigg) =
-\begin{pmatrix} -0.424 \\\\ -0.574 \\\\ \vdots \\\\ -0.235 \\\\ 0.534 \\\\ \end{pmatrix}
-\end{equation}
+$$
+\begin{pmatrix} -0.424 \\\\ \vdots \\\\ 0.534 \\\\ \end{pmatrix}
+\begin{pmatrix} 0.110 \\\\ \vdots \\\\ 0.212 \\\\ \end{pmatrix}
+, \dots ,
+\begin{pmatrix} 0.070 \\\\ \vdots \\\\ 0.324 \\\\ \end{pmatrix}
+$$
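
The encoder itself isn't spelled out at this point; a minimal sketch, assuming the standard sinusoidal encoding from Attention is All You Need and continuing the toy arrays above, might look like:

```python
import numpy as np

def positional_encoding(n: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings, shape (n, d_model); d_model assumed even."""
    positions = np.arange(n)[:, None]          # (n, 1)
    dims = np.arange(0, d_model, 2)[None, :]   # (1, d_model / 2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((n, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)               # odd dimensions get cosine
    return pe

# Each embedded token vector has the encoding for its position added to it.
encoded = embeddings + positional_encoding(len(token_ids), d_model)
```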

We're now very close to being able to introduce attention. One last thing remains: at this point we will transform the output of our positional encoding into a matrix $M$ as follows

@@ -98,12 +100,13 @@ At a high level, self-attention aims to evaluate the importance of each element

for some mapping $w_{ij}$. The challenge is in figuring out how we should define our mapping $w_{ij}$. Let's look at the first way $w_{ij}$ was defined, introduced in [Attention is All You Need](https://arxiv.org/pdf/1706.03762.pdf).

-### Scaled Dot Product Self Attention. To compute scaled dot product self attention, we will use the matrix $M$ with rows corresponding to the positionally encoded vectors. $M$ has dimensions $(n \times d_{\text{model}})$.
+### Scaled Dot Product Self Attention.
+To compute scaled dot product self attention, we will use the matrix $M$ with rows corresponding to the positionally encoded vectors. $M$ has dimensions $(n \times d_{\text{model}})$.

We begin by producing query, key and value matrices, analogous to how a search engine maps a user query to relevant items in its database. We will make 3 copies of our matrix $M$. These become the matrices $Q, K$ and $V$. Each of these has dimension $(n \times d_{\text{model}})$. We let $d_k$ denote the dimensions of the keys, which in this case is $d_{\text{model}}$. We are ready to define attention as

\begin{equation}
-\text{attention}(Q,K,V) = \text{softmax} \Big( \frac{Q K^T}{\sqrt{d_k}} \Big) V.
+\text{attention}(Q,K,V) = \mathrm{softmax} \Big( \frac{Q K^T}{\sqrt{d_k}} \Big) V.
\end{equation}
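
A minimal NumPy sketch of this formula (treating $Q$, $K$ and $V$ as plain arrays and applying the softmax row-wise; illustrative only, not the post's own implementation):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with the softmax taken over each row."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n, n): similarity of each query with each key
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability; doesn't change the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row now sums to 1
    return weights @ V                              # (n, d_model): weighted sums of the value vectors

# With Q = K = V = M, as described above:
n, d_model = 3, 512
M = np.random.default_rng(1).normal(size=(n, d_model))
out = scaled_dot_product_attention(M, M, M)         # shape (n, d_model)
```

Dividing by $\sqrt{d_k}$ stops the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients.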

```python
