
Sparse Attention

\begin{equation*} \text{attention}(Q,K,V, S_i) = \text{softmax}\Big( \frac{Q_{S_i} K^T_{S_i}}{\sqrt{d_k}} \Big) V_{S_i}. \end{equation*}

Here, we have defined

\begin{align*}
Q_{S_i} &= (W_q x_j)_{j \in S_i}, \\
K_{S_i} &= (W_k x_j)_{j \in S_i}, \\
V_{S_i} &= (W_v x_j)_{j \in S_i}.
\end{align*}
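The definitions above translate directly into a gather-then-attend computation. Here is a minimal NumPy sketch of the equation, restricting $Q$, $K$, and $V$ to the index set $S_i$ exactly as written (the function name and argument layout are illustrative, not from the original paper):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparse_attention(x, W_q, W_k, W_v, S_i):
    """Attention restricted to the index set S_i, per the equation above."""
    idx = np.asarray(sorted(S_i))
    Q = x[idx] @ W_q                 # Q_{S_i}: (|S_i|, d_k)
    K = x[idx] @ W_k                 # K_{S_i}: (|S_i|, d_k)
    V = x[idx] @ W_v                 # V_{S_i}: (|S_i|, d_v)
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (|S_i|, |S_i|) instead of (n, n)
    return softmax(scores) @ V       # (|S_i|, d_v)
```

The point of the restriction is visible in the shapes: the score matrix is $|S_i| \times |S_i|$ rather than $n \times n$, which is where the savings come from.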

So how do we define the set of connectivity patterns $S$? Formally, we let $S_i = A_i^{h}$ for head $h$, where $A_i^{h} \subset \{j : j \leq i\}$, so each position attends only to earlier positions. This still leaves the question of which indices to include in a given $S_i$. The original authors propose two key criteria:

Criterion 1

We should pick $|A_i^h| \propto n^{1/H}$, where $H$ is our total number of heads. This choice is efficient: each of the $n$ positions attends to only $O(n^{1/H})$ others per head, so the total attention cost is $O(n^{1 + 1/H})$ rather than the dense $O(n^2)$.
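To make the criterion concrete, consider $H = 2$ heads, where $n^{1/H} = \sqrt{n}$. One pattern with this property (a sketch in the spirit of the strided pattern from the Sparse Transformer paper; the function and variable names here are hypothetical) gives one head a local window of the last $\ell \approx \sqrt{n}$ positions and the other head every $\ell$-th earlier position:

```python
import math

def strided_pattern(i, n):
    """Index sets A_i^1, A_i^2 for position i with stride l ~ sqrt(n) (H = 2)."""
    l = math.isqrt(n)  # l ~ n^{1/2} = n^{1/H} for H = 2
    # Head 1: local window of the previous l positions (plus i itself).
    head1 = set(range(max(0, i - l), i + 1))
    # Head 2: every l-th position before i (a "strided" summary of the past).
    head2 = {j for j in range(i + 1) if (i - j) % l == 0}
    return head1, head2
```

For $n = 64$ and $\ell = 8$, each set has at most $\ell + 1$ elements, so both heads satisfy $|A_i^h| \propto n^{1/2}$, and together they still cover every position $j \leq i$ within two attention steps.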