diff --git a/config.yml b/config.yml index 8054715..a93f09c 100644 --- a/config.yml +++ b/config.yml @@ -21,4 +21,7 @@ menu: - name: "About" url: "/about" - weight: 3 \ No newline at end of file + weight: 3 + +disableKinds: + - "404" \ No newline at end of file diff --git a/public/404.html b/public/404.html index d06def7..b77b272 100644 --- a/public/404.html +++ b/public/404.html @@ -1,209 +1,211 @@ - - - - -404 Page not found | Jonah's ML Notes - - - - - - - - - - - - - - - - - - - - - - + + + + + - - - - + + + + + - - - + + + - - -
- -
-
-
404
+ + + +
+
404
- - - - - - - - - + + - - + - + \ No newline at end of file diff --git a/public/about/index.html b/public/about/index.html new file mode 100644 index 0000000..4e996b4 --- /dev/null +++ b/public/about/index.html @@ -0,0 +1,115 @@ + + + + - Jonah's ML Notes + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+

+
Posted on Jan 1, 0001
+
+ + +
+

This is the about page.

+ +
+ + +
+
+ + + +
+ + diff --git a/public/categories/index.html b/public/categories/index.html index c515807..1a865b1 100644 --- a/public/categories/index.html +++ b/public/categories/index.html @@ -11,19 +11,57 @@ - + + + + + + + + + + + + + + + + @@ -34,6 +72,12 @@ diff --git a/public/css/dark.c95c5dcf5f32f8b67bd36f7dab66680e068fce2b303087294114aabf7a7c080b.css b/public/css/dark.c95c5dcf5f32f8b67bd36f7dab66680e068fce2b303087294114aabf7a7c080b.css new file mode 100644 index 0000000..f93adf1 --- /dev/null +++ b/public/css/dark.c95c5dcf5f32f8b67bd36f7dab66680e068fce2b303087294114aabf7a7c080b.css @@ -0,0 +1,159 @@ +body { + color: white; + background-color: #202124; +} + +::-moz-selection { + background: blue; + color: #fff; + text-shadow: none; +} + +::selection { + background: red; + color: #fff; + text-shadow: none; +} + +hr { + border-top: 3px dotted blue; +} +code { + background-color: lightblue; + color: black; + text-decoration: bold; + padding: 0.1em 0.2em; +} +pre { + background-color: #272822; + line-height: 1.4; + overflow-x: auto; + padding: 1em; +} +blockquote { + border-color: blue; +} + +h1, +h2, +h3, +h4, +h5, +h6 { + color: #ddd; +} +h1::before { + color: var(--darkMaincolor); + content: "# "; +} +h2::before { + color: var(--darkMaincolor); + content: "## "; +} +h3::before { + color: var(--darkMaincolor); + content: "### "; +} +h4::before { + color: var(--darkMaincolor); + content: "#### "; +} +h5::before { + color: var(--darkMaincolor); + content: "##### "; +} +h6::before { + color: var(--darkMaincolor); + content: "###### "; +} + +a { + border-bottom: 3px solid var(--darkMaincolor); + color: inherit; +} +a:hover { + background-color: var(--darkMaincolor); + color: black; +} + +.site-description a { + color: #ddd; +} +.site-description a:hover { + color: black; +} + +.tags a { + border-bottom: 3px solid var(--darkMaincolor); +} +.tags a:hover { + background-color: var(--darkMaincolor); + color: black; +} + +.site-title a { + color: white; + text-decoration: none !important; +} + +.header nav, +.footer { + border-color: #333; +} + +.highlight { + background-color: #333; +} +.soc:hover { + color: black; +} +.draft-label { + color: var(--darkMaincolor); + background-color: blue; +} +.highlight pre code[class=language-javaScript]::before, +.highlight pre code[class="language-js"]::before { + content: "js"; + background: #f7df1e; + color: black; +} +.highlight pre code[class*='language-yml']::before, +.highlight pre code[class*='language-yaml']::before { + content: 'yaml'; + background: #f71e6a; + color: white; +} +.highlight pre code[class*='language-shell']::before, +.highlight pre code[class*='language-bash']::before, +.highlight pre code[class*='language-sh']::before { + content: 'shell'; + background: green; + color:white +} +.highlight pre code[class*='language-json']::before{ + content: 'json'; + background: dodgerblue; + color: #000000 +} +.highlight pre code[class*='language-python']::before, +.highlight pre code[class*='language-py']::before { + content: 'py'; + background: blue; + color: yellow ; +} +.highlight pre code[class*='language-css']::before{ + content: 'css'; + background: cyan; + color: black ; +} +.highlight pre code[class*='language-go']::before{ + content: 'Go'; + background: cyan; + color: royalblue ; +} +.highlight pre code[class*='language-md']::before, +.highlight pre code[class*='language-md']::before{ + content: 'Markdown'; + background: royalblue; + color: whitesmoke ; +} \ No newline at end of file diff --git 
a/public/css/main.d902908ac6e0fab67957de5db5aea1b6455b19ae2ca98eac4c95a4a0fdc02238.css b/public/css/main.d902908ac6e0fab67957de5db5aea1b6455b19ae2ca98eac4c95a4a0fdc02238.css index 4b237d8..3a7d1cd 100644 --- a/public/css/main.d902908ac6e0fab67957de5db5aea1b6455b19ae2ca98eac4c95a4a0fdc02238.css +++ b/public/css/main.d902908ac6e0fab67957de5db5aea1b6455b19ae2ca98eac4c95a4a0fdc02238.css @@ -1,24 +1,21 @@ /* Markdown */ -:root { - --maincolor: #e24329; - --bordercl: rebeccapurple; - --callouctcolor: dodgerblue; - --hovercolor: navy; - --darkMaincolor: #50fa7b; +:root{ +--maincolor: red; +--bordercl:rebeccapurple; +--callouctcolor:dodgerblue; +--hovercolor:navy; +--darkMaincolor: #50fa7b; } - html { color: #232333; font-family: 'Roboto Mono', monospace; font-size: 15px; line-height: 1.6em; } - -body { +body{ display: block; margin: 8px; } - * { -webkit-tap-highlight-color: rgba(0, 0, 0, 0); } @@ -51,22 +48,19 @@ a { color: inherit; text-decoration: none; } - a:hover { - background-color: var(--hovercolor); - color: #fff; + background-color: var(--hovercolor); + color: #fff; } ul { list-style: none; padding-left: 2ch; } - ul li { text-indent: -2ch; } - -ul>li::before { +ul > li::before { content: '* '; font-weight: bold; } @@ -99,7 +93,6 @@ figure h4 { margin: 0; margin-bottom: 1em; } - figure h4::before { content: '↳ '; } @@ -151,46 +144,17 @@ header { header .main { font-size: 1.5rem; } - -h1, -h2, -h3, -h4, -h5, -h6 { +h1, h2, h3, h4, h5, h6 { font-size: 1.2rem; margin-top: 2em; } -h1::before { - color: var(--maincolor); - content: '# '; -} - -h2::before { - color: var(--maincolor); - content: '## '; -} - -h3::before { - color: var(--maincolor); - content: '### '; -} - -h4::before { - color: var(--maincolor); - content: '#### '; -} - -h5::before { - color: var(--maincolor); - content: '##### '; -} - -h6::before { - color: var(--maincolor); - content: '###### '; -} +h1::before { color: var(--maincolor); content: '# '; } +h2::before { color: var(--maincolor); content: '## '; } +h3::before { color: var(--maincolor); content: '### '; } +h4::before { color: var(--maincolor); content: '#### '; } +h5::before { color: var(--maincolor); content: '##### '; } +h6::before { color: var(--maincolor); content: '###### '; } .meta { color: #999; @@ -205,19 +169,16 @@ footer { padding: 2rem 0rem; margin-top: 2rem; } - .soc { display: flex; align-items: center; border-bottom: none; } - .border { margin-left: 0.5rem; margin-right: 0.5rem; border: 1px solid; } - .footer-info { padding: var(--footer-padding); } @@ -259,49 +220,40 @@ article .title { } .site-description { - display: flex; - justify-content: space-between; +display: flex; +justify-content: space-between; } - -.tags li::before { +.tags li::before{ content: "🏷 "; } - -.tags a { - border-bottom: 3px solid var(--maincolor); +.tags a{ + border-bottom: 3px solid var(--maincolor); } - -.tags a:hover { - color: white; - background-color: var(--hovercolor); +.tags a:hover{ + color:white; + background-color: var(--hovercolor); } - -svg { +svg{ max-height: 15px; } - -.soc:hover { +.soc:hover{ color: white; } - -.draft-label { - color: var(--bordercl); - text-decoration: none; - padding: 2px 4px; - border-radius: 4px; - margin-left: 6px; - background-color: #f9f2f4; +.draft-label{ + color: var(--bordercl); + text-decoration: none; + padding: 2px 4px; + border-radius: 4px; + margin-left: 6px; + background-color: #f9f2f4; } - .highlight { position: relative; -webkit-overflow-scrolling: touch; } - .highlight pre code[class*="language-"] { 
-webkit-overflow-scrolling: touch; } - .highlight pre code[class*="language-"]::before { background: black; border-radius: 0 0 0.25rem 0.25rem; @@ -318,56 +270,49 @@ svg { .highlight pre code[class=language-javaScript]::before, .highlight pre code[class="language-js"]::before { - content: "js"; - background: #f7df1e; - color: black; +content: "js"; +background: #f7df1e; +color: black; } - .highlight pre code[class*='language-yml']::before, .highlight pre code[class*='language-yaml']::before { - content: 'yaml'; - background: #f71e6a; - color: white; +content: 'yaml'; +background: #f71e6a; +color: white; } - .highlight pre code[class*='language-shell']::before, .highlight pre code[class*='language-bash']::before, .highlight pre code[class*='language-sh']::before { - content: 'shell'; - background: green; - color: white +content: 'shell'; +background: green; +color:white } - -.highlight pre code[class*='language-json']::before { - content: 'json'; - background: dodgerblue; - color: #000000 +.highlight pre code[class*='language-json']::before{ +content: 'json'; +background: dodgerblue; + color: #000000 } - .highlight pre code[class*='language-python']::before, .highlight pre code[class*='language-py']::before { - content: 'py'; - background: blue; - color: yellow; +content: 'py'; +background: blue; +color: yellow ; } - -.highlight pre code[class*='language-css']::before { - content: 'css'; - background: cyan; - color: black; +.highlight pre code[class*='language-css']::before{ +content: 'css'; +background: cyan; +color: black ; } - -.highlight pre code[class*='language-go']::before { - content: 'Go'; - background: cyan; - color: royalblue; +.highlight pre code[class*='language-go']::before{ +content: 'Go'; +background: cyan; +color: royalblue ; } - .highlight pre code[class*='language-md']::before, -.highlight pre code[class*='language-md']::before { - content: 'Markdown'; - background: royalblue; - color: whitesmoke; +.highlight pre code[class*='language-md']::before{ +content: 'Markdown'; +background: royalblue; +color: whitesmoke ; } /* table */ @@ -376,13 +321,13 @@ table { border-collapse: collapse; } -table th { +table th{ padding: 6px 13px; border: 1px solid #dfe2e5; font-size: large; } -table td { +table td{ padding: 6px 13px; border: 1px solid #dfe2e5; -} \ No newline at end of file +} diff --git a/public/img/dilated_sliding_window.png b/public/img/dilated_sliding_window.png new file mode 100644 index 0000000..b25e679 Binary files /dev/null and b/public/img/dilated_sliding_window.png differ diff --git a/public/img/first_pred_kv.png b/public/img/first_pred_kv.png new file mode 100644 index 0000000..f400bcf Binary files /dev/null and b/public/img/first_pred_kv.png differ diff --git a/public/img/longformer.png b/public/img/longformer.png new file mode 100644 index 0000000..798c5d9 Binary files /dev/null and b/public/img/longformer.png differ diff --git a/public/img/second_pred_kv.png b/public/img/second_pred_kv.png new file mode 100644 index 0000000..986981d Binary files /dev/null and b/public/img/second_pred_kv.png differ diff --git a/public/img/sliding_window.png b/public/img/sliding_window.png new file mode 100644 index 0000000..61fd48a Binary files /dev/null and b/public/img/sliding_window.png differ diff --git a/public/img/sparse_attention.png b/public/img/sparse_attention.png new file mode 100644 index 0000000..4128ca7 Binary files /dev/null and b/public/img/sparse_attention.png differ diff --git a/public/index.html b/public/index.html index 7814d95..14c793d 100644 --- 
a/public/index.html +++ b/public/index.html @@ -12,19 +12,57 @@ - + + + + + + + + + + + + + + + + @@ -37,6 +75,12 @@ @@ -48,28 +92,50 @@
-

Post 2

+

Intro to Attention


- Here’s my second content… + A brief introduction to attention in the transformer architecture.
- Read more ⟶ + Read more ⟶
-

Test

- +

Flash Attention

+
- Here’s my content! What do you think ? $1^2$… + Reduce the memory used to compute exact attention.
- Read more ⟶ + Read more ⟶
+
+

Multi & Grouped Query Attention

+ +
+ + Use fewer K and V matrices to use less memory. +
+ Read more ⟶ +
+ + +
diff --git a/public/index.xml b/public/index.xml index 2a9b304..9e8f3f5 100644 --- a/public/index.xml +++ b/public/index.xml @@ -6,21 +6,63 @@ Recent content on Jonah's ML Notes Hugo -- gohugo.io en-us - Sat, 30 Mar 2024 11:49:13 +0000 + Sat, 30 Mar 2024 00:00:00 +0000 - Post 2 - https://www.jonahramponi.com/posts/test-copy/ - Sat, 30 Mar 2024 11:49:13 +0000 - https://www.jonahramponi.com/posts/test-copy/ - Here’s my second content + Intro to Attention + https://www.jonahramponi.com/posts/intro_to_attention/ + Sat, 30 Mar 2024 00:00:00 +0000 + https://www.jonahramponi.com/posts/intro_to_attention/ + Suppose you give an LLM the input What is the capital of France? The first thing the LLM will do is split this input into tokens. A token is just some combinations of characters. You can see an example of the tokenization outputs for the question below. $\colorbox{red}{What}\colorbox{magenta}{ is}\colorbox{green}{ the}\colorbox{orange}{ capital}\colorbox{purple}{ of}\colorbox{brown}{ France}\colorbox{cyan}?$ (This tokenization was produced using cl100k_base, the tokenizer used in GPT-3.5-turbo and GPT-4.) In this example we have $(n = 7)$ tokens. - Test - https://www.jonahramponi.com/posts/test/ - Sat, 30 Mar 2024 11:49:13 +0000 - https://www.jonahramponi.com/posts/test/ - Here’s my content! What do you think ? $1^2$ + Flash Attention + https://www.jonahramponi.com/posts/flash_attention/ + Tue, 26 Mar 2024 00:00:00 +0000 + https://www.jonahramponi.com/posts/flash_attention/ + The goal of Flash Attention is to compute the attention value with fewer high bandwidth memory read / writes. The approach has since been refined in Flash Attention 2. We will split the attention inputs $Q,K,V$ into blocks. Each block will be handled separately, and attention will therefore be computed with respect to each block. With the correct scaling, adding the outputs from each block we will give us the same attention value as we would get by computing everything all together. + + + Multi & Grouped Query Attention + https://www.jonahramponi.com/posts/mqa_gqa/ + Fri, 22 Mar 2024 00:00:00 +0000 + https://www.jonahramponi.com/posts/mqa_gqa/ + Multi Query Attention Multi Query Attention (MQA) using the same $K$ and $V$ matrices for each head in our multi head self attention mechanism. For a given head, $h$, $1 \leq h \leq H$, the attention mechanism is calculated as \begin{equation} h_i = \text{attention}(M\cdot W_h^Q, M \cdot W^K,M \cdot W^V). \end{equation} For each of our $H$ heads, the only difference in the weight matrices is in $W_h^Q$. Each of these $W_h$ has dimension $(n \times d_q)$. + + + Sliding Window Attention + https://www.jonahramponi.com/posts/sliding_window_attention/ + Fri, 22 Mar 2024 00:00:00 +0000 + https://www.jonahramponi.com/posts/sliding_window_attention/ + Sliding Window Attention reduces the number of calculations we are doing when computing self attention. Previously, to compute attention we took our input matrix of positional encodings $M$, and made copies named $Q, K$ and $V$. We used these copies to compute \begin{equation} \text{attention}(Q,K,V) = \text{softmax}\Big(\frac{Q K^T}{\sqrt{d_k}}\Big) V. \end{equation} For now, let’s ignore the re-scaling by $\sqrt{d_k}$ and just look at the computation of $QK^T$. 
This computation looks like \begin{equation} Q \times K^T = \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1d} \\ \vdots & \ddots & \cdots & \vdots \\ Q_{n1} & Q_{n2} & \cdots & Q_{nd} \end{pmatrix} \times \begin{pmatrix} K_{11} & K_{21} & \cdots & K_{n1} \\ \vdots & \ddots & \cdots & \vdots \\ K_{1d} & K_{2d} & \cdots & K_{nd} \end{pmatrix} \end{equation} + + + Sparse Attention + https://www.jonahramponi.com/posts/sparse_attention/ + Fri, 22 Mar 2024 00:00:00 +0000 + https://www.jonahramponi.com/posts/sparse_attention/ + Sparse Attention introduces sparse factorizations on the attention matrix. To implement this we introduce a connectivity pattern $S = {S_1,\dots,S_n}$. Here, $S_i$ denotes the set of indices of the input vectors to which the $i$th output vector attends. For instance, in regular $n^2$ attention every input vector attends to every output vector before it in the sequence. Remember that $d_k$ is the inner dimension of our queries and keys. Sparse Attention is given as follows + + + The KV Cache + https://www.jonahramponi.com/posts/kv_cache/ + Fri, 22 Mar 2024 00:00:00 +0000 + https://www.jonahramponi.com/posts/kv_cache/ + The computation of attention is costly. Remember that our decoder works in an auto-regressive fashion. For our given input $$\colorbox{red}{What}\colorbox{magenta}{ is}\colorbox{green}{ the}\colorbox{orange}{ capital}\colorbox{purple}{ of}\colorbox{brown}{ France}\colorbox{cyan}{?}"$$ \begin{align} \text{Prediction 1} &= \colorbox{orange}{The} \\ \text{Prediction 2} &= \colorbox{orange}{The}\colorbox{pink}{ capital} \\ &\vdots \\ \text{Prediction $p$} &= \colorbox{orange}{The}\colorbox{pink}{ capital} (\dots) \colorbox{red}{ Paris.} \end{align} To produce prediction $2$, we will take the output from prediction $1$. At each step, the model will also see our input sequence. + + + PDFs and Resources + https://www.jonahramponi.com/posts/resources/ + Wed, 28 Feb 2024 11:49:13 +0000 + https://www.jonahramponi.com/posts/resources/ + The contents of this website can be found as a pdf here. + + + + https://www.jonahramponi.com/about/ + Mon, 01 Jan 0001 00:00:00 +0000 + https://www.jonahramponi.com/about/ + This is the about page. diff --git a/public/page/2/index.html b/public/page/2/index.html new file mode 100644 index 0000000..60ffe39 --- /dev/null +++ b/public/page/2/index.html @@ -0,0 +1,157 @@ + + + + + Jonah's ML Notes | Home + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+ + +
+
+ + + +
+

Sliding Window Attention

+ +
+ + Altering the tokens to which a token in the input sequence attends. + +
+ Read more ⟶ +
+ +
+

Sparse Attention

+ +
+ + Reducing the number of calculations to compute attention. + +
+ Read more ⟶ +
+ +
+

The KV Cache

+ +
+ + Computing the attention more efficiently at inference. + +
+ Read more ⟶ +
+ + + + + + +
+ + + + +
+ + + diff --git a/public/page/3/index.html b/public/page/3/index.html new file mode 100644 index 0000000..1473a1b --- /dev/null +++ b/public/page/3/index.html @@ -0,0 +1,133 @@ + + + + + Jonah's ML Notes | Home + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+
+ + +
+ + +
+
+ + + +
+

PDFs and Resources

+ +
+ + The contents of this website can be found as a pdf here.… + +
+ Read more ⟶ +
+ + + + + + +
+ + + + +
+ + + diff --git a/public/posts/file/Attention_Mechanisms.pdf b/public/posts/file/Attention_Mechanisms.pdf new file mode 100644 index 0000000..a40bc0a Binary files /dev/null and b/public/posts/file/Attention_Mechanisms.pdf differ diff --git a/public/posts/flash_attention/index.html b/public/posts/flash_attention/index.html new file mode 100644 index 0000000..7c58f74 --- /dev/null +++ b/public/posts/flash_attention/index.html @@ -0,0 +1,164 @@ + + + + Flash Attention - Jonah's ML Notes + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+

Flash Attention

+
Posted on Mar 26, 2024
+
+ +
+ tl;dr: + Reduce the memory used to compute exact attention.
+ +
+

The goal of Flash Attention is to compute the attention value with fewer high-bandwidth memory reads and writes. The approach has since been refined in Flash Attention 2.

+

We will split the attention inputs $Q,K,V$ into blocks. Each block will be handled separately, and attention will therefore be computed with respect to each block. With the correct scaling, adding the outputs from each block will give us the same attention value we would get by computing everything in one go.

+

Tiling. To compute attention, we multiply $Q$ by $K^T$, divide by $\sqrt{d_k}$, and then take the softmax. Keeping track of the scaling values in the softmax is the key to making this technique work. The softmax for a vector $\vec{x} \in \mathbb{R}^{2n}$ is given by

+

$$ +m(x):= \max_i x_i, \hspace{3mm} f(x):= [e^{x_1-m(x)}, \dots, e^{x_{2n} -m(x)}], \hspace{3mm} \ell(x) := \sum_i f(x)_i, \hspace{3mm} \text{softmax}(x) := \frac{f(x)}{\ell(x)}. +$$

+

This looks unfriendly, but it is really just the notation for a more numerically stable softmax. What does that mean? Notice we are just applying the regular softmax after shifting each element of $\vec{x}$ down by $m(x) = \max_i x_i$. We can do this because $\text{softmax}(\vec{x}) = \text{softmax}(\vec{x}-c)$ for any scalar $c$.

+

Proof +\begin{align*} +\text{softmax}(\vec{x} - c) &= \frac{e^{\vec{x} - c}}{\sum_{j} e^{x_j - c}} \\ +&= \frac{e^{\vec{x}} \cdot e^{-c}}{\sum_{j} e^{x_j} \cdot e^{-c}} \\ +&= \frac{e^{\vec{x}}}{\sum_{j} e^{x_j}} \\ +&= \text{softmax}(\vec{x}) +\end{align*}

+

In this case, we improve numerical stability by ensuring we never take the exponential of a very large number. Doing so can cause overflow, which simply means the number gets too big to store in the given datatype. By subtracting the largest element, we ensure the shifted vector only has non-positive entries. For example, in 64-bit floating point the maximum value we can represent is very large (on the order of $10^{308}$). However

+

$$ +e^x > 10^{308} \implies x > \ln(10^{308}) \implies x > 308 \times \ln(10) \implies x > 709. +$$

+

Therefore, any $x$ larger than roughly $709$ will cause overflow. For instance, in numpy $\exp(709) \approx 8.22 \times 10^{307}$, but $\exp(710)$ returns $inf$.

+
import numpy as np

np.exp(709)
+# 8.218407461554972e+307
+
np.exp(710)
+# <stdin>:1: RuntimeWarning: overflow encountered in exp
+# inf
+

We certainly do not want our model to hit any overflow errors. It is therefore preferable to use this numerically stable version of softmax.
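To see this concretely, here is a small numpy sketch comparing the two versions (the helper names are mine, for illustration only):

import numpy as np

def softmax_naive(x):
    z = np.exp(x)              # overflows once any entry exceeds ~709
    return z / z.sum()

def softmax_stable(x):
    z = np.exp(x - np.max(x))  # exponents are all <= 0, so no overflow
    return z / z.sum()

x = np.array([1.0, 1000.0])
softmax_naive(x)   # overflow warning, returns array([ 0., nan])
softmax_stable(x)  # array([0., 1.])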

+

To compute softmax in blocks, let’s look at the simple case of decomposing our vector $\vec{x} \in \mathbb{R}^{2n}$ into two smaller vectors $\vec{x}_1,\vec{x}_2$, each in $\mathbb{R}^n$. Our softmax calculation becomes

+

\begin{aligned} +m(x) &= m([x_1\hspace{3mm} x_2]) = \max (m(x_1),m(x_2)), \\ +f(x) &= [e^{m(x_1) - m(x)}f(x_1) \hspace{3mm} e^{m(x_2) - m(x)}f(x_2)], \\ +\ell(x) &= \ell([x_1\hspace{3mm} x_2]) = e^{m(x_1) - m(x)}\ell(x_1) + e^{m(x_2) - m(x)}\ell(x_2), \\ +\text{softmax}(x) &= \frac{f(x)}{\ell(x)}. +\end{aligned}

+

Notice that we use $m(x_i) - m(x)$ as the rescaling factor, as we do not know in advance which block will contain the maximum value of $\vec{x}$. By keeping track of both $m(x)$ and $\ell(x)$, we can accurately recombine the softmax outputs from each block, since we know exactly how to rescale them.
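As a sanity check, here is a numpy sketch of exactly this two-block recombination (again, the names are mine, for illustration):

import numpy as np

def softmax_two_blocks(x1, x2):
    # per-block statistics
    m1, m2 = np.max(x1), np.max(x2)
    f1, f2 = np.exp(x1 - m1), np.exp(x2 - m2)
    l1, l2 = f1.sum(), f2.sum()
    # recombine with the global max m(x)
    m = max(m1, m2)
    f = np.concatenate([np.exp(m1 - m) * f1, np.exp(m2 - m) * f2])
    l = np.exp(m1 - m) * l1 + np.exp(m2 - m) * l2
    return f / l

x = np.random.randn(8)
reference = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(softmax_two_blocks(x[:4], x[4:]), reference)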

+

Recomputation. We also do not wish to store all the intermediate values from the forward pass for use in the backward pass. Typically the backward pass requires the attention matrix $QK^T$ and its softmax, $\text{softmax}(QK^T)$. However, by working with blocks of $Q,K,V$ (and keeping the statistics $m$ and $\ell$), these can be recomputed block by block, so the whole attention matrix never needs to be loaded in during the backward pass.
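Putting the pieces together, here is a minimal numpy sketch of the tiled forward pass: it streams over blocks of $K$ and $V$, carrying the running statistics $m$ and $\ell$, and matches the usual attention computation. This is an illustration under my own naming, not the actual kernel, which also tiles over $Q$ and manages GPU memory explicitly.

import numpy as np

def attention_reference(Q, K, V):
    S = Q @ K.T / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    return (P / P.sum(axis=1, keepdims=True)) @ V

def attention_tiled(Q, K, V, block=2):
    n, d = Q.shape
    O = np.zeros_like(V, dtype=float)  # unnormalized output accumulator
    m = np.full(n, -np.inf)            # running row-wise max
    l = np.zeros(n)                    # running row-wise sum of exponentials
    for j in range(0, n, block):       # stream over K/V blocks
        S = Q @ K[j:j + block].T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=1))
        alpha = np.exp(m - m_new)      # rescales the old statistics
        P = np.exp(S - m_new[:, None])
        l = alpha * l + P.sum(axis=1)
        O = alpha[:, None] * O + P @ V[j:j + block]
        m = m_new
    return O / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 4)) for _ in range(3))
assert np.allclose(attention_reference(Q, K, V), attention_tiled(Q, K, V))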

+ +
+ + +
+
+ + + +
+ + diff --git a/public/posts/index.html b/public/posts/index.html index 9195a91..5ca1d29 100644 --- a/public/posts/index.html +++ b/public/posts/index.html @@ -11,19 +11,57 @@ - + + + + + + + + + + + + + + + + @@ -34,6 +72,12 @@ @@ -43,9 +87,19 @@

All articles