making_my_static_blog_generator_11_times_faster.html

<!DOCTYPE html>
<html>
<head>
<title>Making my static blog generator &lt;del&gt;11&lt;/del&gt; 33 times faster</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link type="application/atom+xml" href="/blog/feed.xml" rel="self">
<link rel="shortcut icon" type="image/ico" href="/blog/favicon.ico">
<link rel="stylesheet" type="text/css" href="main.css">
<link rel="stylesheet" href="https://unpkg.com/@highlightjs/cdn-assets@11.8.0/styles/default.min.css">
<script src="highlight.min.js"></script>
<!-- From https://github.com/odin-lang/odin-lang.org/blob/6f48c2cfb094a42dffd34143884fa958bd9c0ba2/themes/odin/layouts/partials/head.html#L71 -->
<script src="x86asm.min.js"></script>
<script src="odin_syntax.js"></script>
<script type="module" src="search_index.js"></script>
<script type="module" src="search.js"></script>
</head>
<body>

<div id="banner">
    <div id="name">
        <img id="me" src="me.jpeg">
        <span>Philippe Gaultier</span>
    </div>
    <input id="search" placeholder="🔎 Search" autocomplete=off>
    <ul>
      <li> <a href="/blog/body_of_work.html">Body of work</a> </li>
      <li> <a href="/blog/articles-by-tag.html">Tags</a> </li>
      <li> <a href="https://github.com/gaultier/resume/raw/master/Philippe_Gaultier_resume_en.pdf">
          Resume
        </a> </li>

      <li> <a href="/blog/feed.xml">
        <svg viewBox="0 0 24 24" fill="none" xmlns="http://www.w3.org/2000/svg">
        <path fill-rule="evenodd" clip-rule="evenodd" d="M5.5 3.5C4.39543 3.5 3.5 4.39543 3.5 5.5V18.5C3.5 19.6046 4.39543 20.5 5.5 20.5H18.5C19.6046 20.5 20.5 19.6046 20.5 18.5V5.5C20.5 4.39543 19.6046 3.5 18.5 3.5H5.5ZM7 19C8.10457 19 9 18.1046 9 17C9 15.8954 8.10457 15 7 15C5.89543 15 5 15.8954 5 17C5 18.1046 5.89543 19 7 19ZM6.14863 10.5052C6.14863 10.0379 6.52746 9.65906 6.99478 9.65906C7.95949 9.65906 8.91476 9.84908 9.80603 10.2183C10.6973 10.5874 11.5071 11.1285 12.1893 11.8107C12.8715 12.4929 13.4126 13.3027 13.7817 14.194C14.1509 15.0852 14.3409 16.0405 14.3409 17.0052C14.3409 17.4725 13.9621 17.8514 13.4948 17.8514C13.0275 17.8514 12.6486 17.4725 12.6486 17.0052C12.6486 16.2627 12.5024 15.5275 12.2183 14.8416C11.9341 14.1556 11.5177 13.5324 10.9927 13.0073C10.4676 12.4823 9.84437 12.0659 9.15842 11.7817C8.47246 11.4976 7.73726 11.3514 6.99478 11.3514C6.52746 11.3514 6.14863 10.9725 6.14863 10.5052ZM7 5.15385C6.53268 5.15385 6.15385 5.53268 6.15385 6C6.15385 6.46732 6.53268 6.84615 7 6.84615C8.33342 6.84615 9.65379 7.10879 10.8857 7.61907C12.1176 8.12935 13.237 8.87728 14.1799 9.82015C15.1227 10.763 15.8707 11.8824 16.3809 13.1143C16.8912 14.3462 17.1538 15.6666 17.1538 17C17.1538 17.4673 17.5327 17.8462 18 17.8462C18.4673 17.8462 18.8462 17.4673 18.8462 17C18.8462 15.4443 18.5397 13.9039 17.9444 12.4667C17.3491 11.0294 16.4765 9.72352 15.3765 8.6235C14.2765 7.52349 12.9706 6.65091 11.5333 6.05558C10.0961 5.46026 8.55566 5.15385 7 5.15385Z" fill="#000000"/>
        </svg>
        </a> </li>

      <li> <a href="https://www.linkedin.com/in/philippegaultier/">
        <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" data-supported-dps="24x24" fill="currentColor" class="mercado-match" width="24" height="24" focusable="false">
              <path d="M20.5 2h-17A1.5 1.5 0 002 3.5v17A1.5 1.5 0 003.5 22h17a1.5 1.5 0 001.5-1.5v-17A1.5 1.5 0 0020.5 2zM8 19H5v-9h3zM6.5 8.25A1.75 1.75 0 118.3 6.5a1.78 1.78 0 01-1.8 1.75zM19 19h-3v-4.74c0-1.42-.6-1.93-1.38-1.93A1.74 1.74 0 0013 14.19a.66.66 0 000 .14V19h-3v-9h2.9v1.3a3.11 3.11 0 012.7-1.4c1.55 0 3.36.86 3.36 3.66z"/>
            </svg>
        </a> </li>
      <li> <a href="https://github.com/gaultier">
        <svg height="32" aria-hidden="true" viewBox="0 0 24 24" version="1.1" width="32" data-view-component="true" class="octicon octicon-mark-github v-align-middle">
          <path d="M12.5.75C6.146.75 1 5.896 1 12.25c0 5.089 3.292 9.387 7.863 10.91.575.101.79-.244.79-.546 0-.273-.014-1.178-.014-2.142-2.889.532-3.636-.704-3.866-1.35-.13-.331-.69-1.352-1.18-1.625-.402-.216-.977-.748-.014-.762.906-.014 1.553.834 1.769 1.179 1.035 1.74 2.688 1.25 3.349.948.1-.747.402-1.25.733-1.538-2.559-.287-5.232-1.279-5.232-5.678 0-1.25.445-2.285 1.178-3.09-.115-.288-.517-1.467.115-3.048 0 0 .963-.302 3.163 1.179.92-.259 1.897-.388 2.875-.388.977 0 1.955.13 2.875.388 2.2-1.495 3.162-1.179 3.162-1.179.633 1.581.23 2.76.115 3.048.733.805 1.179 1.825 1.179 3.09 0 4.413-2.688 5.39-5.247 5.678.417.36.776 1.05.776 2.128 0 1.538-.014 2.774-.014 3.162 0 .302.216.662.79.547C20.709 21.637 24 17.324 24 12.25 24 5.896 18.854.75 12.5.75Z"/>
        </svg>
        </a> </li>
      <li> <a href="https://hachyderm.io/@pg">
        <svg width="75" height="79" viewBox="0 0 75 79" fill="none" xmlns="http://www.w3.org/2000/svg">
        <path d="M73.8393 17.4898C72.6973 9.00165 65.2994 2.31235 56.5296 1.01614C55.05 0.797115 49.4441 0 36.4582 0H36.3612C23.3717 0 20.585 0.797115 19.1054 1.01614C10.5798 2.27644 2.79399 8.28712 0.904997 16.8758C-0.00358524 21.1056 -0.100549 25.7949 0.0682394 30.0965C0.308852 36.2651 0.355538 42.423 0.91577 48.5665C1.30307 52.6474 1.97872 56.6957 2.93763 60.6812C4.73325 68.042 12.0019 74.1676 19.1233 76.6666C26.7478 79.2728 34.9474 79.7055 42.8039 77.9162C43.6682 77.7151 44.5217 77.4817 45.3645 77.216C47.275 76.6092 49.5123 75.9305 51.1571 74.7385C51.1797 74.7217 51.1982 74.7001 51.2112 74.6753C51.2243 74.6504 51.2316 74.6229 51.2325 74.5948V68.6416C51.2321 68.6154 51.2259 68.5896 51.2142 68.5661C51.2025 68.5426 51.1858 68.522 51.1651 68.5058C51.1444 68.4896 51.1204 68.4783 51.0948 68.4726C51.0692 68.4669 51.0426 68.467 51.0171 68.4729C45.9835 69.675 40.8254 70.2777 35.6502 70.2682C26.7439 70.2682 24.3486 66.042 23.6626 64.2826C23.1113 62.762 22.7612 61.1759 22.6212 59.5646C22.6197 59.5375 22.6247 59.5105 22.6357 59.4857C22.6466 59.4609 22.6633 59.4391 22.6843 59.422C22.7053 59.4048 22.73 59.3929 22.7565 59.3871C22.783 59.3813 22.8104 59.3818 22.8367 59.3886C27.7864 60.5826 32.8604 61.1853 37.9522 61.1839C39.1768 61.1839 40.3978 61.1839 41.6224 61.1516C46.7435 61.008 52.1411 60.7459 57.1796 59.7621C57.3053 59.7369 57.431 59.7154 57.5387 59.6831C65.4861 58.157 73.0493 53.3672 73.8178 41.2381C73.8465 40.7606 73.9184 36.2364 73.9184 35.7409C73.9219 34.0569 74.4606 23.7949 73.8393 17.4898Z" fill="url(#paint0_linear_549_34)"/>
        <path d="M61.2484 27.0263V48.114H52.8916V27.6475C52.8916 23.3388 51.096 21.1413 47.4437 21.1413C43.4287 21.1413 41.4177 23.7409 41.4177 28.8755V40.0782H33.1111V28.8755C33.1111 23.7409 31.0965 21.1413 27.0815 21.1413C23.4507 21.1413 21.6371 23.3388 21.6371 27.6475V48.114H13.2839V27.0263C13.2839 22.7176 14.384 19.2946 16.5843 16.7572C18.8539 14.2258 21.8311 12.926 25.5264 12.926C29.8036 12.926 33.0357 14.5705 35.1905 17.8559L37.2698 21.346L39.3527 17.8559C41.5074 14.5705 44.7395 12.926 49.0095 12.926C52.7013 12.926 55.6784 14.2258 57.9553 16.7572C60.1531 19.2922 61.2508 22.7152 61.2484 27.0263Z" fill="white"/>
        <defs>
        <linearGradient id="paint0_linear_549_34" x1="37.0692" y1="0" x2="37.0692" y2="79" gradientUnits="userSpaceOnUse">
        <stop stop-color="#6364FF"/>
        <stop offset="1" stop-color="#563ACC"/>
        </linearGradient>
        </defs>
        </svg>
        </a> </li>
      <li> <a href="https://bsky.app/profile/pgaultier.bsky.social">
        <svg fill="none" viewBox="0 0 64 57" width="32" style="width: 32px; height: 28.5px;"><path fill="#0085ff" d="M13.873 3.805C21.21 9.332 29.103 20.537 32 26.55v15.882c0-.338-.13.044-.41.867-1.512 4.456-7.418 21.847-20.923 7.944-7.111-7.32-3.819-14.64 9.125-16.85-7.405 1.264-15.73-.825-18.014-9.015C1.12 23.022 0 8.51 0 6.55 0-3.268 8.579-.182 13.873 3.805ZM50.127 3.805C42.79 9.332 34.897 20.537 32 26.55v15.882c0-.338.13.044.41.867 1.512 4.456 7.418 21.847 20.923 7.944 7.111-7.32 3.819-14.64-9.125-16.85 7.405 1.264 15.73-.825 18.014-9.015C62.88 23.022 64 8.51 64 6.55c0-9.818-8.578-6.732-13.873-2.745Z"/></svg>
        </a> </li>
    </ul>
</div>
<div id="search-matches" hidden>
</div>
<div id="pseudo-body">

<div class="article-prelude">
  <p><a href="/blog"> ⏴ Back to all articles</a></p>

  <p class="publication-date">Published on 2025-02-19</p>
</div>
<div class="article-title">
<h1>Making my static blog generator <del>11</del> 33 times faster</h1>
  <div class="tags"> <a href="/blog/articles-by-tag.html#optimization" class="tag">Optimization</a> <a href="/blog/articles-by-tag.html#git" class="tag">Git</a> <a href="/blog/articles-by-tag.html#odin" class="tag">Odin</a> <a href="/blog/articles-by-tag.html#fossil" class="tag">Fossil</a></div>
  </div>
 <strong>Table of contents</strong>
<ul>

  <li>
    <a href="#1071142523-the-investigation">The investigation</a>
  </li>

  <li>
    <a href="#3047915491-when-there-s-one-there-s-many">When there's one, there's many</a>
  </li>

  <li>
    <a href="#2709289763-the-new-approach">The new approach</a>
  </li>

  <li>
    <a href="#3854930553-the-new-implementation">The new implementation</a>
  </li>

  <li>
    <a href="#3898000878-fossil">Fossil</a>
  </li>

  <li>
    <a href="#3796851539-conclusion">Conclusion</a>
  </li>

  <li>
    <a href="#4016320828-addendum">Addendum</a>
  </li>

  <li>
    <a href="#2077345490-addendum-2">Addendum 2</a>
  </li>
</ul>

<p>This blog is statically generated from Markdown files. It used to be fast, but nowadays it's not:</p>
<pre><code class="language-sh"> $ hyperfine --warmup 2 ./master.bin 
Benchmark 1: ./master.bin
  Time (mean ± σ):      1.873 s ±  0.053 s    [User: 1.351 s, System: 0.486 s]
  Range (min … max):    1.806 s …  1.983 s    10 runs
</code></pre>
<p>~ 2 seconds is not the end of the world, but it's just enough to be annoying when doing lots of edit-view cycles. Worse, it seemed to become slower and slower as I wrote more articles. So today I finally dedicated some time to tackle this problem.</p>
<h2 id="1071142523-the-investigation">
  <a class="title" href="#1071142523-the-investigation">The investigation</a>
  <a class="hash-anchor" href="#1071142523-the-investigation" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>In the early days of this blog, there were only a few articles, and the build process was a simple Makefile, something like this (simplified):</p>
<pre><code class="language-make">%.html: %.md header.html footer.html
        cat header.html &gt;&gt; $@
        pandoc --toc $&lt; &gt;&gt; $@
        cat footer.html &gt;&gt; $@
</code></pre>
<p>For each markdown file, say <code>wayland_from_scratch.md</code>, we transform the markdown into HTML (at the time with <code>pandoc</code>, which proved to be super slow, now with <code>cmark</code> which is extremely fast) and save that in the file <code>wayland_from_scratch.html</code>, with a HTML header prepended and footer appended.</p>
<p>Later on, I added the publication date:</p>
<pre><code class="language-make">%.html: %.md header.html footer.html
        cat header.html &gt;&gt; $@
        printf '&lt;p id=&quot;publication_date&quot;&gt;Published on %s.&lt;/p&gt;\n' $$(git log --format='%as' --reverse -- $&lt; | head -n1)  &gt;&gt; $@; fi
        pandoc --toc $&lt; &gt;&gt; $@
        cat footer.html &gt;&gt; $@
</code></pre>
<p>The publication date is the creation date, that is: the date of the first Git commit for this file. So we ask Git for the list of commits for this file (they are provided by default from newest to oldest, so we <code>--reverse</code> the list), take the first one with <code>head</code>, done. It's simple.</p>
<p><em>Note: My initial approach was to get the creation and modification date from the file system, but it's incorrect, as soon as you work on more than one machine. The way Git works is that when you pull commits that created a file, it creates the file on the file system and does not try to hack the creation date. Thus the file creation date is the time of the Git pull, not the date of the commit that first created it.</em></p>
<p>As I added more and more features to this blog, like a list of article by tags, a home page that automatically lists all of the articles, a RSS feed, the 'last modified' date for an article, etc, I outgrew the Makefile approach and wrote a small <a href="https://github.com/gaultier/blog/blob/master/src/main.odin">program</a> (initially in Zig, then in <a href="https://odin-lang.org/">Odin</a>) to do all that. But the core approach remained:</p>
<ul>
<li>List all markdown files in the current directory (e.g. <code>ls *.md</code>, the Makefile did that for us with <code>%.md</code>)</li>
<li>For each markdown file, sequentially:
<ul>
<li>Run <code>git log article.md</code> to get the date of the first and last commits for this file (respectively 'created at' and 'modified at')</li>
<li>Convert the markdown content to HTML</li>
<li>Save this HTML to <code>article.html</code></li>
</ul>
</li>
</ul>
<p>For long time, it was all good. It was single-threaded, but plenty fast. So I wrote more and more articles. But now it's too slow. Why? Let's profile it:</p>
<p><object alt="Profile before the optimization" data="making_my_static_blog_generator_11_times_faster_profile_before.svg" type="image/svg+xml"></object></p>
<p>Yeah...I think it might be [git] [git] [git] [git] [git] [git] [git] [git]...</p>
<p>Another way to confirm this is with <code>strace</code>:</p>
<pre><code class="language-sh">$ strace --summary-only ./src.bin
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.85    0.479290        3928       122           waitid
[...]
</code></pre>
<p>So ~ <strong>95 %</strong> of the running time is spent waiting on a subprocess. It's mainly git - we also run <code>cmark</code> as a subprocess but it's really really fast. We could further investigate with <code>strace</code> which process we are waiting on but the CPU profile already points the finger at Git and <code>cmark</code> is not even visible on it.</p>
<p>At this point it's important to mention that this program is a very simplistic static site generator: it is stateless and processes every markdown file in the repository one by one. You could say that it's a regression compared to the Makefile because <code>make</code> has built-in parallelism with <code>-j</code> and change detection. But in reality, change detection in make is primitive and I often want to reprocess everything because of a change that applies to every file. For example, I reword the <code>Donate</code> section at the end of each article (wink wink), or the header, or the footer, etc.</p>
<p>Also, I really am fond of this 'pure function' approach. There is no caching issue to debug, no complicated code to write, no data races, no async callbacks, etc.</p>
<p>My performance target was to process every file within 1s, or possibly even 0.5s.</p>
<p>I could see a few options:</p>
<ul>
<li>Do not block on <code>git log</code>. We can use a thread pool, or an <a href="/blog/way_too_many_ways_to_wait_for_a_child_process_with_a_timeout.html">asynchronous approach</a> to spawn all the git processes at once, and wait for all of them to finish. But it's more complex.</li>
<li>Implement caching so that only the changed markdown files get regenerated.</li>
<li>Make <code>git log</code> faster somehow.</li>
</ul>
<p>The last option was my preferred one because it did not force me to re-architect the whole program.</p>
<p>Note that the other two options are sill on the table regardless of whether our experiment works out or not.</p>
<h2 id="3047915491-when-there-s-one-there-s-many">
  <a class="title" href="#3047915491-when-there-s-one-there-s-many">When there's one, there's many</a>
  <a class="hash-anchor" href="#3047915491-when-there-s-one-there-s-many" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>My intuition was to do a deep dive in the <code>git log</code> options, to see if I could instruct it to do less work. But then something struck me: we invoke <code>git log</code> to get all commits for one markdown file (even though we only are interested in the first and last, but that's the only interface Git provides us as far as I know). What if we invoked it once <em>for all markdown files</em>? Yes, the output might be a bit big... How big? Is it really faster? Let's measure!</p>
<p>Conceptually we can simply do <code>git log '*.md'</code> and parse the output. We can refine that approach later with more options, but that's the gist of it:</p>
<pre><code class="language-sh"> $ time git log '*.md' | wc -c
191196

________________________________________________________
Executed in   73.69 millis    fish           external
   usr time   61.04 millis  738.00 micros   60.30 millis
   sys time   15.95 millis  191.00 micros   15.76 millis
</code></pre>
<p>So it's much faster than doing it per file, and also it's entire output is ~ 186 KiB. And these numbers should grow very slowly because each new commit only adds 20-100 bytes to the output.</p>
<p>Looks like we got our solution. There is one added benefit: we do not need to list all <code>.md</code> files in the directory at startup. Git gives us this information (in my case there are no markdown files <em>not</em> tracked by Git).</p>
<p>Mike Acton and <a href="https://en.wikipedia.org/wiki/Data-oriented_design">Data Oriented Design</a> are right once again:</p>
<blockquote>
<p>Rule of thumb: When there is one, there is many. <sup class="footnote-ref"><a href="#fn-1" id="fnref-1" data-footnote-ref>1</a></sup></p>
</blockquote>
<p>Or: try to think in terms of arrays, not in terms of one isolated object at a time.</p>
<h2 id="2709289763-the-new-approach">
  <a class="title" href="#2709289763-the-new-approach">The new approach</a>
  <a class="hash-anchor" href="#2709289763-the-new-approach" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>We only want git to tell us, for each commit:</p>
<ul>
<li>The date</li>
<li>Which files were affected</li>
</ul>
<p>Hence we pass to <code>git log</code>:</p>
<ul>
<li><code>--format='%aI'</code> to get the date in ISO format</li>
<li><code>--name-status</code> to know which files this commit added (<code>A</code>), modified (<code>M</code>), deleted (<code>D</code>), or renamed (<code>R</code>)</li>
<li><code>--no-merges</code> to skip merge commits since they do not directly affect any file</li>
<li><code>--diff-filter=AMRD</code> to only get commits that add/modify/delete/rename files. We are not interested in commits changing the permissions on a file, or modifying symbolic links, etc.</li>
</ul>
<p>With these options we get even better numbers:</p>
<pre><code class="language-sh">$ time git log --format='%aI' --name-status --no-merges --diff-filter=AMDR  -- '*.md' | wc -c
77832

________________________________________________________
Executed in  108.38 millis    fish           external
   usr time   83.70 millis  231.00 micros   83.47 millis
   sys time   27.99 millis  786.00 micros   27.20 millis
</code></pre>
<p>The output looks like this (I annotated each part along with the commit number):</p>
<pre><code>2024-11-05T15:43:44+01:00                                                            | [1] A commit starts with the date.
                                                                                     | [1] Empty line
M       how_to_rewrite_a_cpp_codebase_successfully.md                                | [1] A list of files affected by this commit.
M       write_a_video_game_from_scratch_like_1987.md                                 | [1] Each starts with a letter describing the action.
M       x11_x64.md                                                                   | [1] Here it's all modifications.
M       you_inherited_a_legacy_cpp_codebase_now_what.md                              | [1]
2025-02-02T22:54:23+01:00                                                            | [2] The second entry starts.
                                                                                     | [2] 
R100    cross-platform-timers.md        the_missing_cross_platform_api_for_timers.md | [2] Rename with the (unneeded) confidence score.
[...]                                                                                | Etc.
</code></pre>
<p>Parsing this commit log is tedious but not extremely difficult.</p>
<p>We maintain a map while inspecting each commit: <code>map&lt;Path, (creation_date, modification_date, tombstone)&gt;</code>.</p>
<p>In case of a rename or delete, we set the <code>tombstone</code> to <code>true</code>. Why not remove the entry from the map directly? Well, we are inspecting the list of commits from newest to oldest.
So first we'll encounter the delete/rename commit for this file, and then later in the stream, a number of add/modify commits. When we are done, we need to remember that this markdown file should be ignored, otherwise, we'll try to open it, read it, and convert it to HTML, but we'll get a <code>ENOENT</code> error because it does not exist anymore on disk. We could avoid having this tombstone field and just bail on <code>ENOENT</code>, that's a matter of taste I guess, but this field was useful to me to ensure that the parsing code is correct.</p>
<p>Alternatively, we could pass <code>--reverse</code> to <code>git log</code> and parse the commits in chronological order. When we see a delete/rename commit for a file, we can safely remove the entry from the map since no more commits about this file should show up after that.</p>
<h2 id="3854930553-the-new-implementation">
  <a class="title" href="#3854930553-the-new-implementation">The new implementation</a>
  <a class="hash-anchor" href="#3854930553-the-new-implementation" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<pre><code class="language-odin">GitStat :: struct {
	creation_date:     string,
	modification_date: string,
	path_rel:          string,
}

get_articles_creation_and_modification_date :: proc() -&gt; ([]GitStat, os2.Error) {
	free_all(context.temp_allocator)
	defer free_all(context.temp_allocator)

	state, stdout_bin, stderr_bin, err := os2.process_exec(
		{
			command = []string {
				&quot;git&quot;,
				&quot;log&quot;,
				// Print the date in ISO format.
				&quot;--format='%aI'&quot;,
				// Ignore merge commits since they do not carry useful information.
				&quot;--no-merges&quot;,
				// Only interested in creation, modification, renaming, deletion.
				&quot;--diff-filter=AMRD&quot;,
				// Show which modification took place:
				// A: added, M: modified, RXXX: renamed (with percentage score), etc.
				&quot;--name-status&quot;,
				&quot;*.md&quot;,
			},
		},
		context.temp_allocator,
	)
	if err != nil {
		fmt.eprintf(&quot;git failed: %d %v %s&quot;, state, err, string(stderr_bin))
		panic(&quot;git failed&quot;)
	}

	stdout := strings.trim_space(string(stdout_bin))
	assert(stdout != &quot;&quot;)

	GitStatInternal :: struct {
		creation_date:     string,
		modification_date: string,
		tombstone:         bool,
	}
	stats_by_path := make(map[string]GitStatInternal, allocator = context.temp_allocator)

	// Sample git output:
	// 2024-10-31T16:09:02+01:00
	// 
	// M       lessons_learned_from_a_successful_rust_rewrite.md
	// A       tip_of_day_3.md
	// 2025-02-18T08:07:55+01:00
	//
	// R100    sha.md  making_my_debug_build_run_100_times_faster.md

	// For each commit.
	for {
		// Date
		date: string
		{
			line := strings.split_lines_iterator(&amp;stdout) or_break

			assert(strings.starts_with(line, &quot;'20&quot;))
			line_without_quotes := line[1:len(line) - 1]
			date = strings.clone(strings.trim(line_without_quotes, &quot;' \n&quot;))
		}

		// Empty line
		{
			// Peek.
			line, ok := strings.split_lines_iterator(&amp;stdout)
			assert(ok)
			assert(line == &quot;&quot;)
		}

		// Files.
		for {
			// Start of a new commit?
			if strings.starts_with(stdout, &quot;'20&quot;) do break

			line := strings.split_lines_iterator(&amp;stdout) or_break
			assert(line != &quot;&quot;)

			action := line[0]
			assert(action == 'A' || action == 'M' || action == 'R' || action == 'D')

			old_path: string
			new_path: string
			{
				// Skip the 'action' part.
				_, ok := strings.split_iterator(&amp;line, &quot;\t&quot;)
				assert(ok)

				old_path, ok = strings.split_iterator(&amp;line, &quot;\t&quot;)
				assert(ok)
				assert(old_path != &quot;&quot;)

				if action == 'R' { 	// Rename has two operands.
					new_path, ok = strings.split_iterator(&amp;line, &quot;\t&quot;)
					assert(ok)
					assert(new_path != &quot;&quot;)
				} else { 	// The others have only one.
					new_path = old_path
				}
			}

			_, v, inserted, err := map_entry(&amp;stats_by_path, new_path)
			assert(err == nil)

			if inserted {
				v.modification_date = date
				v.creation_date = date
			} else {
				assert(v.modification_date != &quot;&quot;)
				// Keep updating the creation date, when we reach the end of the commit log, it has the right value.
				v.creation_date = date
			}


			// We handle the action separately from the fact that this is the first commit we see for the path.
			// Because a file could have only one commit which is a rename.
			// Or its first commit is a rename but then there additional commits to modify it. 
			// Case being: these two things are orthogonal.

			if action == 'R' {
				// Mark the old path as 'deleted'.
				stats_by_path[old_path] = GitStatInternal {
					modification_date = date,
					tombstone         = true,
				}

				// The creation date of the new path is the date of the rename operation.
				v.creation_date = date
			}
			if action == 'D' {
				// Mark as 'deleted'.
				v.tombstone = true
			}
		}
	}

	git_stats := make([dynamic]GitStat)
	for k, v in stats_by_path {
		assert(k != &quot;&quot;)
		assert(v.creation_date != &quot;&quot;)
		assert(v.modification_date != &quot;&quot;)
		assert(v.creation_date &lt;= v.modification_date)

		if !v.tombstone {
			git_stat := GitStat {
				path_rel          = strings.clone(k),
				creation_date     = strings.clone(v.creation_date),
				modification_date = strings.clone(v.modification_date),
			}
			fmt.printf(&quot;%v\n&quot;, git_stat)
			append(&amp;git_stats, git_stat)
		}
	}

	return git_stats[:], nil
}
</code></pre>
<p>A few things of interest:</p>
<ul>
<li>Odin has first class support for allocators so we allocate everything in this function with the temporary allocator. It is backed by an arena and emptied at the start and end of the function. Only the final result is allocated with the standard allocator. That way, even if Git starts spewing lots of data, as soon as we exit the function, all of that is gone, in one call, and the the program carries on with only the necessary data heap-allocated.</li>
<li>In this program, the main allocator and the temporary allocator are both arenas. The memory usage is a constant ~ 4 MiB, mainly located in the Odin standard library. The memory usage of my code is around ~ 65 KiB.</li>
<li>A <code>map</code> is a bit of an overkill for ~30 entries, but it's fine, and we expect the number of articles to grow</li>
</ul>
<hr />
<p>We can log the final result:</p>
<pre><code>[...]
GitStat{creation_date = &quot;2020-09-07T20:49:20+02:00&quot;, modification_date = &quot;2024-11-04T09:24:17+01:00&quot;, path_rel = &quot;compile_ziglang_from_source_on_alpine_2020_9.md&quot;}
GitStat{creation_date = &quot;2024-09-10T12:59:04+02:00&quot;, modification_date = &quot;2024-09-12T12:14:42+02:00&quot;, path_rel = &quot;odin_and_musl.md&quot;}
GitStat{creation_date = &quot;2023-11-23T11:26:11+01:00&quot;, modification_date = &quot;2025-02-06T20:55:27+01:00&quot;, path_rel = &quot;roll_your_own_memory_profiling.md&quot;}
[...]
</code></pre>
<p>Alright, so how does our new implementation fare compared to the old one?</p>
<p>First, we can confirm with <code>strace</code> that the time spent on waiting for subprocesses (mainly Git) shrinked:</p>
<pre><code class="language-sh">$ strace --summary-only ./src.bin
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 56.59    0.043176         674        64           waitid
 [...]
</code></pre>
<p>Then we benchmark:</p>
<pre><code class="language-sh"> $ hyperfine --warmup 2 './src-main.bin' './src.bin'
Benchmark 1: ./src-main.bin
  Time (mean ± σ):      1.773 s ±  0.022 s    [User: 1.267 s, System: 0.472 s]
  Range (min … max):    1.748 s …  1.816 s    10 runs
 
Benchmark 2: ./src.bin
  Time (mean ± σ):     158.7 ms ±   6.6 ms    [User: 128.4 ms, System: 133.7 ms]
  Range (min … max):   151.7 ms … 175.6 ms    18 runs
 
Summary
  ./src.bin ran
   11.17 ± 0.48 times faster than ./src-main.bin
</code></pre>
<p>Around 11 times faster, and well within our ideal target of 500 ms ! And all we had to do was convert many <code>git log</code> invocations (one per markdown file) to just one.</p>
<p>Here's the CPU profile now:</p>
<p><object alt="CPU profile after the optimization" data="making_my_static_blog_generator_11_times_faster_profile_after.svg" type="image/svg+xml"></object></p>
<hr />
<p>Overall it's a pretty simple change, located in one function. Almost all of the complexity is due to parsing Git custom text output and skipping over irrelevant commits. We don't really have a choice either: that's all Git provides to query the commit log. The alternatives are all worse:</p>
<ul>
<li>Parse directly the Git object files - no thank you</li>
<li>Use a library (e.g. <code>libgit2</code>) and hope that it offers a saner interface to query the commit log</li>
</ul>
<p>I wonder if there is a better way...</p>
<h2 id="3898000878-fossil">
  <a class="title" href="#3898000878-fossil">Fossil</a>
  <a class="hash-anchor" href="#3898000878-fossil" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p><a href="https://fossil-scm.org">fossil</a> is an alternative version control system created by the same folks that created, and are still working on, SQLite. Naturally, a fossil repository is basically just one SQLite database file. That sounds very <em>queryable</em>!</p>
<p>Let's import our git repository into a Fossil repository and enter the SQLite prompt:</p>
<pre><code class="language-sh">$ git fast-export --all | fossil import --git new-repo.fossil
$ file new-repo.fossil 
new-repo.fossil: SQLite 3.x database (Fossil repository), [...]
$ fossil sql -R new-repo.fossil
</code></pre>
<p>There are lots of tables in this database. We craft this query after a few trials and errors (don't know if it is optimal or not):</p>
<pre><code class="language-sql">sqlite&gt; .mode json
sqlite&gt; SELECT 
            f.name as filename,
            datetime(min(e.mtime)) as creation_date,
            datetime(max(e.mtime)) as last_modified
        FROM repository.filename f
        JOIN repository.mlink m ON f.fnid = m.fnid
        JOIN repository.event e ON m.mid = e.objid
        WHERE filename LIKE '%.md'
        GROUP BY f.name
        ORDER BY f.name;
</code></pre>
<p>Which outputs what we want:</p>
<pre><code>[...]
{&quot;filename&quot;:&quot;body_of_work.md&quot;,&quot;creation_date&quot;:&quot;2023-12-19 13:27:40&quot;,&quot;last_modified&quot;:&quot;2024-11-05 15:11:55&quot;},
{&quot;filename&quot;:&quot;communicate_by_sharing_code.md&quot;,&quot;creation_date&quot;:&quot;2024-03-07 09:48:39&quot;,&quot;last_modified&quot;:&quot;2024-03-07 10:14:09&quot;},
{&quot;filename&quot;:&quot;compile_ziglang_from_source_on_alpine_2020_9.md&quot;,&quot;creation_date&quot;:&quot;2020-09-07 18:49:20&quot;,&quot;last_modified&quot;:&quot;2024-11-04 08:24:17&quot;},
[...]
</code></pre>
<p>Note that this does not filter out deleted/removed files yet. I'm sure that it can be done by tweaking the query a bit, but there's not time! We need to benchmark!</p>
<pre><code class="language-sh">$ hyperfine --shell=none 'fossil sql -R new-repo.fossil &quot;SELECT [...]&quot;'
Benchmark 1: fossil sql -R new-repo.fossil &quot;[...]&quot;
  Time (mean ± σ):       3.0 ms ±   0.5 ms    [User: 1.5 ms, System: 1.4 ms]
  Range (min … max):     2.2 ms …   5.6 ms    776 runs
</code></pre>
<p>Damn that's fast.</p>
<p>I do not use Fossil, but I eye it from time to time - generally when I need to extract some piece of information from Git and I discover it does not let me, or when I see the Gitlab bill  my (ex-) company pays, or when the Jira page takes more than 10 seconds to load... yeah, Fossil is the complete package, with issues, forums, a web UI, a timeline, a wiki, a chat... it has it all!</p>
<p>But the golden ticket idea is really to store everything inside SQLite. Suddenly, we can query anything! And there is no weird parsing needed - SQLite supports various export formats and (some? all?) fossil commands support the <code>--sql</code> option to show you which SQL query they use to get the information. After all, the only thing the <code>fossil</code> command line does in this case, is craft a SQL query and run int on the SQLite database.</p>
<p>It's quite magical to me that I can within a few seconds import my 6 years-long git repository into a SQLite database and start querying it, and the performance is great.</p>
<p>Now I am not <em>quite</em> ready yet to move to Fossil, and the import is a one time thing as far as I know, so it is not a viable option for the problem at hand as long as git is the source of truth. But still, while I was trying to tackle <code>git log</code> into submission, I was thinking the whole time: why can't I do an arbitrary query of git data? Generally, the more generic approach is slower than the ad-hoc one, but here it's not even the case. Fossil is for this use case objectively more powerful, more generic, <em>and</em> faster.</p>
<h2 id="3796851539-conclusion">
  <a class="title" href="#3796851539-conclusion">Conclusion</a>
  <a class="hash-anchor" href="#3796851539-conclusion" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>The issue was effectively a N+1 query problem. We issued a separate 'query' (in this case, <code>git log</code>) for each markdown file, in a sequential blocking fashion. This approach worked until it didn't because the number of entities (i.e. articles, and commits) grew over time.</p>
<p>The solution is instead to do only one query for all entities. It may return a bit more data that we need, but that's much faster, and scales better, than the original version.</p>
<p>It's obvious in retrospect but it's easy to let it happen when the 'codebase' (a big word for only one file that started as a basic Makefile) is a few years old, it's only looked at briefly from time to time, and the initial implementation did not really allow for the correct approach - who wants to parse the <code>git log</code> output in the Makefile language?</p>
<p>Furthermore, the initial approach was fine because it only looked at the creation date, so we could do <code>git log --reverse article.md | head -n1</code> which is faster than sifting through the whole commit log for this file. However, as it is always the case, requirements (again, a big word for: my taste concerning what should my personal blog look like) change and the modification date now had to also be extracted from git. This forced us, with the current Git functionality, to scan through the whole commit log, for each file, which became too slow.</p>
<p>As Mike Acton and Data Oriented Design state:</p>
<blockquote>
<p>Different problems require different solutions.<sup class="footnote-ref"><a href="#fn-2" id="fnref-2" data-footnote-ref>2</a></sup></p>
</blockquote>
<p>And:</p>
<blockquote>
<p>If you have different data, you have a different problem. <sup class="footnote-ref"><a href="#fn-3" id="fnref-3" data-footnote-ref>3</a></sup></p>
</blockquote>
<p>It also does not help that any querying in Git is ad-hoc and outputs a weird text format that we have to tediously parse. Please everyone, let's add the option to output structured data in our command line programs, damn it! String programming is no fun at all - that's why I moved away from concatenating the output of shell commands in a Makefile, to a real programming language, to do the static generation.</p>
<p>All in all, I am pleased with my solution - I can now see any edit materialize <em>instantly</em>. It's a bit funny that my previous article was about SIMD and inspecting assembly instructions, while this issue is so obvious and high-level in retrospect.</p>
<p>To the next 5 years of blogging, till I need to revisit the performance of this function!</p>
<h2 id="4016320828-addendum">
  <a class="title" href="#4016320828-addendum">Addendum</a>
  <a class="hash-anchor" href="#4016320828-addendum" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>Running <code>git gc</code> and <code>git prune</code> also helps because all unreachable objects are removed, so git has to do less work scanning and parsing them on disk.</p>
<p>Having done that, we get almost twice as fast:</p>
<pre><code class="language-sh">$ hyperfine --shell=none --warmup 2 './src.bin'
Benchmark 1: ./src.bin
  Time (mean ± σ):      89.7 ms ±   2.6 ms    [User: 63.6 ms, System: 59.5 ms]
  Range (min … max):    85.0 ms …  94.3 ms    31 runs
</code></pre>
<p>But we have to remember to run it frequently or set up a periodic job that does it for us.</p>
<p>That's a ~21x speed-up from the original time.</p>
<h2 id="2077345490-addendum-2">
  <a class="title" href="#2077345490-addendum-2">Addendum 2</a>
  <a class="hash-anchor" href="#2077345490-addendum-2" aria-hidden="true" onclick="navigator.clipboard.writeText(this.href);"></a>
</h2>
<p>Since spawning the <code>cmark</code> process, having <code>cmark</code> parsing the command line options, over and over, is still taking a good chunk of the time, we can switch to using <code>libcmark</code> directly. This is something I wanted to do anyway to extract a table of content, etc from the markdown. Since the running time is getting lower and lower, we add <code>--shell=none</code> and increase the warm-up to reduce statistical outliers:</p>
<pre><code class="language-sh">$ hyperfine --shell=none --warmup 10 ./src.bin
Benchmark 1: ./src.bin
  Time (mean ± σ):      55.4 ms ±   1.5 ms    [User: 49.0 ms, System: 35.0 ms]
  Range (min … max):    53.1 ms …  61.0 ms    55 runs
</code></pre>
<p><em>In fine</em>: a x33 speed-up from the original time.</p>
<section class="footnotes" data-footnotes>
<ol>
<li id="fn-1">
<p><a href="https://www.youtube.com/watch?v=rX0ItVEVjHc&amp;t=14m33s">CppCon 2014: Mike Acton &quot;Data-Oriented Design and C++&quot;</a> <a href="#fnref-1" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="1" aria-label="Back to reference 1">↩</a></p>
</li>
<li id="fn-2">
<p><a href="https://www.youtube.com/watch?v=rX0ItVEVjHc&amp;t=13m13s">CppCon 2014: Mike Acton &quot;Data-Oriented Design and C++&quot;</a> <a href="#fnref-2" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="2" aria-label="Back to reference 2">↩</a></p>
</li>
<li id="fn-3">
<p><a href="https://www.youtube.com/watch?v=rX0ItVEVjHc&amp;t=13m21s">CppCon 2014: Mike Acton &quot;Data-Oriented Design and C++&quot;</a> <a href="#fnref-3" class="footnote-backref" data-footnote-backref data-footnote-backref-idx="3" aria-label="Back to reference 3">↩</a></p>
</li>
</ol>
</section>
<p><a href="/blog"> ⏴ Back to all articles</a></p>

<blockquote id="donate">
  <p>If you enjoy what you're reading, you want to support me, and can afford it: <a href="https://paypal.me/philigaultier?country.x=DE&locale.x=en_US">Support me</a>. That allows me to write more cool articles!</p>
</blockquote>

<blockquote>
  <p>
    This blog is <a href="https://github.com/gaultier/blog">open-source</a>!
    If you find a problem, please open a Github issue.
    The content of this blog as well as the code snippets are under the <a href="https://en.wikipedia.org/wiki/BSD_licenses#3-clause_license_(%22BSD_License_2.0%22,_%22Revised_BSD_License%22,_%22New_BSD_License%22,_or_%22Modified_BSD_License%22)">BSD-3 License</a> which I also usually use for all my personal projects. It's basically free for every use but you have to mention me as the original author.
  </p>
</blockquote>

</div>
</body>
</html>