This repository has been archived by the owner on Aug 21, 2024. It is now read-only.

improving render performance via PATH_CACHE usage and bulk git log ... call #85

Open
klandergren opened this issue May 18, 2021 · 0 comments

Comments

@klandergren

I have a site with ~1300 documents and wanted to improve its render performance.

Two main observations:

  • PATH_CACHE is only read for formatted_last_modified_date, and not for last_modified_at_time
  • it looks like the number of calls to git log ... scales linearly with the number of documents that use last_modified_at within a liquid tag

re: PATH_CACHE usage
If both formatted_last_modified_date and last_modified_at_time read from PATH_CACHE, the initial site render is unaffected, but subsequent renders (e.g. after a site reset when jekyll detects a change while running jekyll serve) get faster, because they reuse the cached times instead of recomputing them.
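
To make the idea concrete, here is a minimal sketch of reading both methods through one cache. It is not the code from the plugin or from my branch: SketchDeterminator and its lookup argument are hypothetical stand-ins, and storing a Time in PATH_CACHE (rather than whatever the plugin stores today) is an assumption made for illustration.

```ruby
# Illustrative only: SketchDeterminator and `lookup` are hypothetical
# stand-ins, not the plugin's actual classes or API.
PATH_CACHE = {}

class SketchDeterminator
  def initialize(path, lookup)
    @path = path
    @lookup = lookup # stands in for the expensive `git log` / mtime call
  end

  # Reads through PATH_CACHE, so the expensive lookup runs once per path.
  def last_modified_at_time
    PATH_CACHE[@path] ||= @lookup.call(@path)
  end

  # Formats the cached Time instead of triggering a fresh lookup.
  def formatted_last_modified_date(format)
    last_modified_at_time.strftime(format)
  end
end

calls = 0
lookup = ->(_path) { calls += 1; Time.at(1_621_300_000).utc }

# Two "renders" of the same document, each reading both values.
2.times do
  d = SketchDeterminator.new('posts/a.md', lookup)
  d.last_modified_at_time
  d.formatted_last_modified_date('%d-%b-%y')
end
puts calls # prints 1: the lookup ran once despite four reads
```

Because the cache outlives any single determinator instance, only the first render pays for the lookup, which matches the "initial generation is unaffected" caveat.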

Pros:

  • very minimal patch footprint
  • regeneration time for ~1300 documents went from ~28s to ~4s

Caveats:

  • initial generation is unaffected
  • some users may have come to expect / depend on a "live" call to git log or mtime. Changing this would serve them cached time data
  • it was unclear to me if there was reason for the separation; I may have missed something!

An example of this implementation is at https://github.com/klandergren/jekyll-last-modified-at/tree/use-path-cache

re: git log ... calls scaling w/ number of documents
Both initial site render and subsequent renders will see improvement if we replace the 1:1 calls with a single git log call and cache its data. The call ends up fast enough that we can flush the cache during reset so users will always have a freshly determined last_modified_at (presumed to be preferable).
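
As a rough sketch of the single-call approach (again, not the code in my branch): run one git log that prints a commit timestamp followed by the files each commit touched, then build a path-to-time map from it. The method names are hypothetical. The format string %x00%ct (a NUL byte plus the committer timestamp) is my choice here so that header lines cannot be confused with file names. Renames and git's quoting of non-ASCII paths are ignored.

```ruby
require 'open3'

# Turns `git log --name-only --format=%x00%ct` output into
# { path => Time of the most recent commit touching that path }.
# Hypothetical helper, written for illustration.
def parse_git_log(output)
  times = {}
  current = nil
  output.each_line do |raw|
    line = raw.chomp
    next if line.empty?
    if line.start_with?("\0")
      # Header line: NUL byte followed by a UNIX timestamp.
      current = Time.at(Integer(line[1..-1], 10)).utc
    elsif current
      # git log lists newest commits first, so the first hit wins.
      times[line] ||= current
    end
  end
  times
end

# One subprocess for the whole repository instead of one per document.
# Returns an empty hash when git fails (e.g. not a git repo).
def load_git_times(repo_dir)
  out, _err, status = Open3.capture3(
    'git', '-C', repo_dir, 'log', '--name-only', '--format=%x00%ct'
  )
  status.success? ? parse_git_log(out) : {}
end
```

The resulting hash could be rebuilt on every reset (for instance from a Jekyll :site, :after_reset hook), which is what keeps last_modified_at fresh without per-document calls.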

Pros:

  • feature-gated / off by default
  • initial generation and regeneration time for ~1300 documents went from ~28s to ~4s

Caveats:

  • larger patch footprint than PATH_CACHE usage
  • it is plausible that sites with very large git log histories (e.g. lots of commits, lots of file churn, or both) would run into issues here. The repo I tested with has ~1800 commits without a lot of file churn. It would be possible to store the paths passed to Determinator.new and stop reading the git log once every file has been encountered, but it would be a messy implementation.
  • some users may have come to expect cached data for the formatted last_modified_at
  • some users may have a large number of uncommitted files (e.g. be using jekyll without a git repo), in which case the time saved by caching mtime data between rerenders is significant. I don't think this is probable but at least wanted to mention it.
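
For the large-history caveat, the "stop reading once every file has been encountered" idea could look roughly like the sketch below. It is not implemented in my branch. The method names are hypothetical, and it assumes the same NUL-prefixed timestamp lines as git log --name-only --format=%x00%ct would produce.

```ruby
require 'set'

# Scans git log lines (newest commit first) and stops as soon as every
# wanted path has a timestamp. Hypothetical helper, for illustration.
def latest_commit_times(lines, wanted_paths)
  remaining = Set.new(wanted_paths)
  times = {}
  current = nil
  lines.each do |raw|
    line = raw.chomp
    next if line.empty?
    if line.start_with?("\0")
      current = Time.at(Integer(line[1..-1], 10)).utc
    elsif current && remaining.delete?(line)
      times[line] = current
      break if remaining.empty? # nothing left to find, stop reading
    end
  end
  times
end

# Streams the log from a pipe, so leaving the block early closes the
# pipe and the rest of the history is never read.
def scan_repo(wanted_paths)
  IO.popen(['git', 'log', '--name-only', '--format=%x00%ct']) do |io|
    latest_commit_times(io.each_line, wanted_paths)
  end
end
```

Paths that never appear in the log (uncommitted files) simply stay out of the result, so in that case the whole history is still read.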

An example of this implementation is at https://github.com/klandergren/jekyll-last-modified-at/tree/cache-git-information

I will open each of these improvement approaches as separate pull requests so you can evaluate.

Thanks for creating this plugin!
