Reducing Build time from 13min to 3min #393

Open
finanalyst opened this issue May 29, 2024 · 8 comments
Labels
build Building the site

Comments

@finanalyst
Collaborator

@dontlaugh The Buildkite CI shows that a standard build of the website takes over 13 minutes. When I run the same build on my desktop, it takes 13 minutes after a fresh install and about 3 minutes thereafter.

The difference between a fresh install and a rebuild is that the directory Website/plugins/typegraph/typegraphs/ is empty on a fresh install and contains 3GB in 423 .svg files thereafter.

The .svg files are generated from a text file that has not changed in several years. So regenerating them each time is not adding information. However, if the svg files are built and included in the doc-website repo, each time the repo is cloned, all 3GB are transferred. So it makes sense not to include them in the repo.

When the typegraph plugin runs during the build, it checks whether the typegraphs directory is empty or whether the type-graph.txt file has been modified more recently than the directory. If the directory is populated and the text file has not been modified, the typegraphs are not regenerated.

Is it possible to zip the Website/plugins/typegraph/typegraphs directory after a build and store it as an artifact, then restore the directory before the build process?

If so, the build process will be shortened by about 10 minutes.
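
As a rough illustration of the zip-and-restore idea, here is a minimal Python sketch using only the standard library. The function names `save_cache` and `restore_cache` and the archive location are hypothetical; how the archive would be uploaded to and downloaded from the CI artifact store is not shown and would depend on the CI service.

```python
import shutil
from pathlib import Path


def save_cache(cache_dir: Path, archive_base: Path) -> Path:
    """Zip the contents of cache_dir and return the path of the archive."""
    # make_archive appends the ".zip" suffix to archive_base itself.
    # archive_base should live outside cache_dir (assumption of this sketch).
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=cache_dir))


def restore_cache(archive: Path, cache_dir: Path) -> None:
    """Unpack a previously saved archive into cache_dir before the build."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive), extract_dir=cache_dir)
```

After a build one would call `save_cache(Path("Website/plugins/typegraph/typegraphs"), Path("typegraphs-cache"))`, and before the next build `restore_cache(Path("typegraphs-cache.zip"), Path("Website/plugins/typegraph/typegraphs"))`.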

@finanalyst finanalyst added the build Building the site label May 29, 2024
@dontlaugh
Collaborator

I think 3GB will exceed what Buildkite's cloud-based artifact cache will let us use for upload and download.

Instead, we can use an out-of-tree, on-disk cache, one per build agent (we usually have 2 or 3). If we can configure this as an absolute path, say /opt/cache/typegraph, then it will be shared among the builds on that agent.

Can we provide an out-of-tree path here? If so, I'll set up the filesystem permissions.

@finanalyst
Collaborator Author

Can we have the out-of-tree path as an alias to Website/plugins/typegraph/typegraphs/?

Alternatively, I could add a Callable to the typegraph plugin that copies the contents of that directory to a directory specified in the plugin options. This would be a path relative to Website, e.g. ../opt/cache/typegraphs, just as the HTML files are moved to ../rendered_html.

@dontlaugh
Collaborator

> Can we have the out-of-tree path as an alias to Website/plugins/typegraph/typegraphs/?

Do you mean a symlink? If so, yes. The build script could symlink a local directory to the shared, persistent directory if it's not already symlinked.

@finanalyst
Collaborator Author

@dontlaugh I did mean a symlink. Where would the instruction / command be put?

@dontlaugh
Collaborator

I think the place to intervene will be at the beginning of the build job here

https://github.com/Raku/doc-website/blob/main/.buildkite/pipeline.yaml#L6

I've opened a draft PR to test this: #394

@finanalyst
Collaborator Author

@dontlaugh As I understand it now, the build environment contains a directory /home/builder/cache/. The Collection utility build_site is then run, which in turn calls Website/plugins/typegraph/add-type-graph.raku.
add-type-graph.raku has the following code:

sub ($pp, %options) {
  unless (
          ('typegraphs'.IO ~~ :e & :d)
          and ( 'type-graph.txt'.IO.modified le 'typegraphs'.IO.modified )
          and ( +'typegraphs'.IO.dir > 1 )
          )
      {
          note 'Generating Typegraphs' unless %options<no-status>;
          mkdir 'typegraphs' unless 'typegraphs'.IO ~~ :e & :d;
...

Since a fresh install of the repo contains only typegraphs/.gitkeep, the condition after the unless is False (+'typegraphs'.IO.dir == 1), so the code in the block is executed.
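
To make the three-part condition easier to follow, here is a minimal Python sketch of the same decision. It is only an illustration of the logic in the Raku snippet above, not the plugin's code; the function name `needs_regeneration` and its parameters are hypothetical.

```python
from pathlib import Path


def needs_regeneration(graph_dir: Path, source: Path) -> bool:
    """Return True when the SVG files should be rebuilt."""
    if not graph_dir.is_dir():
        return True  # no typegraphs directory at all
    if source.stat().st_mtime > graph_dir.stat().st_mtime:
        return True  # source text was edited after the directory last changed
    # A fresh clone holds only .gitkeep, so a single entry still counts as empty.
    return len(list(graph_dir.iterdir())) <= 1
```

With only .gitkeep present the last line returns True, which is why a fresh install always regenerates the graphs.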

So I think we should change the typegraph plugin as follows:

  • remove the directory typegraph/typegraphs/ from the repo altogether
  • create a config field cache-dir => 'typegraph'
  • include in Website/configs/03-plugin-options, the entry typegraph => %( :cache-dir</home/builder/cache>,),
  • rewrite the code above to
sub ($pp, %options) {
  my %config = $pp.get-data('typegraph'); # gets the config options
  unless 'typegraph'.IO ~~ :e & :d {
    if %config<cache-dir>.IO ~~ :e & :d { # check that the cache path exists
      %config<cache-dir>.IO.symlink( 'typegraph' ) # create a symlink to the existing cache
    }
    else {
      'typegraph'.IO.mkdir # otherwise create a new local directory
    }
  }
  unless +'typegraph'.IO.dir > 1 { # if typegraph is ever updated, we will need to empty the cache manually
...

@dontlaugh
Collaborator

@finanalyst If we check in this change, will that break local builds? It seems like it will, because we cannot expect any machine but the builder machine to have an absolute path /home/builder/cache.

How about this alternative:

  • no new config attribute
  • remove typegraphs/.gitkeep
  • add typegraphs to .gitignore

Then, we rely on CI scripts to do the orchestration/setup of the environment. On every CI run:

  • If /home/builder/cache does not exist, create it
  • If relative path Website/plugins/typegraph/typegraphs exists and is not a symlink, remove it
  • Symlink Website/plugins/typegraph/typegraphs to /home/builder/cache

The CI agent - even though it runs on a random environment - is aware of the repo-relative path. It also has access to the builder user's home folder. All this setup can take place after code checkout, but before the build script is invoked.

This leaves one additional failure mode: a race condition between two builds running at the same time. In practice, this won't be a problem for us because, while we always have two or three build containers running, they are configured to accept only one job at a time.
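
The three CI setup steps listed above could look roughly like the following Python sketch. It is an illustration only, not the contents of the draft PR: the function name `link_typegraph_cache` is hypothetical, and the handling of a stale symlink that points somewhere else is an extra assumption that the steps do not mention.

```python
import shutil
from pathlib import Path


def link_typegraph_cache(link_path: Path, cache_dir: Path) -> None:
    """Point link_path at the persistent cache_dir, creating the cache if needed."""
    # Step 1: create the cache directory if it does not exist yet.
    cache_dir.mkdir(parents=True, exist_ok=True)

    if link_path.is_symlink():
        if link_path.resolve() == cache_dir.resolve():
            return  # already linked to the cache, nothing to do
        link_path.unlink()  # assumption: replace a link that points elsewhere
    elif link_path.is_dir():
        # Step 2: a real directory (from checkout or an old build) is removed.
        shutil.rmtree(link_path)
    elif link_path.exists():
        link_path.unlink()

    # Step 3: create the symlink inside the repo, pointing at the cache.
    link_path.parent.mkdir(parents=True, exist_ok=True)
    link_path.symlink_to(cache_dir.resolve(), target_is_directory=True)
```

Run after checkout and before the build script, the call would be `link_typegraph_cache(Path("Website/plugins/typegraph/typegraphs"), Path("/home/builder/cache"))`. Calling it a second time changes nothing, so it is safe to run on every CI job.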

dontlaugh added a commit to dontlaugh/doc-website that referenced this issue Jun 1, 2024
Our goal is to create a persistent cache of svg files between builds.

When a CI agent is first created, the cache will populate the svg
files in the build agent's home directory. Subsequent builds should
be faster. Refs Raku#393
dontlaugh added a commit that referenced this issue Jun 1, 2024
Our goal is to create a persistent cache of svg files between builds.

When a CI agent is first created, the cache will populate the svg
files in the build agent's home directory. Subsequent builds should
be faster. Refs #393
@finanalyst
Collaborator Author

@dontlaugh The approach you suggest is another way of doing it. However, the symlink to /home/builder/cache must be made before the website is built.
It also means that the Callable that manages the typegraphs does not need to be changed.
