Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve CI performance #133

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

joneshf
Copy link
Owner

@joneshf joneshf commented Apr 9, 2023

CI is back to being slow again. Since removing a bunch of parallelism in
#126, things
take a long time to run. Some of this is expected when there are large
changes. However, when adding a new Data Source/Resource, it seems like
the entire test suite runs again from scratch.

It seems like there should be some kind of caching in the tests that
isn't showing up, and it looks like it's due to the change in build
flags/environment variables used to run integration tests. Because we
use the same cache for all the tests, we end up doing something like:

  1. Run the unit tests with one set of build flags/environment variables.
  2. It creates a cached test run.
  3. Run the integration tests a different set of build flags/environment
    variables. This causes the cached tests from the unit tests to be
    invalidated.
  4. It creates a new cached test run.
  5. In a future commit, run the unit tests with the original set of build
    flags/environment variables. This causes the cached tests from the
    integration tests to be invalidated.

Because we're always alternating between unit tests and integration
tests, the cached tests are never being used.

In order to address that, we're splitting the test jobs between
integration test and unit tests. We make a different cache for each job,
so they can reuse their cached tests when possible. This should allow
the tests to only run what has changed between commits, instead of
needing to re-run everything, all the time.

joneshf added 7 commits April 9, 2023 10:14
CI is back to being slow again. Since removing a bunch of parallelism in
#126, things
take a long time to run. Some of this is expected when there are large
changes. However, when adding a new Data Source/Resource, it seems like
the entire test suite runs again from scratch.

It seems like there should be some kind of caching in the tests that
isn't showing up, and it looks like it's due to the change in build
flags/environment variables used to run integration tests. Because we
use the same cache for all the tests, we end up doing something like:
1. Run the unit tests with one set of build flags/environment variables.
2. It creates a cached test run.
3. Run the integration tests a different set of build flags/environment
    variables. This causes the cached tests from the unit tests to be
    invalidated.
4. It creates a new cached test run.
5. In a future commit, run the unit tests with the original set of build
    flags/environment variables. This causes the cached tests from the
    integration tests to be invalidated.

Because we're always alternating between unit tests and integration
tests, the cached tests are never being used.

In order to address that, we're splitting the test jobs between
integration test and unit tests. We make a different cache for each job,
so they can reuse their cached tests when possible. This should allow
the tests to only run what has changed between commits, instead of
needing to re-run everything, all the time.
We used the incorrect `make` rule. We correct that, and also make the
job names reflect the `make` rule. That's what they were doing before.
It keeps it a little easier to understand.
We mostly removed these fallback keys so we wouldn't get a hit on the
new jobs. Now that we've got caches for the different jobs, we can add
back the fallback keys.
We want the caches to update whenever there are changes to the files.
This way, when we alter any of the tests, they should update the cache
with their latest result. We do this by appending a hash of the go
files to the cache key.

We still want to fallback to the nearest cache that works though, so we
can bring it any test results that might be successfully cached, and
only worry about the new stuff. We do this by making sure one of the
restore keys drops the appended hash.
It's not clear why we're re-running the tests only for the acceptance
tests. But, this is the main thing that's making things flaky. Maybe we
can get some insights by debugging the cache decisions.
Not sure this is going to provide more useful information, but maybe it
will surface something.
We weren't doing a recursive search of Go files, just whatever was at
the top-level. We want to make sure we get the files within directories
as well.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant