Improve CI performance #133

joneshf · 2023-04-09T17:14:27Z

CI is back to being slow again. Since removing a bunch of parallelism in
#126, things
take a long time to run. Some of this is expected when there are large
changes. However, when adding a new Data Source/Resource, it seems like
the entire test suite runs again from scratch.

It seems like there should be some kind of caching in the tests that
isn't showing up, and it looks like it's due to the change in build
flags/environment variables used to run integration tests. Because we
use the same cache for all the tests, we end up doing something like:

Run the unit tests with one set of build flags/environment variables.
It creates a cached test run.
Run the integration tests a different set of build flags/environment
variables. This causes the cached tests from the unit tests to be
invalidated.
It creates a new cached test run.
In a future commit, run the unit tests with the original set of build
flags/environment variables. This causes the cached tests from the
integration tests to be invalidated.

Because we're always alternating between unit tests and integration
tests, the cached tests are never being used.

In order to address that, we're splitting the test jobs between
integration test and unit tests. We make a different cache for each job,
so they can reuse their cached tests when possible. This should allow
the tests to only run what has changed between commits, instead of
needing to re-run everything, all the time.

CI is back to being slow again. Since removing a bunch of parallelism in #126, things take a long time to run. Some of this is expected when there are large changes. However, when adding a new Data Source/Resource, it seems like the entire test suite runs again from scratch. It seems like there should be some kind of caching in the tests that isn't showing up, and it looks like it's due to the change in build flags/environment variables used to run integration tests. Because we use the same cache for all the tests, we end up doing something like: 1. Run the unit tests with one set of build flags/environment variables. 2. It creates a cached test run. 3. Run the integration tests a different set of build flags/environment variables. This causes the cached tests from the unit tests to be invalidated. 4. It creates a new cached test run. 5. In a future commit, run the unit tests with the original set of build flags/environment variables. This causes the cached tests from the integration tests to be invalidated. Because we're always alternating between unit tests and integration tests, the cached tests are never being used. In order to address that, we're splitting the test jobs between integration test and unit tests. We make a different cache for each job, so they can reuse their cached tests when possible. This should allow the tests to only run what has changed between commits, instead of needing to re-run everything, all the time.

We used the incorrect `make` rule. We correct that, and also make the job names reflect the `make` rule. That's what they were doing before. It keeps it a little easier to understand.

We mostly removed these fallback keys so we wouldn't get a hit on the new jobs. Now that we've got caches for the different jobs, we can add back the fallback keys.

We want the caches to update whenever there are changes to the files. This way, when we alter any of the tests, they should update the cache with their latest result. We do this by appending a hash of the go files to the cache key. We still want to fallback to the nearest cache that works though, so we can bring it any test results that might be successfully cached, and only worry about the new stuff. We do this by making sure one of the restore keys drops the appended hash.

It's not clear why we're re-running the tests only for the acceptance tests. But, this is the main thing that's making things flaky. Maybe we can get some insights by debugging the cache decisions.

Not sure this is going to provide more useful information, but maybe it will surface something.

We weren't doing a recursive search of Go files, just whatever was at the top-level. We want to make sure we get the files within directories as well.

joneshf added 7 commits April 9, 2023 10:14

Fix jobs

56a0c90

We used the incorrect `make` rule. We correct that, and also make the job names reflect the `make` rule. That's what they were doing before. It keeps it a little easier to understand.

Provide cache fallback keys

7d4846c

We mostly removed these fallback keys so we wouldn't get a hit on the new jobs. Now that we've got caches for the different jobs, we can add back the fallback keys.

Debug the test cache decision

ad459c1

It's not clear why we're re-running the tests only for the acceptance tests. But, this is the main thing that's making things flaky. Maybe we can get some insights by debugging the cache decisions.

Also debug the cache hashes

579b962

Not sure this is going to provide more useful information, but maybe it will surface something.

Hash all of the Go files

f25a7b7

We weren't doing a recursive search of Go files, just whatever was at the top-level. We want to make sure we get the files within directories as well.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve CI performance #133

Improve CI performance #133

joneshf commented Apr 9, 2023

Improve CI performance #133

Are you sure you want to change the base?

Improve CI performance #133

Conversation

joneshf commented Apr 9, 2023