This document walks you through:
- what kind of tests we have in Gardener
- how to run each of them
- what purpose each kind of test serves
- how to best write tests that are correct, stable, fast and maintainable
- how to debug tests that are not working as expected
The document is aimed towards developers that want to contribute code and need to write tests, as well as maintainers and reviewers that review test code. It serves as a common guide that we commit to follow in our project to ensure consistency in our tests, good coverage for high confidence and good maintainability.
The guidelines are not meant to be absolute rules. Always apply common sense and adapt the guideline if it doesn't make much sense for some cases. If in doubt, don't hesitate to ask questions during PR review (as an author but also as a reviewer). Add new learnings as soon as we make them!
Generally speaking, tests are a strict requirement for contributing new code. If you touch code that is currently untested, you need to add tests for the new cases that you introduce as a minimum. Ideally though, you would add the missing test cases for the current code as well (boy scout rule -- "always leave the campground cleaner than you found it").
- we follow BDD (behavior-driven development) testing principles and use Ginkgo along with Gomega (see the first sketch after this list)
  - make sure to check out their extensive guides for more information and how to best leverage all of their features
- use `By` to structure test cases with multiple steps, so that steps are easy to follow in the logs: example test
- call `defer GinkgoRecover()` if making assertions in goroutines: doc, example test
- use `DeferCleanup` instead of cleaning up manually (or use custom coding from the test framework): example test, example test
  - `DeferCleanup` makes sure to run the cleanup code at the right point in time, e.g., a `DeferCleanup` added in `BeforeEach` is executed with `AfterEach`
- test failures should point to an exact location, so that failures in CI aren't too difficult to debug/fix
  - use `ExpectWithOffset` for making assertions in helper funcs like `expectSomethingWasCreated`: example test
  - make sure to add additional descriptions to Gomega matchers if necessary (e.g. in a loop): example test
- introduce helper functions for assertions to make tests more readable where applicable: example test
- introduce custom matchers to make tests more readable where applicable: example matcher
- don't rely on accurate timing of `time.Sleep` and friends
  - if doing so, CPU throttling in CI will make tests flaky, example flake
  - use fake clocks instead, example PR
- use the same client schemes that are also used by production code to avoid subtle bugs/regressions: example PR, production schemes, usage in test
- make sure your test is actually asserting the right thing and that it doesn't pass if the exact bug you want to prevent is introduced
  - use specific error matchers instead of asserting that any error has happened, and make sure that the corresponding branch in the code is tested, e.g., prefer `Expect(err).To(MatchError("foo"))` over `Expect(err).To(HaveOccurred())`
  - if you're unsure about your test's behavior, attaching the debugger can sometimes be helpful to make sure your test is correct
- be careful about overwriting global variables in tests
  - this is a common pattern (or hack?) in Go for faking calls to external functions
  - however, this can lead to races when the global variable is used from a goroutine (e.g., when the function is called)
  - alternatively, set fields on structs (passed via parameter or set directly): this is not racy, as struct values are typically (and should be) only used for a single test case
  - an alternative to dealing with function variables and fields (see the second sketch after this list):
    - add an interface which your code depends on
    - write a fake and a real implementation (similar to `clock.Clock.Sleep`)
    - the real implementation calls the actual function (`clock.RealClock.Sleep` calls `time.Sleep`)
    - the fake implementation does whatever you want it to do for your test (`clock.FakeClock.Sleep` waits until the test code advanced the time)
- use constants in test code with care
- typically, you should not use constants from the same package as the tested code, instead use literals
- if the constant value is changed, tests using the constant will still pass, although the "specification" is not fulfilled anymore
- there are cases where it's fine to use constants, but keep this caveat in mind when doing so
- creating sample data for tests can be a high effort
  - if valuable, add a package for generating common sample data, e.g. Shoot/Cluster objects
- make use of the `testdata` directory for storing arbitrary sample data needed by tests (helm charts, YAML manifests, etc.), example PR
  - From https://pkg.go.dev/cmd/go/internal/test:
    > The go tool will ignore a directory named "testdata", making it available to hold ancillary data needed by the tests.
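To make the points above about `By`, `DeferCleanup`, `GinkgoRecover`, and specific error matchers more concrete, here is a minimal, hypothetical Ginkgo/Gomega sketch. The subject under test and the helper functions (`doSomething`, `failingCall`) are made up for illustration and are not part of the Gardener code base:

```go
package example_test

import (
	"context"
	"errors"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("Example component", func() {
	var ctx context.Context

	BeforeEach(func() {
		ctx = context.Background()

		// DeferCleanup registered in BeforeEach runs at AfterEach time,
		// so cleanup stays right next to the setup it belongs to.
		DeferCleanup(func() {
			// e.g., delete test objects created during setup
		})
	})

	It("should do the expected thing", func() {
		By("preparing the test object")
		// ... create the object under test ...

		By("asserting in a goroutine")
		done := make(chan struct{})
		go func() {
			// without GinkgoRecover, a failed assertion in a goroutine panics the suite
			defer GinkgoRecover()
			defer close(done)
			Expect(doSomething(ctx)).To(Succeed())
		}()
		<-done

		By("asserting a specific error instead of any error")
		err := failingCall()
		// prefer MatchError over HaveOccurred so the test fails if a different error is returned
		Expect(err).To(MatchError(ContainSubstring("quota exceeded")))
	})
})

// doSomething and failingCall are stand-ins for the code under test.
func doSomething(ctx context.Context) error { return nil }
func failingCall() error                    { return errors.New("quota exceeded") }
```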
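And here is a minimal sketch of the interface-plus-fake alternative to overwriting global function variables. The `Notifier` interface and both implementations are hypothetical and only illustrate the pattern described above:

```go
package notifier

import (
	"net/http"
	"strings"
)

// Notifier is the interface the production code depends on,
// instead of calling an external function directly.
type Notifier interface {
	Notify(message string) error
}

// HTTPNotifier is the real implementation used in production.
type HTTPNotifier struct {
	URL string
}

// Notify performs the actual external call.
func (n *HTTPNotifier) Notify(message string) error {
	_, err := http.Post(n.URL, "text/plain", strings.NewReader(message))
	return err
}

// FakeNotifier is the test implementation: it only records calls,
// so tests can assert on them without network access or data races.
type FakeNotifier struct {
	Messages []string
}

// Notify records the message instead of sending it anywhere.
func (n *FakeNotifier) Notify(message string) error {
	n.Messages = append(n.Messages, message)
	return nil
}
```

Production code would take a `Notifier` (e.g., as a struct field), while the test passes a `*FakeNotifier` and asserts on `Messages` afterwards.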
Run all unit tests:

```bash
make test
```

Run all unit tests with test coverage:

```bash
make test-cov
open test.coverage.html
make test-cov-clean
```

Run unit tests of specific packages:

```bash
# run with the same settings as in CI (race detector, timeout, ...)
./hack/test.sh ./pkg/resourcemanager/controller/... ./pkg/utils/secrets/...

# freestyle
go test ./pkg/resourcemanager/controller/... ./pkg/utils/secrets/...
ginkgo run ./pkg/resourcemanager/controller/... ./pkg/utils/secrets/...
```
Use `ginkgo` to focus on (a set of) test specs via code or via CLI flags. Remember to unfocus specs before contributing code, otherwise your PR tests will fail.

```bash
$ ginkgo run --focus "should delete the unused resources" ./pkg/resourcemanager/controller/garbagecollector
...
Will run 1 of 3 specs
SS•

Ran 1 of 3 Specs in 0.003 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 2 Skipped
PASS
```

Use `ginkgo` to run tests until they fail:

```bash
$ ginkgo run --until-it-fails ./pkg/resourcemanager/controller/garbagecollector
...
Ran 3 of 3 Specs in 0.004 seconds
SUCCESS! -- 3 Passed | 0 Failed | 0 Pending | 0 Skipped
PASS

All tests passed...
Will keep running them until they fail.
This was attempt #58
No, seriously... you can probably stop now.
```
Use the `stress` tool for deflaking tests that fail sporadically in CI, e.g., due to resource contention (CPU throttling):

```bash
# get the stress tool
go install golang.org/x/tools/cmd/stress@latest

# build a test binary
ginkgo build ./pkg/resourcemanager/controller/garbagecollector
# alternatively
go test -c ./pkg/resourcemanager/controller/garbagecollector

# run the test in parallel and report any failures
$ stress -p 16 ./pkg/resourcemanager/controller/garbagecollector/garbagecollector.test -ginkgo.focus "should delete the unused resources"
5s: 1077 runs so far, 0 failures
10s: 2160 runs so far, 0 failures
```

`stress` will output a path to a file containing the full failure message when a test run fails.
- unit tests prove correctness of a single unit according to the specification of its interface
- think: is the unit that I introduced doing what it is supposed to do for all cases?
- unit tests protect against regressions caused by adding new functionality to or refactoring of a single unit
- think: is the unit that was introduced earlier (by someone else) and that I changed still doing what it was supposed to do for all cases?
- example units: functions (conversion, defaulting, validation, helpers), structs (helpers, basic building blocks like the Secrets Manager), predicates, event handlers
- for these purposes, unit tests need to cover all important cases of input for a single unit and cover edge cases / negative paths as well (e.g., errors)
- because of the possible high dimensionality of test input, unit tests need to be fast to execute: individual test cases should not take more than a few seconds, test suites not more than 2 minutes
- fuzzing can be used as a technique in addition to usual test cases for covering edge cases
- test coverage can be used as a tool during test development for covering all cases of a unit
- however, test coverage data can be a false safety net
- full line coverage doesn't mean you have covered all cases of valid input
- we don't have strict requirements for test coverage, as it doesn't necessarily yield the desired outcome
- unit tests should not test too large components, e.g. entire controller `Reconcile` functions
  - if a function/component does many steps, it's probably better to split it up into multiple functions/components that can be unit tested individually
  - there might be special cases for very small `Reconcile` functions
  - if there are a lot of edge cases, extract dedicated functions that cover them and use unit tests to test them
  - usual-sized controllers should rather be tested in integration tests
  - individual parts (e.g. helper functions) should still be tested in unit tests for covering all cases, though
- unit tests are especially easy to run with a debugger and can help in understanding concrete behavior of components
- for the sake of execution speed, fake expensive calls/operations, e.g. secret generation: example test
- generally, prefer fakes over mocks, e.g., use the controller-runtime fake client over mock clients (see the sketch after this list)
  - mocks decrease maintainability because they expect the tested component to follow a certain way to reach the desired goal (e.g., call specific functions with particular arguments), example consequence
  - generally, fakes should be used in "result-oriented" test code (e.g., asserting that a certain object was labelled, but the test doesn't care if it was via patch or update, as both are valid ways to reach the desired goal)
- although rare, there are valid use cases for mocks, e.g. if the following aspects are important for correctness:
- asserting that an exact function is called
- asserting that functions are called in a specific order
- asserting that exact parameters/values/... are passed
- asserting that a certain function was not called
- many of these can also be verified with fakes, although mocks might be simpler
- only use mocks if the tested code directly calls the mock; never if the tested code only calls the mock indirectly (e.g., through a helper package/function)
- keep in mind the maintenance implications of using mocks:
- can you make a valid non-behavioral change in the code without breaking the test or dependent tests?
- it's valid to mix fakes and mocks in the same test or between test cases
- generally, use the go test package, i.e., declare `package <production_package>_test`
  - helps in avoiding cyclic dependencies between production, test and helper packages
  - also forces you to distinguish between the public (exported) API surface of your code and internal state that might not be of interest to tests
- it might be valid to use the same package as the tested code if you want to test unexported functions
  - alternatively, an `internal` package can be used to host "internal" helpers: example package
- helpers can also be exported if no one is supposed to import the containing package (e.g. controller package)
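The following is a minimal, hypothetical sketch of a "result-oriented" unit test using the controller-runtime fake client instead of mocks: it only asserts that the object ends up labelled, not which client calls were used to get there. `AddLabel` and the label key are made up for illustration; in a real test, the function would live in the production package:

```go
package labeler_test

import (
	"context"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

var _ = Describe("AddLabel", func() {
	It("should label the object", func() {
		ctx := context.Background()

		// seed the fake client with the object the code under test operates on
		cm := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: "foo", Namespace: "default"}}
		fakeClient := fake.NewClientBuilder().WithObjects(cm).Build()

		// whether AddLabel uses Update or Patch internally is irrelevant to this test
		Expect(AddLabel(ctx, fakeClient, cm, "gardener.cloud/role", "test")).To(Succeed())

		result := &corev1.ConfigMap{}
		Expect(fakeClient.Get(ctx, types.NamespacedName{Name: "foo", Namespace: "default"}, result)).To(Succeed())
		Expect(result.Labels).To(HaveKeyWithValue("gardener.cloud/role", "test"))
	})
})

// AddLabel is a stand-in for the function under test, shown here only to make
// the sketch self-contained.
func AddLabel(ctx context.Context, c client.Client, cm *corev1.ConfigMap, key, value string) error {
	patch := client.MergeFrom(cm.DeepCopy())
	metav1.SetMetaDataLabel(&cm.ObjectMeta, key, value)
	return c.Patch(ctx, cm, patch)
}
```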
Integration tests in Gardener use the `sigs.k8s.io/controller-runtime/pkg/envtest` package.
It sets up a temporary control plane (etcd + kube-apiserver) and runs the test against it.
Historically, test machinery tests have also been called "integration tests". However, test machinery does not perform integration testing but rather executes a form of end-to-end tests against a real landscape. Hence, we tried to sharpen the terminology that we use to distinguish between "real" integration tests and test machinery tests but you might still find "integration tests" referring to test machinery tests in old issues or outdated documents.
The `test-integration` make rule prepares the environment automatically by downloading the respective binaries (if not yet present) and setting the necessary environment variables:

```bash
make test-integration
```

If you want to run a specific set of integration tests, you can also execute them using `./hack/test-integration.sh` directly instead of using the `test-integration` rule. For example:

```bash
./hack/test-integration.sh ./test/integration/resourcemanager/tokenrequestor
```

The script takes care of preparing the environment for you.
If you want to execute the test suites directly via `go test` or `ginkgo`, you have to point the `KUBEBUILDER_ASSETS` environment variable to the path that contains the etcd and kube-apiserver binaries. Alternatively, you can install the binaries to `/usr/local/kubebuilder/bin`.
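For example, one way to set `KUBEBUILDER_ASSETS` manually is via controller-runtime's `setup-envtest` tool; this is only a sketch, and the Kubernetes version and test package below are illustrative:

```bash
# install the setup-envtest helper from controller-runtime
go install sigs.k8s.io/controller-runtime/tools/setup-envtest@latest

# download the envtest binaries and export their location
export KUBEBUILDER_ASSETS=$(setup-envtest use 1.27.x -p path)

# now the suites can be run directly
go test ./test/integration/resourcemanager/tokenrequestor/...
```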
You can configure envtest to use an existing cluster instead of starting a temporary control plane for your test.
This can be helpful for debugging integration tests, because you can easily inspect what is going on in your test cluster with `kubectl`.
For example:

```bash
make kind-up
export KUBECONFIG=$PWD/example/gardener-local/kind/kubeconfig
export USE_EXISTING_CLUSTER=true

# run test with verbose output
./hack/test-integration.sh -v ./test/integration/resourcemanager/health -ginkgo.v

# watch test objects
k get managedresource -A -w
```
Similar to debugging unit tests, the `stress` tool can help with hunting down flakes in integration tests.
However, you might need to run fewer tests in parallel (specified via `-p`) and have a bit more patience.
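Assuming the envtest binaries are available (i.e., `KUBEBUILDER_ASSETS` is exported as described above), a possible invocation could look like this; the package path, output name, and `-p` value are only examples:

```bash
# build the integration test suite into a binary
go test -c -o tokenrequestor.test ./test/integration/resourcemanager/tokenrequestor

# run fewer instances in parallel than for unit tests
stress -p 4 ./tokenrequestor.test
```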
- integration tests prove that multiple units are correctly integrated into a fully-functional component of the system
- example component with multiple units: a controller with its reconciler, watches, predicates, event handlers, queues, etc.
- integration tests set up a full component (including used libraries) and run it against a test environment close to the actual setup
- e.g., start controllers against a real Kubernetes control plane to catch bugs that can only happen when talking to a real API server
- integration tests are generally more expensive to run (e.g., in terms of execution time)
- integration tests should not cover each and every detailed case
- rather cover a good portion of the "usual" cases that components will face during normal operation (positive and negative test cases)
- but don't cover all failure cases or all cases of predicates -> they should be covered in unit tests already
- generally, not supposed to "generate test coverage" but to provide confidence that components work well
- as integration tests typically test only one component (or a cohesive set of components) isolated from others, they cannot catch bugs that occur when multiple controllers interact (could be discovered by e2e tests, though)
- rule of thumb: a new integration test should be added for each new controller (an integration test doesn't replace unit tests though)
- make sure to have a clean test environment on both test suite and test case level:
  - set up dedicated test environments (envtest instances) per test suite
  - use dedicated namespaces per test suite, use `GenerateName` with a test-specific prefix: example test
    - this allows running a test in parallel against the same existing cluster for deflaking and stress testing: example PR
  - use dedicated test resources for each test case, use `GenerateName` (example test) or a checksum of `CurrentSpecReport().LeafNodeLocation.String()` (example test)
    - this avoids cascading failures of test cases and distracting from the actual root failure
  - don't tolerate already existing resources (~ dirty test environment), code smell: ignoring already exist errors
- don't use a cached client in test code (e.g., the one from a controller-runtime manager), always construct a dedicated test client (uncached): example test
- use asynchronous assertions: `Eventually` and `Consistently` (see the sketch after this list)
  - never `Expect` anything to happen synchronously (immediately)
  - don't use retry or wait until functions -> use `Eventually` / `Consistently` instead: example test
    - this allows overriding the interval/timeout values from outside instead of hard-coding them in the test (see `hack/test-integration.sh`): example PR
  - beware of the default `Eventually` / `Consistently` timeouts / poll intervals: docs
  - don't set custom (high) timeouts and intervals in test code: example PR
    - instead, shorten the sync period of controllers, overwrite intervals of the tested code, or use fake clocks: example test
  - pass `g Gomega` to `Eventually` / `Consistently` and use `g.Expect` in it: docs, example test, example PR
  - don't forget to call `{Eventually,Consistently}.Should()`, otherwise the assertions always silently succeed without errors: onsi/gomega#561
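Here is a minimal, hypothetical sketch combining several of the points above: dedicated test resources via `GenerateName`, an asynchronous assertion with `Eventually`, and passing `g Gomega` so that a failed expectation only aborts the current polling attempt. The label key is made up, and `testClient` (a dedicated, uncached client), `testNamespace`, and `ctx` are assumed to be created in the suite setup:

```go
package example_test

import (
	"context"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// assumed to be initialized in the suite setup (BeforeSuite), as in the
// existing integration tests
var (
	ctx           = context.Background()
	testClient    client.Client
	testNamespace *corev1.Namespace
)

var _ = Describe("Example controller", func() {
	It("should label the ConfigMap", func() {
		By("creating a dedicated test object with GenerateName")
		cm := &corev1.ConfigMap{ObjectMeta: metav1.ObjectMeta{
			GenerateName: "test-",
			Namespace:    testNamespace.Name,
		}}
		Expect(testClient.Create(ctx, cm)).To(Succeed())
		DeferCleanup(func() {
			Expect(client.IgnoreNotFound(testClient.Delete(ctx, cm))).To(Succeed())
		})

		By("waiting for the controller to label the object")
		// pass g Gomega and use g.Expect, so a failed Get only fails this attempt
		Eventually(func(g Gomega) {
			g.Expect(testClient.Get(ctx, client.ObjectKeyFromObject(cm), cm)).To(Succeed())
			g.Expect(cm.Labels).To(HaveKeyWithValue("example.gardener.cloud/labelled", "true"))
		}).Should(Succeed())
	})
})
```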
We run a suite of e2e tests on every pull request and periodically on the `master` branch.
It uses a KinD cluster and skaffold to bootstrap a full installation of Gardener based on the current revision, including provider-local.
This allows us to run e2e tests in an isolated test environment and fully locally without any infrastructure interaction.
The tests perform a set of operations on Shoot clusters, e.g. creating, deleting, hibernating and waking up.
These tests are executed in our prow instance at prow.gardener.cloud, see job definition and job history.
You can also run these tests on your development machine, using the following commands:

```bash
make kind-up
export KUBECONFIG=$PWD/example/gardener-local/kind/kubeconfig
make gardener-up
make test-e2e-local  # alternatively: make test-e2e-local-simple
```

If you want to run a specific set of e2e test cases, you can also execute them using `./hack/test-e2e-local.sh` directly in combination with ginkgo label filters. For example:

```bash
./hack/test-e2e-local.sh --label-filter "Shoot && credentials-rotation"
```

If you want to use an existing shoot instead of creating a new one for the test case and deleting it afterwards, you can specify the existing shoot via the following flags. This can be useful to speed up the development of e2e tests.

```bash
./hack/test-e2e-local.sh --label-filter "Shoot && credentials-rotation" -- --project-namespace=garden-local --existing-shoot-name=local
```
Also see: developing Gardener locally and deploying Gardener locally.
When debugging e2e test failures in CI, logs of the cluster components can be very helpful.
Our e2e test jobs export logs of all containers running in the kind cluster to prow's artifacts storage.
You can find them by clicking the `Artifacts` link in the top bar in prow's job view and navigating to `artifacts`.
This directory will contain all cluster component logs grouped by node.

Pull all artifacts using `gsutil` for searching and filtering the logs locally (use the path displayed in the artifacts view):

```bash
gsutil cp -r gs://gardener-prow/pr-logs/pull/gardener_gardener/6136/pull-gardener-e2e-kind/1542030416616099840/artifacts/gardener-local-control-plane /tmp
```
- e2e tests provide a high level of confidence that our code runs as expected by users when deployed to production
- they are supposed to catch bugs resulting from interaction between multiple components
- test cases should be as close as possible to real usage by end users
- should test "from the perspective of the user" (or operator)
- example: I create a Shoot and expect to be able to connect to it via the provided kubeconfig
- accordingly, don't assert details of the system
- e.g., the user also wouldn't expect that there is a kube-apiserver deployment in the seed, they rather expect that they can talk to it no matter how it is deployed
- only assert details of the system if the tested feature is not fully visible to the end-user and there is no other way of ensuring that the feature works reliably
- e.g., the Shoot CA rotation is not fully visible to the user but is assertable by looking at the secrets in the Seed.
- pro: can be executed by developers and users without any real infrastructure (provider-local)
- con: they currently cannot be executed with real infrastructure (e.g., provider-aws), we will work on this as part of #6016
- keep in mind that the tested scenario is still artificial in the sense that it uses default configuration, only a few objects, and only a few config/settings combinations are covered
- we will never be able to cover the full "test matrix" and this should not be our goal
- bugs will still be released and will still happen in production; we can't avoid it
- instead, we should add test cases for preventing bugs in features or settings that were frequently regressed: example PR
- usually e2e tests cover the "straight-forward cases"
- however, negative test cases can also be included, especially if they are important from the user's perspective
- always wrap API calls and similar things in `Eventually` blocks: example test (see the sketch after this list)
  - at this point, we are pretty much working with a distributed system and failures can happen anytime
  - wrapping calls in `Eventually` makes tests more stable and more realistic (usually, you wouldn't call the system broken if a single API call fails because of a short connectivity issue)
- most of the points from writing integration tests are relevant for e2e tests as well (especially the points about asynchronous assertions)
- in contrast to integration tests, in e2e tests it might make sense to specify higher timeouts for `Eventually` calls, e.g., when waiting for a `Shoot` to be reconciled
  - generally, try to use the default settings for `Eventually` specified via the environment variables
  - only set higher timeouts if waiting for long-running reconciliations to be finished
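The following fragment is a hypothetical sketch of these two points: wrapping API calls in `Eventually` and only raising the timeout for a long-running reconciliation. `ctx`, `shootClient`, `gardenClient`, `shoot`, the timeout value, and the surrounding imports are assumed from the enclosing test and are not taken from the actual e2e framework:

```go
By("verifying the shoot kubeconfig works")
Eventually(func(g Gomega) {
	// a single failed call only fails this polling attempt, not the test
	nodes := &corev1.NodeList{}
	g.Expect(shootClient.List(ctx, nodes)).To(Succeed())
	g.Expect(nodes.Items).NotTo(BeEmpty())
}).Should(Succeed())

By("waiting for the shoot to be reconciled")
// higher timeout only because this waits for a long-running reconciliation;
// nil checks are omitted for brevity
Eventually(func(g Gomega) {
	g.Expect(gardenClient.Get(ctx, client.ObjectKeyFromObject(shoot), shoot)).To(Succeed())
	g.Expect(shoot.Status.LastOperation.State).To(Equal(gardencorev1beta1.LastOperationStateSucceeded))
}).WithTimeout(15 * time.Minute).Should(Succeed())
```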
Please see Test Machinery Tests.
- test machinery tests have to be executed against full-blown Gardener installations
- they can provide a very high level of confidence that an installation is functional in its current state, this includes: all Gardener components, Extensions, the used Cloud Infrastructure, all relevant settings/configuration
- this brings the following benefits:
- they test more realistic scenarios than e2e tests (real configuration, real infrastructure, etc.)
- tests run "where the users are"
- however, this also brings significant drawbacks:
- tests are difficult to develop and maintain
- tests require a full Gardener installation and cannot be executed in CI (on PR-level or against master)
- tests require real infrastructure (think cloud provider credentials, cost)
- using `TestDefinitions` under `.test-defs` requires a full test machinery installation
- accordingly, tests are heavyweight and expensive to run
- testing against real infrastructure can cause flakes sometimes (e.g., in outage situations)
- failures are hard to debug, because clusters are deleted after the test (for obvious cost reasons)
- bugs can only be caught, once it's "too late", i.e., when code is merged and deployed
- today, test machinery tests cover a bigger "test matrix" (e.g., Shoot creation across infrastructures, kubernetes versions, machine image versions, etc.)
- test machinery also runs Kubernetes conformance tests
- however, because of the listed drawbacks, we should rather focus on augmenting our e2e tests, as we can run them locally and in CI in order to catch bugs before they get merged
- it's still a good idea to add test machinery tests if a feature needs to be tested that depends on some installation-specific configuration
- generally speaking, most points from writing integration tests and writing e2e tests apply here as well
- however, test machinery tests contain a lot of technical debt and existing code doesn't follow these best practices
- as test machinery tests are out of our general focus, we don't intend on reworking the tests soon or providing more guidance on how to write new ones
- manual tests can be useful when the cost of automatically testing certain functionality is too high
- useful for PR verification, if a reviewer wants to verify that all cases are properly tested by automated tests
- currently, it's the simplest option for testing upgrade scenarios
- e.g. migration coding is probably best tested manually, as it's a high effort to write an automated test for little benefit
- obviously, the need for manual tests should be kept at a bare minimum
- instead, we should add e2e tests wherever sensible/valuable
- we want to implement some form of general upgrade tests as part of #6016