From 361b0e79bbdf95af379cd6d2bd2cf89b9bfbaa8f Mon Sep 17 00:00:00 2001 From: Patrick Ohly Date: Wed, 27 Nov 2024 09:43:31 +0100 Subject: [PATCH] testing strategy: add policy for presubmits This was motivated in part by https://github.com/kubernetes/test-infra/pull/33463#issuecomment-2348289131 and is part of an effort to document best practices. The part about blocking presubmits and running them always is https://kubernetes.slack.com/archives/C2C40FMNF/p1734418617113169?thread_ts=1734417601.687079&cid=C2C40FMNF --- .../devel/sig-testing/testing-strategy.md | 49 +++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/contributors/devel/sig-testing/testing-strategy.md b/contributors/devel/sig-testing/testing-strategy.md index 82f50f51f4f..91fe38f4af9 100644 --- a/contributors/devel/sig-testing/testing-strategy.md +++ b/contributors/devel/sig-testing/testing-strategy.md @@ -26,6 +26,55 @@ The Kubernetes job uses [prow](https://prow.k8s.io) to implement the CI system. - **Postsubmit:** Runs after code is merged. Useful for building artifacts. - **Periodic:** Runs at scheduled intervals. Ideal for monitoring trends and catching regressions. +#### Presubmit jobs + +Blocking presubmit jobs must always run. If they didn't, it could happen that +a pull request where they didn't run gets merged and introduces a regression +which breaks such a job. Then when it runs in a different pull request, that +pull request cannot be merged, even if that was desirable, until the regression +is fixed. + +Usually, non-blocking jobs don't run by default. The `/test` command has to be +used explicitly for such informing jobs. It is possible to configure them so that they +[run automatically when certain paths are modified](https://github.com/kubernetes/test-infra/blob/ee70308f09c10f7cd933c26c98acc7ebf785d436/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L3201-L3202). + +Non-blocking jobs cannot detect all regressions. A test flake might succeed +when tested only once during presubmit. When defining the path trigger, it's +impossible to list everything that might cause a need to run tests +(e.g. tool changes, updates in packages that a feature depends on). Therefore +it is required to have a periodic job which runs the same tests regularly. + +The advantage of also having a non-blocking job is that problems get discovered +sooner. Without it, maintainers are forced to diagnose regressions in a +periodic job and then have to ping the contributor who caused the problem. If +that contributor is unresponsive, maintainers may have to fix the problem +themselves. Instead, the burden is on the contributor whose pull request fails +the tests. If they are unresponsive, their change doesn't get merged and +there's no regression. + +The advantage of running a non-blocking job automatically for some changes is +that reviewers don't need to remember which jobs currently should be tested +before merging and that the results are available immediately when a reviewer +looks at a pull request for the first time, assuming that the contributor is a +community member and tests run automatically. + +> :warning: **Warning:** A non-blocking job that fails confuses other +> contributors who are not familiar with the job or the failures. If it runs +> too often, it wastes CI resources. + +To avoid those negative consequences for the project, the guidelines for +setting up such a job are: + +* The job owners are responsive and react to problems with the job. +* The job must have a low failure rate to avoid confusion in drive-by pull requests. +* The importance of the feature must justify the extra CI resources (depends + on how often it gets triggered). +* The `run_if_changed` regular expression must be narrow enough that + the job doesn't run for unrelated changes. A good practice is to + limit it to code changes, for example with: + + /(directory_a|directory_b)/.*\.go + #### SIG Release Blocking and Informing jobs SIG Release maintains two sets of jobs that decide whether the release is