Test Fleet in Rancher on self-hosted runner #2804

Open: weyfonk wants to merge 33 commits into main from self-hosted-runner-fleet-in-rancher

weyfonk
Contributor

@weyfonk weyfonk commented Sep 4, 2024

This creates a new Test Fleet in Rancher workflow to install Fleet through the latest Rancher. An example run can be found here.
This workflow can be called from this one to test any given Fleet commit against Rancher through multi-cluster tests.

Open points

  • since this workflow is scheduled to run at 1 PM every day, would we want to switch the scheduling to run this one instead, which would effectively test Fleet's latest main commit against the latest Rancher release every day?
  • if not, which defaults do we want to set for chart branches and repositories, which would be sensible for as many testing use cases as possible? Or should those values be computed to, say, fetch the latest charts branch name according to default configuration of the latest Rancher release?
  • do we want to run this new workflow against each pull request? It seems to take around 10 minutes, including generating test Fleet charts.
  • Each run of this workflow creates a new fleetrepoci/charts branch, which is only useful for as long as the Docker images referenced by the test charts exist. In practice, this means only an hour, after which those branches could, and probably should, be deleted. Do we want to automate that deletion to prevent cluttering fleetrepoci/charts with obsolete test branches?
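
If the branch cleanup raised in the last open point were automated, the selection step might look like the sketch below. It is only an illustration: the `stale_branches` helper, the `ci-` branch prefix and the remote name `origin` are assumptions, not something this PR defines. The prefix filter is there so that long-lived branches such as `main` are never selected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): reads "<unix-timestamp> <branch>"
# lines on stdin and prints branches that start with the given prefix and are
# older than max_age seconds relative to now.
stale_branches() {
  local max_age="$1" now="$2" prefix="$3" ts branch
  while read -r ts branch; do
    # Skip empty lines and branches that do not carry the test prefix.
    [ -n "$branch" ] || continue
    case "$branch" in
      "$prefix"*) ;;
      *) continue ;;
    esac
    if [ $(( now - ts )) -gt "$max_age" ]; then
      printf '%s\n' "$branch"
    fi
  done
}

# Usage sketch, left commented out because it deletes remote branches.
# Assumes a checkout of the test charts repository with remote "origin"
# and test branches named with a "ci-" prefix (an assumed convention):
#
# git for-each-ref --format='%(committerdate:unix) %(refname:lstrip=3)' refs/remotes/origin \
#   | stale_branches 3600 "$(date +%s)" ci- \
#   | while read -r b; do git push origin --delete "$b"; done
```

Keeping the age and prefix checks in a function that only reads stdin makes the selection easy to dry-run before any branch is deleted.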

Possible improvements

  • Use this to run single-cluster end-to-end tests, including those requiring test infrastructure.

Refers to #1640.

@weyfonk weyfonk requested a review from a team as a code owner September 4, 2024 14:48
Resolved review threads (outdated): dev/test-in-rancher (two threads), .github/workflows/e2e-rancher-upgrade-fleet-to-head-ci.yml
# Log in to Rancher; "4" answers the interactive prompt shown by the CLI.
echo -e "4\n" | rancher login "https://$public_hostname" --token "$token" --skip-verify

# Create an imported cluster named "second" and wait until Rancher assigns it an ID.
rancher clusters create second --import
until rancher cluster ls --format json | jq -r 'select(.Name=="second") | .ID' | grep -Eq "c-[a-z0-9]" ; do sleep 1; done
id=$( rancher cluster ls --format json | jq -r 'select(.Name=="second") | .ID' )

# Switch to the downstream cluster and register it with Rancher.
kubectl config use-context "$cluster_downstream"
rancher cluster import "$id"
kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user "$user"

@weyfonk weyfonk force-pushed the self-hosted-runner-fleet-in-rancher branch 2 times, most recently from 2d9310e to 683ec88 Compare September 20, 2024 07:39
@weyfonk weyfonk marked this pull request as draft September 24, 2024 13:37
This is the basis for being able to test any given Fleet commit in
Rancher, installing the latter through Helm.
This simplifies Rancher installation, avoiding a costly local build
and simply using existing configuration options to override
the Fleet version to install from a custom repository and branch.

The Rancher Docker image to use is hard-coded for now.
This is an attempt to use Rancher's org-wide hosted runners to test
Fleet within Rancher, instead of dealing with GCP.
This may help prevent unknown authority errors when installing the
Ginkgo CLI.
That image comes with CA certificates, curl and tar installed.
Images are already built when releasing Fleet charts against a test
charts repository.
This makes use of a dedicated step to install remaining dependencies.
Do we really need a dedicated VM or self-hosted runner for this?
Reusing the same setup as Fleet's multi-cluster tests to verify it.
Installing Rancher should directly take care of installing our test
Fleet version.
This prevents issues about package `helm` not being found in Ubuntu
repositories.
This could be reused in CI, after a few improvements.
`rancher/fleet` is now approved to use such runners, with help from EIO.
Registering downstream clusters with Rancher requires the `rancher` CLI.
This should help troubleshoot failures with `Process completed with exit
code 1`.
This may eliminate errors with downstream cluster registration.
This eases troubleshooting and enables testing Fleet against existing
Rancher releases.
This is not needed when installing Rancher from Helm instead of building
it.
Old parameter `installCRDs` is deprecated.
The Rancher CLI may briefly output an empty command, causing
`register-downstream-clusters.sh` to fail. Instead of trying to
reverse-engineer why, we simply run `rancher cluster import`
repeatedly until the returned command is non-empty.
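
The retry described in this commit message could be sketched roughly as follows. The `retry_until_nonempty` function name, the attempt count and the delay are illustrative assumptions, not the actual contents of `register-downstream-clusters.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: runs the given command until it prints non-empty
# output, then prints that output. Gives up after a fixed number of attempts.
retry_until_nonempty() {
  local attempts="$1" delay="$2" out i
  shift 2
  for (( i = 0; i < attempts; i++ )); do
    # Ignore failures and stderr; only non-empty stdout counts as success.
    out="$("$@" 2>/dev/null)" || true
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Usage sketch (commented out; requires a logged-in rancher CLI and "$id"
# set to the ID of the cluster to import):
# import_cmd=$(retry_until_nonempty 30 2 rancher cluster import "$id")
```

Bounding the number of attempts keeps a CI job from hanging forever if the CLI never returns a command.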
This could help us understand why downstream cluster registration fails
in CI although it works locally.
When installing Rancher through Helm:
* the `CATTLE_SERVER_URL` needs to be set to the same value as `hostname`
* TLS mode must be set to `system-store`, to prevent cert-related errors
  when running a Fleet agent in a downstream cluster
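
As a rough illustration of the two settings above, the Helm flags might be assembled as below. This is a sketch under assumptions: the `rancher_helm_flags` function is hypothetical, the `agentTLSMode` and `extraEnv` value names should be checked against the Rancher chart version in use, and the `https://` scheme in front of the hostname is assumed rather than stated in the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints, one per line, Helm flags that set the hostname,
# the agent TLS mode and CATTLE_SERVER_URL for a Rancher installation.
# Value names (agentTLSMode, extraEnv) are assumptions to verify against the chart.
rancher_helm_flags() {
  local host="$1"
  printf '%s\n' \
    "--set=hostname=${host}" \
    "--set=agentTLSMode=system-store" \
    "--set=extraEnv[0].name=CATTLE_SERVER_URL" \
    "--set=extraEnv[0].value=https://${host}"
}

# Usage sketch (commented out; the chart repository alias "rancher-latest"
# and the "cattle-system" namespace are assumptions):
# mapfile -t flags < <(rancher_helm_flags "$public_hostname")
# helm upgrade --install rancher rancher-latest/rancher \
#   --namespace cattle-system --create-namespace "${flags[@]}"
```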
This waits for the upstream cluster to be ready, preventing an empty IP
from being set.
Environment variables are not necessary, as a dedicated Helm value
exists.
This moves `test-in-rancher` to the `dev` scripts directory, and briefly
explains how to use it.
A newline follows the initial `-`, as in other workflows.
The script does not actually test anything, and is now named
consistently with other scripts living in the `dev/` folder.
The script would not manage a cluster's lifecycle beyond its creation
anyway, and doing so would be harder to automate. It is therefore left
out of the script's scope.
This leaves the original workflow in place, upgrading Fleet in Rancher.
That block is not necessary, as tests are run against clusters
afterwards.
This ensures that Fleet examples are validated against Fleet in Rancher.
@weyfonk weyfonk force-pushed the self-hosted-runner-fleet-in-rancher branch from 683ec88 to 7b82940 Compare September 26, 2024 06:46
@weyfonk weyfonk marked this pull request as ready for review September 26, 2024 07:01
That check is only relevant when using a custom charts branch.
2 participants