Test Fleet in Rancher on self-hosted runner #2804

weyfonk · 2024-09-04T14:48:39Z

This creates a new `Test Fleet in Rancher` workflow, which installs Fleet through the latest Rancher release. An example run can be found here.
This workflow can be called from this one to test any given Fleet commit against Rancher through multi-cluster tests.

Open points

Since this workflow is scheduled to run at 1 PM every day, should we switch the schedule to run this one instead? That would effectively test Fleet's latest `main` commit against the latest Rancher release every day.
If not, which defaults should we set for chart branches and repositories to suit as many testing use cases as possible? Alternatively, should those values be computed, for instance by fetching the latest charts branch name from the default configuration of the latest Rancher release?
Do we want to run this new workflow against each pull request? A run currently takes around 10 minutes, including generating test Fleet charts.
Each run of this workflow creates a new `fleetrepoci/charts` branch, which is only useful for as long as the Docker images referenced by the test charts exist. In practice, this means about an hour, after which such branches could, and probably should, be deleted. Do we want to automate that deletion to keep `fleetrepoci/charts` from filling up with obsolete test branches?

Possible improvements

Use this to run single-cluster end-to-end tests, including those requiring test infrastructure.

Refers to #1640.

dev/test-in-rancher

.github/workflows/e2e-rancher-upgrade-fleet-to-head-ci.yml

manno · 2024-09-17T13:50:42Z

.github/scripts/register-downstream-clusters.sh

```diff
 echo -e "4\n" | rancher login "https://$public_hostname" --token "$token" --skip-verify

 rancher clusters create second --import
 until rancher cluster ls --format json | jq -r 'select(.Name=="second") | .ID' | grep -Eq "c-[a-z0-9]" ; do sleep 1; done
 id=$( rancher cluster ls --format json | jq -r 'select(.Name=="second") | .ID' )

 kubectl config use-context "$cluster_downstream"
-rancher cluster import "$id"
+kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $user
```


Interesting, I wonder if it would help to pick an admin user:

manno/fleet-dev-tools@f53ec4c#diff-713765b9c3a7b1fdfcfe622eba2300d80eeff5d04089de16342b77072e83efa3R23


This enables downstream cluster registration to succeed, as specified in the official Rancher docs [1].

[1]: https://ranchermanager.docs.rancher.com/v2.0-v2.4/how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/import-existing-clusters#prerequisites


The Rancher CLI may output an empty command for a short while, causing `register-downstream-clusters.sh` to fail. Instead of trying to work out why, we simply run `rancher cluster import` repeatedly until the returned command is non-empty.
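A bounded version of that retry could look like the following sketch. It assumes that an empty stdout is the only condition worth retrying on and that giving up after a fixed number of attempts is preferable to looping forever. The helper name `retry_until_nonempty` and the attempt and delay values in the usage comment are hypothetical, not taken from `register-downstream-clusters.sh`.

```shell
#!/usr/bin/env bash
# Runs a command until it prints something non-empty on stdout, then echoes
# that output. Gives up after max_attempts, sleeping delay seconds between tries.
# Usage: retry_until_nonempty <max_attempts> <delay_seconds> <command> [args...]
retry_until_nonempty() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt out
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    # Ignore the exit status and stderr of individual attempts; only stdout matters.
    out=$("$@" 2>/dev/null) || true
    if [[ -n "$out" ]]; then
      printf '%s\n' "$out"
      return 0
    fi
    # No need to sleep after the last attempt.
    (( attempt < max_attempts )) && sleep "$delay"
  done
  echo "no output from '$*' after $max_attempts attempts" >&2
  return 1
}

# Possible use (hypothetical values):
#   cmd=$(retry_until_nonempty 30 2 rancher cluster import "$id")
```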


weyfonk requested a review from a team as a code owner September 4, 2024 14:48

manno reviewed Sep 17, 2024


weyfonk force-pushed the self-hosted-runner-fleet-in-rancher branch 2 times, most recently from 2d9310e to 683ec88 on September 20, 2024 07:39

weyfonk marked this pull request as draft September 24, 2024 13:37

weyfonk added 25 commits September 26, 2024 08:38

Fix Fleet upgrade script to install Fleet in Rancher

560e397

This is the basis for being able to test any given Fleet commit in Rancher, installing the latter through Helm.

Install Rancher via Helm

0917e69

This simplifies Rancher installation, preventing a costly local build and simply making use of existing configuration options to override the Fleet version to install from a custom repository and branch. The Rancher Docker image to use is hard-coded for now.

Run Fleet-in-Rancher test workflow on self-hosted runner

b74dff6

This is an attempt to use Rancher's org-wide hosted runners to test Fleet within Rancher, instead of dealing with GCP.

Refresh CA certificates

2f662aa

This may help prevent unknown authority errors when installing the Ginkgo CLI.

Use bci/bci-base:15.6 base image

420cb2a

That image comes with CA certificates, curl and tar installed.

Skip Docker image building

bc142d2

Images are already built when releasing Fleet charts against a test charts repository.

Install helm and kubectl

accdb2c

This makes use of a dedicated step to install remaining dependencies.

Run Fleet-in-Rancher tests against Ubuntu

46d3e28

Do we really need a dedicated VM or self-hosted runner for this? This reuses the same setup as Fleet's multi-cluster tests to find out.

Remove Fleet update step

7fe4bb0

Installing Rancher should directly take care of installing our test Fleet version.

Install kubectl and other dependencies into ~/.local/bin

da736cc

This should fix permissions issues.
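Installing into `~/.local/bin` avoids the permission problem because that directory belongs to the runner's user, so no root privileges are needed. A sketch of such an install, assuming Linux on amd64 or arm64 and the official `dl.k8s.io` download location; the helper name `kubectl_download_url` is hypothetical, and the install commands in the trailing comment additionally assume that `curl` is available and `~/.local/bin` is on `PATH`.

```shell
#!/usr/bin/env bash
# Prints the download URL of a kubectl release binary for Linux, mapping
# `uname -m` machine names to Kubernetes architecture names.
# Usage: kubectl_download_url <version, e.g. v1.31.0> [machine, default: uname -m]
kubectl_download_url() {
  local version="$1" machine="${2:-$(uname -m)}" arch
  case "$machine" in
    x86_64) arch=amd64 ;;
    aarch64|arm64) arch=arm64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
  echo "https://dl.k8s.io/release/${version}/bin/linux/${arch}/kubectl"
}

# Installing without root privileges:
#   mkdir -p ~/.local/bin
#   version=$(curl -fsSL https://dl.k8s.io/release/stable.txt)
#   curl -fsSLo ~/.local/bin/kubectl "$(kubectl_download_url "$version")"
#   chmod +x ~/.local/bin/kubectl
```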

Install helm from script

db1f6e7

This prevents errors about package `helm` not being found in Ubuntu repositories.

Add WIP script for testing Fleet in Rancher

a738a25

This could be reused in CI, after a few improvements.

Use non-containerized self-hosted runner

d067cec

`rancher/fleet` is now approved to use such runners, with help from EIO.

Add rancher bin path to $PATH where needed

afbbc4e

Registering downstream clusters with Rancher requires the `rancher` CLI.
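Adding a directory to `$PATH` from several places can be made idempotent. A sketch, assuming Bash; in GitHub Actions, appending a directory to the file named by `$GITHUB_PATH` makes it available to subsequent steps of the job. The helper name `add_to_path` and the example directory are hypothetical.

```shell
#!/usr/bin/env bash
# Prepends a directory to PATH unless it is already there, so that repeated
# calls do not keep growing PATH. When running in GitHub Actions, also
# appends it to the file referenced by $GITHUB_PATH for later steps.
add_to_path() {
  local dir="$1"
  case ":$PATH:" in
    *":$dir:"*) ;;                 # already present: nothing to do
    *) PATH="$dir:$PATH" ;;
  esac
  export PATH
  if [[ -n "${GITHUB_PATH:-}" ]]; then
    printf '%s\n' "$dir" >> "$GITHUB_PATH"
  fi
}

# Hypothetical location of the rancher CLI binary:
#   add_to_path "$HOME/.local/bin"
```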

Print shell commands when setting up Rancher

063bfe8

This should help troubleshoot failures with `Process completed with exit code 1`.
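Bash's `xtrace` option (`set -x`) prints each command to stderr before running it. A sketch, assuming Bash, that also sets `PS4` so that each traced line shows the script name and line number, which helps narrow down which command produced the non-zero exit code. The helper name `run_traced` is hypothetical.

```shell
#!/usr/bin/env bash
# Prefix each traced command with "+ <script>:<line>: ".
# Single quotes delay expansion until each command is traced.
export PS4='+ ${BASH_SOURCE[0]:-bash}:${LINENO}: '

# Runs a command with tracing enabled only for its duration, in a subshell
# so that the caller's shell options are left untouched.
run_traced() {
  ( set -x; "$@" )
}

# In a CI script, tracing can instead be enabled for the whole run together
# with fail-fast options:
#   set -euxo pipefail
```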

Compute public IP instead of hard-coding it

5aa3527

This may eliminate errors with downstream cluster registration.

Enable direct Fleet-in-Rancher workflow call

de9a6d6

This eases troubleshooting and enables testing Fleet against existing Rancher releases.

Skip rancher/rancher checkout

3531931

This is not needed when installing Rancher from Helm instead of building it.

Replace cert-manager CRDs install parameter

092ef4b

Old parameter `installCRDs` is deprecated.
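Since cert-manager v1.15.0, the chart's `installCRDs` value is deprecated in favour of `crds.enabled`. A sketch that picks the right value for a given chart version, assuming version strings of the form `vX.Y.Z` or `X.Y.Z` and GNU `sort -V` for the comparison; the helper name `certmanager_crds_flag` and the `jetstack` repository alias in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Prints the Helm value that enables CRD installation for a given cert-manager
# chart version: "crds.enabled=true" from v1.15.0 on, "installCRDs=true" before.
certmanager_crds_flag() {
  local version="${1#v}"   # accept both "v1.15.3" and "1.15.3"
  local oldest
  # The smaller of the two versions comes first after a version sort.
  oldest=$(printf '%s\n%s\n' "$version" "1.15.0" | sort -V | head -n 1)
  if [[ "$oldest" == "1.15.0" ]]; then
    echo "crds.enabled=true"
  else
    echo "installCRDs=true"
  fi
}

# Example:
#   helm install cert-manager jetstack/cert-manager \
#     --namespace cert-manager --create-namespace \
#     --version v1.15.3 --set "$(certmanager_crds_flag v1.15.3)"
```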

Restore Tmate step for troubleshooting

11a4a1d

This could help us understand why downstream cluster registration fails in CI although it works locally.

Fix downstream cluster registration

6da4914

When installing Rancher through Helm:
* the `CATTLE_SERVER_URL` needs to be set to the same value as `hostname`
* TLS mode must be set to `system-store`, to prevent cert-related errors when running a Fleet agent in a downstream cluster
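Deriving both settings from a single hostname keeps them from drifting apart. A sketch that builds the corresponding Helm arguments, under several assumptions: the Rancher chart exposes `hostname`, `extraEnv` and `agentTLSMode` values, the server URL is `https://` followed by the hostname, and the repository alias and namespace in the usage comment are placeholders. The helper name `rancher_helm_args` is hypothetical.

```shell
#!/usr/bin/env bash
# Prints, one per line, Helm arguments that set Rancher's hostname, a matching
# CATTLE_SERVER_URL, and the system-store agent TLS mode.
# Assumption: the chart accepts the `hostname`, `extraEnv` and `agentTLSMode` values.
rancher_helm_args() {
  local hostname="$1"
  printf '%s\n' \
    --set "hostname=${hostname}" \
    --set "extraEnv[0].name=CATTLE_SERVER_URL" \
    --set "extraEnv[0].value=https://${hostname}" \
    --set "agentTLSMode=system-store"
}

# Example:
#   mapfile -t args < <(rancher_helm_args "$public_hostname")
#   helm upgrade --install rancher rancher-latest/rancher \
#     --namespace cattle-system --create-namespace "${args[@]}"
```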

Ensure API server IP is available before setting it

8883d22

This waits for the upstream cluster to be ready, preventing an empty IP from being set.
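Waiting for a well-formed address, rather than just for any output, guards against an empty or partial value being set. A sketch, assuming IPv4 addresses and a command that prints only the address; the helper name `wait_for_ipv4` is hypothetical, and the `kubectl` query in the usage comment (the first node's `InternalIP`) is an assumption about where the API server IP comes from.

```shell
#!/usr/bin/env bash
# Polls a command until it prints a dotted IPv4 address, then echoes it.
# Fails after max_attempts so that an empty IP is never used downstream.
# Usage: wait_for_ipv4 <max_attempts> <delay_seconds> <command> [args...]
wait_for_ipv4() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt ip
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    ip=$("$@" 2>/dev/null) || true
    if [[ "$ip" =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]]; then
      printf '%s\n' "$ip"
      return 0
    fi
    sleep "$delay"
  done
  echo "no IPv4 address from '$*' after $max_attempts attempts" >&2
  return 1
}

# Example:
#   ip=$(wait_for_ipv4 60 2 kubectl get nodes \
#     -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
```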

Simplify TLS mode setting

97ca39f

Environment variables are not necessary, as a dedicated Helm value exists.

weyfonk added 7 commits September 26, 2024 08:38

Document testing Fleet in Rancher

2bb8222

This moves `test-in-rancher` to the `dev` scripts directory, and briefly explains how to use it.

Restore consistency in job blocks layout

c0d5cb0

A newline follows the initial `-`, as in other workflows.

Rename Fleet-in-Rancher test script

0ee61e1

The script does not actually test anything, and is now named consistently with other scripts living in the `dev/` folder.

Remove cluster setup from Fleet-in-Rancher dev script

8e9506f

The script would not manage a cluster's lifecycle beyond its creation anyway, and doing so would be harder to automate. It is therefore left out of the script's scope.

Create new workflow for testing Fleet in Rancher

0bc41a4

This leaves the original workflow, which upgrades Fleet in Rancher, in place.

Remove test workload block

c2a0225

That block is not necessary, as tests are run against clusters afterwards.

Add acceptance tests to Fleet-in-Rancher test workflow

7b82940

This ensures that Fleet examples are validated against Fleet in Rancher.

weyfonk force-pushed the self-hosted-runner-fleet-in-rancher branch from 683ec88 to 7b82940 on September 26, 2024 06:46

weyfonk marked this pull request as ready for review September 26, 2024 07:01

Comment out branch check

7c318a1

That check is only relevant when using a custom charts branch.
