
e2e: more robust wait for MCP updates #1091

Merged

6 commits merged into main from e2e-install-wait-mcp-conds on Nov 28, 2024

Conversation

@ffromani (Member) commented on Nov 26, 2024:

Instead of chasing the MCP status transition updated -> updating -> updated, which is fragile and hard to get right both in theory and in practice, let's try a different approach.

When we change an MCP's state (e.g. by sending a new MachineConfig and/or pausing/unpausing):

  1. we record the current MCP resourceVersion
  2. we ask for our desired state
  3. we wait for the MCP to report the desired state again, but with a
     different resourceVersion

This should allow us to capture the conditions which describe our desired state, while waiting for a stable state, in a more robust way (see the sketch below).

Add minor cleanups along the way.
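To make the approach concrete, here is a minimal sketch of such a wait loop, assuming a controller-runtime client. All names are illustrative; the PR's actual helper lives in `internal/wait/machineconfigpool.go` and may differ:

```go
package wait

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	k8swait "k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"

	machineconfigv1 "github.com/openshift/api/machineconfiguration/v1"
)

// waitForMCPUpdatedNewVersion is a sketch, not the PR's actual helper:
// wait for the MCP to report Updated again, but only once its
// resourceVersion differs from the one recorded before the mutation.
func waitForMCPUpdatedNewVersion(ctx context.Context, cli client.Client, mcpName, oldResourceVersion string) error {
	return k8swait.PollUntilContextTimeout(ctx, 10*time.Second, 30*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			mcp := &machineconfigv1.MachineConfigPool{}
			if err := cli.Get(ctx, client.ObjectKey{Name: mcpName}, mcp); err != nil {
				return false, err
			}
			// Same resourceVersion: the pool has not reacted to our
			// change yet, even if its conditions already read Updated.
			if mcp.ResourceVersion == oldResourceVersion {
				return false, nil
			}
			// A new version alone is not enough; the pool must also
			// have settled back into the Updated condition.
			for _, cond := range mcp.Status.Conditions {
				if cond.Type == machineconfigv1.MachineConfigPoolUpdated {
					return cond.Status == corev1.ConditionTrue, nil
				}
			}
			return false, nil
		})
}
```

Usage follows the three steps above: capture `mcp.ResourceVersion`, apply the mutation, then call the helper with the captured value.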

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2024
The `machineconfig.go` source file held code to
work with both kubeletconfig and machineconfigpool objects.
To improve clarity: move the code pertaining to
kubeletconfig into the relevant file, and rename
the current file to convey it's actually about `machineconfigpool`s.
Trivial code movement without any intended change.

Signed-off-by: Francesco Romani <[email protected]>
The canonical name for the generic context parameter is `ctx`,
so rename accordingly. No intended change in behavior.

Signed-off-by: Francesco Romani <[email protected]>
The `mcpInfo` object has a field called `obj` which
always holds a `MachineConfigPool` object, so let's
rename it to `mcpObj` for clarity.

No intended changes in behavior.

Signed-off-by: Francesco Romani <[email protected]>
Replace the expectation BeNumerically(">", 0)
with Not(BeEmpty()) for a tiny (arguably very tiny)
gain in clarity.

Signed-off-by: Francesco Romani <[email protected]>
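For illustration, the shape of the change (the `mcps` slice name is hypothetical):

```go
// Before: asserts on a length, making the reader do the arithmetic.
Expect(len(mcps)).To(BeNumerically(">", 0))

// After: states the intent directly.
Expect(mcps).ToNot(BeEmpty())
```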
@ffromani (Member, Author) commented:

Failed to create cluster	{"error": "could not complete platform specific options: failed to create infra: cannot list vpc endpoints: RequestLimitExceeded:

@ffromani (Member, Author) commented:

 Registry server Password: <<non-empty>>
error: build error: Failed to push image: trying to reuse blob sha256:3d252ab42824f0f833f0fcf3c660065bd878cb4de12b9c69b6a1758287e338e8 at destination: unable to retrieve auth token: invalid username/password: authentication required 

@ffromani (Member, Author) commented:

/retest-required

1 similar comment
@ffromani (Member, Author) commented:

/retest-required

@ffromani ffromani changed the title WIP: e2e: more robust wait for MCP updates e2e: more robust wait for MCP updates Nov 27, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 27, 2024
@ffromani (Member, Author) commented:

/cc @shajmakh

@openshift-ci openshift-ci bot requested a review from shajmakh November 27, 2024 08:14
@Tal-or (Collaborator) left a review comment:


Looking good, thanks! Small comments inside.

test/e2e/uninstall/uninstall_test.go (review thread outdated, resolved)
internal/wait/machineconfigpool.go (review thread outdated, resolved)
test/utils/deploy/openshift.go (review thread resolved)
All updates, including unpausing MCPs, may fail, and are
even more likely to on CI, so wrap them in Eventually
for extra robustness.

Signed-off-by: Francesco Romani <[email protected]>
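A sketch of the pattern, reusing the illustrative `cli` and `mcpName` names from the earlier sketch (the PR's actual call sites may differ):

```go
// Retry the unpause until the API server accepts it: a single attempt
// can fail transiently (e.g. on an update conflict), especially on CI.
Eventually(func() error {
	mcp := &machineconfigv1.MachineConfigPool{}
	if err := cli.Get(ctx, client.ObjectKey{Name: mcpName}, mcp); err != nil {
		return err
	}
	mcp.Spec.Paused = false
	return cli.Update(ctx, mcp)
}).WithTimeout(5*time.Minute).WithPolling(10*time.Second).Should(Succeed())
```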
@Tal-or (Collaborator) commented on Nov 27, 2024:

[Install] durability with a running cluster with all the components and overall deployment [It] should be able to delete NUMAResourceOperator CR and redeploy without polluting cluster state
/go/src/github.com/openshift-kni/numaresources-operator/test/e2e/install/install_test.go:281
  [FAILED] Unexpected error:
      <*meta.NoKindMatchError | 0xc00076ecc0>: 
      no matches for kind "MachineConfigPool" in version "machineconfiguration.openshift.io/v1"
      {
          GroupKind: {
              Group: "machineconfiguration.openshift.io",
              Kind: "MachineConfigPool",
          },
          SearchedVersions: ["v1"],
      }
  occurred
  In [It] at: /go/src/github.com/openshift-kni/numaresources-operator/test/e2e/install/install_test.go:331 @ 11/27/24 11:07:41.37

Something has changed in the test logic that causes the HCP (Hosted Control Plane) flow to ask the API server for MCPs, which leads to the error above.

@ffromani (Member, Author) commented:

(quoting the failure report above)

Thanks, will check.

@@ -326,13 +326,14 @@ var _ = Describe("[Install] durability", Serial, func() {
// TODO change to an image which is test dedicated
nroObjRedep.Spec.ExporterImage = e2eimages.RTETestImageCI

// need to get MCPs before the mutation
@ffromani (Member, Author) commented on the diff:

This is the culprit for the HyperShift lane failure.
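For context: HyperShift clusters do not serve `machineconfiguration.openshift.io/v1` at all, so the MCP fetch has to be gated on the detected platform. A hedged sketch of that shape, where `clusterPlatform`, `platform.OpenShift`, and `getMCPsByNodeGroups` are hypothetical stand-ins for the suite's own helpers:

```go
// Only fetch MCPs on plain OpenShift; on HyperShift the kind does not
// exist, and listing it fails with NoKindMatchError as seen above.
var initialMCPs []*machineconfigv1.MachineConfigPool
if clusterPlatform == platform.OpenShift {
	var err error
	initialMCPs, err = getMCPsByNodeGroups(ctx, cli, nroObjRedep.Spec.NodeGroups)
	Expect(err).ToNot(HaveOccurred())
}
```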

Instead of chasing the MCP status transition
Updated -> Updating -> Updated, which is racy and relatively fragile,
let's use a stronger condition:

We care about waiting for the MCP to be updated with the desired state.
So:
1. we record the current MCP resourceVersion
2. we ask for our desired state
3. we wait for the MCP desired state to be `Updated` again *but with
   a different resourceVersion*

This should allow us to capture the conditions which describe our
desired state, while waiting for a stable state, in a more robust way.

Signed-off-by: Francesco Romani <[email protected]>
@Tal-or (Collaborator) commented on Nov 28, 2024:

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 28, 2024
openshift-ci bot (Contributor) commented on Nov 28, 2024:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, Tal-or

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit eaf0c0f into main Nov 28, 2024
15 checks passed
@ffromani ffromani deleted the e2e-install-wait-mcp-conds branch November 28, 2024 10:36