
e2e: more robust wait for MCP updates #1091

Merged

6 commits merged into main from e2e-install-wait-mcp-conds on Nov 28, 2024

Conversation

@ffromani (Member) commented on Nov 26, 2024:

Instead of chasing the MCP status transition updated -> updating -> updated, which is fragile and hard to get right both in theory and in practice, let's try a different approach.

When we change an MCP's state (e.g. by sending a new MachineConfig and/or pausing/unpausing):

  1. we record the current MCP resourceVersion
  2. we ask for our desired state
  3. we wait for the MCP to report the desired state again, but with a
     different resourceVersion

This should allow us to capture the conditions which describe our desired state, while waiting for a stable state, in a more robust way (see the sketch below).

Add minor cleanups along the way.
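To make the approach concrete, here is a minimal sketch of such a wait loop, assuming a controller-runtime client. All names are illustrative; the PR's actual helper lives in `internal/wait/machineconfigpool.go` and may differ:

```go
package wait

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	k8swait "k8s.io/apimachinery/pkg/util/wait"
	"sigs.k8s.io/controller-runtime/pkg/client"

	machineconfigv1 "github.com/openshift/api/machineconfiguration/v1"
)

// waitForMCPUpdatedNewVersion is a sketch, not the PR's actual helper:
// wait for the MCP to report Updated again, but only once its
// resourceVersion differs from the one recorded before the mutation.
func waitForMCPUpdatedNewVersion(ctx context.Context, cli client.Client, mcpName, oldResourceVersion string) error {
	return k8swait.PollUntilContextTimeout(ctx, 10*time.Second, 30*time.Minute, true,
		func(ctx context.Context) (bool, error) {
			mcp := &machineconfigv1.MachineConfigPool{}
			if err := cli.Get(ctx, client.ObjectKey{Name: mcpName}, mcp); err != nil {
				return false, err
			}
			// Same resourceVersion: the pool has not reacted to our
			// change yet, even if its conditions already read Updated.
			if mcp.ResourceVersion == oldResourceVersion {
				return false, nil
			}
			// A new version alone is not enough; the pool must also
			// have settled back into the Updated condition.
			for _, cond := range mcp.Status.Conditions {
				if cond.Type == machineconfigv1.MachineConfigPoolUpdated {
					return cond.Status == corev1.ConditionTrue, nil
				}
			}
			return false, nil
		})
}
```

Usage follows the three steps above: capture `mcp.ResourceVersion`, apply the mutation, then call the helper with the captured value.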

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 26, 2024
The `machineconfig.go` source file held code to
work with both kubeletconfig and machineconfigpool objects.
To improve clarity: move the code pertaining to
kubeletconfig into the relevant file, and rename
the current file to convey it's actually about `machineconfigpool`s.
Trivial code movement without any intended change.

Signed-off-by: Francesco Romani <[email protected]>
The canonical name for the generic context parameter is `ctx`,
so rename accordingly. No intended change in behavior.

Signed-off-by: Francesco Romani <[email protected]>
The `mcpInfo` object has a field called `obj` which
always holds a `MachineConfigPool` object, so let's
rename it to `mcpObj` for clarity.

No intended changes in behavior.

Signed-off-by: Francesco Romani <[email protected]>
Replace the expectation BeNumerically(">", 0)
with Not(BeEmpty()) for a tiny (arguably very tiny)
gain in clarity.

Signed-off-by: Francesco Romani <[email protected]>
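For illustration, the shape of the change (the `mcps` slice name is hypothetical):

```go
// Before: asserts on a length, making the reader do the arithmetic.
Expect(len(mcps)).To(BeNumerically(">", 0))

// After: states the intent directly.
Expect(mcps).ToNot(BeEmpty())
```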
@ffromani (Member, Author) commented:

Failed to create cluster	{"error": "could not complete platform specific options: failed to create infra: cannot list vpc endpoints: RequestLimitExceeded:

@ffromani (Member, Author) commented:

 Registry server Password: <<non-empty>>
error: build error: Failed to push image: trying to reuse blob sha256:3d252ab42824f0f833f0fcf3c660065bd878cb4de12b9c69b6a1758287e338e8 at destination: unable to retrieve auth token: invalid username/password: authentication required 

@ffromani (Member, Author) commented:

/retest-required

1 similar comment
@ffromani (Member, Author) commented:

/retest-required

@ffromani ffromani changed the title WIP: e2e: more robust wait for MCP updates e2e: more robust wait for MCP updates Nov 27, 2024
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 27, 2024
@ffromani (Member, Author) commented:

/cc @shajmakh

@openshift-ci openshift-ci bot requested a review from shajmakh November 27, 2024 08:14
@Tal-or (Collaborator) left a review comment:


Looking good, thanks! Small comments inside.

test/e2e/uninstall/uninstall_test.go (review thread outdated, resolved)
internal/wait/machineconfigpool.go (review thread outdated, resolved)
test/utils/deploy/openshift.go (review thread resolved)
All updates, including unpausing MCPs, may fail, and are
even more likely to on CI, so wrap them in Eventually
for extra robustness.

Signed-off-by: Francesco Romani <[email protected]>
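A sketch of the pattern, reusing the illustrative `cli` and `mcpName` names from the earlier sketch (the PR's actual call sites may differ):

```go
// Retry the unpause until the API server accepts it: a single attempt
// can fail transiently (e.g. on an update conflict), especially on CI.
Eventually(func() error {
	mcp := &machineconfigv1.MachineConfigPool{}
	if err := cli.Get(ctx, client.ObjectKey{Name: mcpName}, mcp); err != nil {
		return err
	}
	mcp.Spec.Paused = false
	return cli.Update(ctx, mcp)
}).WithTimeout(5*time.Minute).WithPolling(10*time.Second).Should(Succeed())
```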
@Tal-or (Collaborator) commented on Nov 27, 2024:

[Install] durability with a running cluster with all the components and overall deployment [It] should be able to delete NUMAResourceOperator CR and redeploy without polluting cluster state
/go/src/github.com/openshift-kni/numaresources-operator/test/e2e/install/install_test.go:281
  [FAILED] Unexpected error:
      <*meta.NoKindMatchError | 0xc00076ecc0>: 
      no matches for kind "MachineConfigPool" in version "machineconfiguration.openshift.io/v1"
      {
          GroupKind: {
              Group: "machineconfiguration.openshift.io",
              Kind: "MachineConfigPool",
          },
          SearchedVersions: ["v1"],
      }
  occurred
  In [It] at: /go/src/github.com/openshift-kni/numaresources-operator/test/e2e/install/install_test.go:331 @ 11/27/24 11:07:41.37

Something has changed in the test logic that causes the HCP (Hosted Control Plane) flow to ask the API server for MCPs, which leads to the error above.

@ffromani (Member, Author) commented:

(quoting the failure report above)

Thanks, will check.

@@ -326,13 +326,14 @@ var _ = Describe("[Install] durability", Serial, func() {
// TODO change to an image which is test dedicated
nroObjRedep.Spec.ExporterImage = e2eimages.RTETestImageCI

// need to get MCPs before the mutation
@ffromani (Member, Author) commented on the diff:

This is the culprit for the HyperShift lane failure.
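For context: HyperShift clusters do not serve `machineconfiguration.openshift.io/v1` at all, so the MCP fetch has to be gated on the detected platform. A hedged sketch of that shape, where `clusterPlatform`, `platform.OpenShift`, and `getMCPsByNodeGroups` are hypothetical stand-ins for the suite's own helpers:

```go
// Only fetch MCPs on plain OpenShift; on HyperShift the kind does not
// exist, and listing it fails with NoKindMatchError as seen above.
var initialMCPs []*machineconfigv1.MachineConfigPool
if clusterPlatform == platform.OpenShift {
	var err error
	initialMCPs, err = getMCPsByNodeGroups(ctx, cli, nroObjRedep.Spec.NodeGroups)
	Expect(err).ToNot(HaveOccurred())
}
```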

Instead of chasing the MCP status transition
Updated -> Updating -> Updated, which is racy and relatively fragile,
let's use a stronger condition:

We care about waiting for the MCP to be updated with the desired state.
So:
1. we record the current MCP resourceVersion
2. we ask for our desired state
3. we wait for the MCP desired state to be `Updated` again *but with
   a different resourceVersion*

This should allow us to capture the conditions which describe our
desired state, while waiting for a stable state, in a more robust way.

Signed-off-by: Francesco Romani <[email protected]>
@Tal-or (Collaborator) commented on Nov 28, 2024:

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 28, 2024
openshift-ci bot (Contributor) commented on Nov 28, 2024:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, Tal-or

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit eaf0c0f into main Nov 28, 2024
15 checks passed
@ffromani ffromani deleted the e2e-install-wait-mcp-conds branch November 28, 2024 10:36