OCPBUGS-52448: Remove gathering of failure domains from machine sets #356
Conversation
@RadekManak: This pull request references Jira Issue OCPBUGS-52448, which is invalid:
Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug. The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@RadekManak: This pull request references Jira Issue OCPBUGS-52448, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@RadekManak: This pull request references Jira Issue OCPBUGS-52448, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
JoelSpeed left a comment:
Changes make sense, but we will need to get the tests updated
/label qe-approved

Force-pushed from 7d567cd to 1269978
| It("should keep the status unchanged consistently", func() { | ||
| Consistently(komega.Object(cpms)).Should(HaveField("Status", SatisfyAll( | ||
| Consistently(komega.Object(cpms), 1*time.Second).Should(HaveField("Status", SatisfyAll( |
What is our default consistently timeout? Does the suite set this somewhere?
There was no default set. The Gomega default was 100ms with a polling interval of 10ms.
I have retested this change and changed the default to 500ms with a 50ms polling interval in the controlplanemachineset, controlplanemachinesetgenerator, and machine provider unit test suites.
After this change, I found the platformNone test was also broken.
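For reference, here is a minimal sketch of how suite-wide Gomega defaults like these can be wired into a Ginkgo entry point. The package name, test function name, and suite description are illustrative assumptions, not the repository's actual files:

```go
package controlplanemachineset_test

import (
	"testing"
	"time"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// TestControlPlaneMachineSet is a hypothetical suite entry point used to
// illustrate where the defaults are set; the real suite files may differ.
func TestControlPlaneMachineSet(t *testing.T) {
	RegisterFailHandler(Fail)

	// Gomega's built-in Consistently defaults are a 100ms duration with a
	// 10ms polling interval; raising them gives short-lived regressions
	// less chance to slip through between polls.
	SetDefaultConsistentlyDuration(500 * time.Millisecond)
	SetDefaultConsistentlyPollingInterval(50 * time.Millisecond)

	RunSpecs(t, "ControlPlaneMachineSet Suite")
}
```

Individual assertions can still override the suite default by passing an explicit duration, as the 1*time.Second argument in the diff above does.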
// Create Machines with some wait time between them
// to achieve staggered CreationTimestamp(s).
Expect(k8sClient.Create(ctx, machine0)).To(Succeed())
time.Sleep(1 * time.Second)
Did this not work before? I see the comment above suggests there was already wait time
The machines had the same creationTimestamp. The test still passed because of the short Consistently interval and because we sort machines with the same timestamp by name.
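For context, a minimal sketch of the tie-breaking order described here, using a hypothetical helper and the github.com/openshift/api/machine/v1beta1 types (this is not the operator's actual sorting code):

```go
package example

import (
	"sort"

	machinev1beta1 "github.com/openshift/api/machine/v1beta1"
)

// sortMachinesByAge orders machines oldest-first and falls back to the
// machine name when two machines share the same CreationTimestamp.
// Hypothetical helper shown only to illustrate the behaviour discussed above.
func sortMachinesByAge(machines []machinev1beta1.Machine) {
	sort.Slice(machines, func(i, j int) bool {
		ti := machines[i].CreationTimestamp
		tj := machines[j].CreationTimestamp

		if ti.Equal(&tj) {
			return machines[i].Name < machines[j].Name
		}

		return ti.Before(&tj)
	})
}
```

With identical timestamps, the name comparison is what kept the old test deterministic; the added time.Sleep makes the timestamps themselves distinct, so the ordering under test no longer depends on names.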
Force-pushed from 1269978 to 08e62c8
Set a default Consistently timeout and fixed the platformNone tests that the change revealed to be broken.
/approve

[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: JoelSpeed. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
Force-pushed from 8e5e868 to bfdb745
/hold cancel
Force-pushed from bfdb745 to f27b517
/lgtm

/retest-required
ci/prow/e2e-aws-ovn-etcd-scaling (and in fact all the etcd-scaling jobs) are known to be broken on the available condition.
@RadekManak: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
/override ci/prow/e2e-aws-ovn-etcd-scaling This is a known issue and not related to this PR (we have a potential fix for it on #357)
@damdo: Overrode contexts on behalf of damdo: ci/prow/e2e-aws-ovn-etcd-scaling In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Merged commit 0bbafe2 into openshift:main
@RadekManak: Jira Issue OCPBUGS-52448: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-52448 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
[ART PR BUILD NOTIFIER] Distgit: ose-cluster-control-plane-machine-set-operator
/cherry-pick release-4.19
@RadekManak: #356 failed to apply on top of branch "release-4.19": In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This fixes a bug where, on a cluster running with 3 control plane nodes in a single AZ and machine pools in more than one AZ, the CPMS does not generate a config.
We decided to remove the feature that gathers additional failure domains from MachineSets. While useful, this feature prevents the generation of the CPMS in the case mentioned above. Our priority is to generate a valid CPMS based on the current state of the control plane, allowing the cluster administrator to add failure domains later if needed, rather than requiring manual intervention upfront.
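To illustrate the direction of the change, here is a rough sketch of deriving failure domains only from the existing control plane machines. The helper name and the zone label key are assumptions made for this example; the operator derives failure domains from provider-specific machine configuration rather than a single label, so treat this purely as a conceptual sketch, not its real code:

```go
package example

import (
	machinev1beta1 "github.com/openshift/api/machine/v1beta1"
)

// zoneLabel is an assumed label key used only for this illustration.
const zoneLabel = "machine.openshift.io/zone"

// controlPlaneFailureDomains sketches the post-change behaviour: the failure
// domains for the generated CPMS come solely from the existing control plane
// machines, with no additional domains merged in from MachineSets.
func controlPlaneFailureDomains(controlPlaneMachines []machinev1beta1.Machine) []string {
	seen := map[string]struct{}{}
	domains := []string{}

	for _, machine := range controlPlaneMachines {
		zone := machine.Labels[zoneLabel]
		if zone == "" {
			continue
		}

		if _, dup := seen[zone]; !dup {
			seen[zone] = struct{}{}
			domains = append(domains, zone)
		}
	}

	return domains
}
```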