Reset Nodeset DeploymentReadyCondition #552
Conversation
When a deployment is in progress and you delete an OpenStackDataPlaneDeployment resource, we should reset the DeploymentReadyCondition back to "Deployment not started". If the OpenStackDataPlaneDeployment resource is deleted after a successful deployment to make way for a new one, it would be reset in the same way.
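In controller terms, that description maps to something like the sketch below. This is a hedged paraphrase, not the PR's literal code: it assumes the v1beta1 dataplane-operator API types and the condition helpers from openstack-k8s-operators/lib-common, and the function name is made up.

```go
package dataplane

import (
	"github.com/openstack-k8s-operators/lib-common/modules/common/condition"

	dataplanev1 "github.com/openstack-k8s-operators/dataplane-operator/api/v1beta1"
)

// resetIfNoActiveDeployments (hypothetical name) resets the NodeSet's
// DeploymentReadyCondition when every OpenStackDataPlaneDeployment is gone
// or is being deleted.
func resetIfNoActiveDeployments(
	instance *dataplanev1.OpenStackDataPlaneNodeSet,
	deployments *dataplanev1.OpenStackDataPlaneDeploymentList,
) {
	for _, deployment := range deployments.Items {
		// A deployment with a deletion timestamp no longer counts as active.
		if deployment.DeletionTimestamp.IsZero() {
			return // at least one live deployment; leave the condition alone
		}
	}
	// No live deployments remain: back to "Deployment not started".
	instance.Status.Conditions.MarkFalse(
		condition.DeploymentReadyCondition,
		condition.RequestedReason, // the thread below debates NotRequestedReason
		condition.SeverityInfo,
		condition.DeploymentReadyInitMessage,
	)
}
```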
```diff
@@ -92,11 +92,11 @@ spec:
 status:
   conditions:
   - message: Deployment not started
-    reason: NotRequested
+    reason: Requested
```
The Deployment of the NodeSet hasn't actually been Requested in this test though, so I think in the else clause you added, it should use condition.NotRequestedReason?
We always check whether a deployment is there or not, and its status, after a nodeset is SetupReady, so the status is requested, right?
I don't think it's Requested until an associated Deployment is created.
Sorry, I thought I had replied to this. We always check if there are deployments for the NodeSet, and there can be more than one, so I think the NodeSet deployment status is always Requested. If we don't agree with it we can change it later.
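For reference, a restatement of the two reasons under discussion (assuming the condition import from the sketch above; lib-common's condition package is the authoritative source). The string values match the kuttl assert above ("Requested") and the old expectation ("NotRequested"):

```go
// Restated for convenience; both constants come from
// openstack-k8s-operators/lib-common's condition package.
const (
	RequestedReason    condition.Reason = "Requested"
	NotRequestedReason condition.Reason = "NotRequested"
)
```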
```diff
@@ -319,6 +324,9 @@ func checkDeployment(helper *helper.Helper,
 		return false, false, err
 	}
 	for _, deployment := range deployments.Items {
+		if !deployment.DeletionTimestamp.IsZero() {
+			continue
+		}
```
Deleting the deployment doesn't mean the NodeSet is no longer deployed when it was previously, and the else clause would now reset the condition. I'm not sure this is the right logic. Perhaps a new state is needed to indicate a deployment was done, but the Deployment resource has been deleted.
Unless we keep the deployment history (which would have to be another CR, as we can't keep it in the NodeSet Status; Status is not something that can be backed up and restored, and should always be re-evaluated on reconciliation), there is no way to know what happened to the earlier deployments, i.e. whether the deployments on a NodeSet were successful, failed, or were deleted in between. I can add a NodeDeploymentHistory CR, managed per NodeSet by the nodeset controller, if that works for us. Otherwise, this is the best we can do with the current design (i.e. update the status to Deployment not started if there are no deployments).
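To make that proposal concrete, the data-only CR might look roughly like this. Everything here is hypothetical: NodeDeploymentHistory is not an existing dataplane-operator type, and the field names are illustrative.

```go
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NodeDeploymentHistory is a hypothetical data-only CR, one per NodeSet,
// maintained by the nodeset controller. Because it is a separate resource
// rather than Status, it survives backup/restore and deletion of the
// OpenStackDataPlaneDeployment CRs themselves.
type NodeDeploymentHistory struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Entries records one outcome per deployment attempt, oldest first.
	Entries []DeploymentRecord `json:"entries,omitempty"`
}

// DeploymentRecord captures what happened to a single deployment.
type DeploymentRecord struct {
	DeploymentName string      `json:"deploymentName"`
	StartedTime    metav1.Time `json:"startedTime"`
	// Outcome is one of Succeeded, Failed, or Deleted (removed mid-deploy).
	Outcome string `json:"outcome"`
}
```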
NodeSet Status doesn't have to be backed up or restored. We could set a condition, or a field, to indicate it was deployed at one time. Whenever the NodeSet is reconciled, it can treat that as sticky, keeping the value as it's already set.
A new CRD doesn't really address this and isn't needed. It would have the same issue anyway, wouldn't it, if someone were to just delete those CRs? I guess I'm wondering under what circumstances we expect the Deployments to get deleted. Is this to protect against the user accidentally deleting them?
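A minimal sketch of that "sticky" approach, assuming the same dataplanev1 import as the sketch above and a hypothetical Status.Deployed field (the real NodeSet status may differ):

```go
// markDeployedSticky (hypothetical) records that the NodeSet was deployed at
// one time and never clears it, even if Deployment CRs are later deleted.
func markDeployedSticky(instance *dataplanev1.OpenStackDataPlaneNodeSet, deployedNow bool) {
	if deployedNow {
		instance.Status.Deployed = true // hypothetical sticky field
	}
	// Intentionally no else branch: the value stays as already set. As the
	// reply below notes, this can't distinguish an aborted in-progress
	// deployment from a previously completed one.
}
```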
> NodeSet Status doesn't have to be backed up or restored

I meant that in the context of storing deployment history in Status, which some people use, but which causes issues with backup and restore.

> it can treat that as sticky, keeping the value as it's already set.

That won't work in all scenarios. If a deployment is in progress and you delete it for some reason (e.g. to abort the deployment), the status would stay as Deployment In Progress. Also, if there is more than one deployment (with one in progress), the status would always be Deployment Ready. And keeping in mind that the nodesets will go through update, upgrade, etc., we need to keep the history somewhere.

> A new CRD doesn't really address this and isn't needed

A new CRD to keep the state (i.e. deployment history) is an established pattern and is used without an associated controller for these kinds of scenarios. We use it to keep IP Reservations [1] in infra-operator. That's not exposed to end users, so there is very little chance of it being deleted.
> A new CRD to keep the state (i.e. deployment history) is an established pattern and is used without an associated controller for these kinds of scenarios. We use it to keep IP Reservations [1] in infra-operator. That's not exposed to end users, so there is very little chance of it being deleted.

Ok, I get that use case, but I don't really see much difference between Deployment and DeploymentHistory. Either way, they shouldn't be deleted, except in cases where you are perhaps iterating through a deployment, make a mistake, and want to clean things up.

I am ok with this PR as-is, I suppose, as long as we document that our practice is that Deployments ought not to be deleted, or you risk losing the latest deployment status.
I think we can use a history field, right? There is already an established design around using history in the status field within OpenShift. For example:
```console
❯ oc get clusterversion version -o yaml | yq .status.history
- completionTime: null
  image: registry.ci.openshift.org/origin/release@sha256:4510c50834c07e71521a68c021320b5e44978245c8031a858f74328e54e89e1c
  startedTime: "2023-12-08T06:02:39Z"
  state: Partial
  verified: true
  version: 4.14.0-0.okd-2023-12-01-225814
- completionTime: "2023-11-27T04:28:54Z"
  image: registry.ci.openshift.org/origin/release@sha256:72d40c51e7c4d1b9c31e9b0d276d045f1b2b93def5ecee49186df856d40bcb5c
  startedTime: "2023-11-26T22:50:18Z"
  state: Completed
  verified: true
  version: 4.14.0-0.okd-2023-11-14-101924
- completionTime: "2023-11-06T02:40:33Z"
  image: quay.io/openshift/okd@sha256:7a6200e347a1b857e47f2ab0735eb1303af7d796a847d79ef9706f217cd12f5c
  startedTime: "2023-11-06T00:52:13Z"
  state: Completed
  verified: false
  version: 4.14.0-0.okd-2023-10-28-073550
```
I think that would be a nice indicator for customers; plus, given that it's an already established design pattern, it would be reasonably intuitive.
Do you think that adding a history field to the NodeSet CR would work for the problem we're trying to solve here?
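If that route were taken, a status entry modeled on ClusterVersion's status.history might look like the following. The type and field names are illustrative, not an existing dataplane-operator API:

```go
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DeploymentHistoryEntry is a hypothetical NodeSet status entry modeled on
// the ClusterVersion history shown above; all field names are illustrative.
type DeploymentHistoryEntry struct {
	DeploymentName string       `json:"deploymentName"`
	StartedTime    metav1.Time  `json:"startedTime"`
	CompletionTime *metav1.Time `json:"completionTime,omitempty"` // nil while in progress
	// State follows the ClusterVersion convention, e.g. Partial or Completed.
	State string `json:"state"`
}
```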
> Do you think that adding a history field to the NodeSet CR would work for the problem we're trying to solve here?
The Status field would be lost with backup and restore of the CRs; that's why it's not used to keep state. It is probably fine for clusterversion, as that's not expected to be backed up and restored.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bshephar, rabi.
/test dataplane-operator-build-deploy-kuttl
Merged commit 4226649 into openstack-k8s-operators:main