Reset Nodeset DeploymentReadyCondition #552

Merged: 1 commit into openstack-k8s-operators:main on Dec 12, 2023

Conversation

@rabi (Contributor) commented Dec 1, 2023

When a deployment is in progress and you delete an OpenStackDataPlaneDeployment resource, we should reset the DeploymentReadyCondition back to "Deployment not started".

If the OpenStackDataPlaneDeployment resource is deleted after a successful deployment, for example to start a new deployment, the condition is reset as well.

openshift-ci bot requested review from fultonj and rebtoor on December 1, 2023 05:21
openshift-ci bot added the approved label on Dec 1, 2023
@@ -92,11 +92,11 @@ spec:
 status:
   conditions:
   - message: Deployment not started
-    reason: NotRequested
+    reason: Requested
Collaborator:

The deployment of the NodeSet hasn't actually been requested in this test though, so I think the else clause you added should use condition.NotRequestedReason?

Contributor Author:

We always check whether a deployment exists, and its status, after a NodeSet is SetupReady, so the status is Requested, right?

Collaborator:

I don't think it's Requested until an associated Deployment is created.

Contributor Author:

Sorry, I thought I had replied to this. We always check if there are deployments for the NodeSet, and there can be more than one, so I think the NodeSet deployment status is always Requested. If we don't agree on it, we can change it later.
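
To make the outcome of this thread concrete, here is a minimal sketch of the reset path, assuming lib-common's condition helpers. The package, function name, and import paths are assumptions for illustration, not the code added by this PR:

```go
// A minimal, hypothetical sketch of the reset discussed above, assuming
// lib-common's condition helpers. Names and import paths are illustrative.
package deployment

import (
	dataplanev1 "github.com/openstack-k8s-operators/dataplane-operator/api/v1beta1"
	"github.com/openstack-k8s-operators/lib-common/modules/common/condition"
)

// resetDeploymentReadyCondition puts the NodeSet's DeploymentReadyCondition
// back to "Deployment not started" when no non-deleted deployment references
// the NodeSet.
func resetDeploymentReadyCondition(instance *dataplanev1.OpenStackDataPlaneNodeSet) {
	// The merged change keeps RequestedReason here (matching the kuttl
	// assert above); the review suggested condition.NotRequestedReason
	// would be more accurate until a Deployment actually exists.
	instance.Status.Conditions.Set(condition.UnknownCondition(
		condition.DeploymentReadyCondition,
		condition.RequestedReason,
		"Deployment not started",
	))
}
```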

@@ -319,6 +324,9 @@ func checkDeployment(helper *helper.Helper,
 		return false, false, err
 	}
 	for _, deployment := range deployments.Items {
+		if !deployment.DeletionTimestamp.IsZero() {
+			continue
+		}
Collaborator:

Deleting the Deployment doesn't mean the NodeSet is no longer deployed if it previously was, and the else clause would now reset the condition. I'm not sure this is the right logic. Perhaps a new state is needed to indicate that a deployment was done but the Deployment resource has been deleted.

Contributor Author:

Unless we keep the deployment history (which would require another CR, since we can't keep it in the NodeSet Status: Status is not something that can be backed up and restored and should always be re-evaluated after reconciliation), there is no way to know what happened to earlier deployments, i.e. whether the deployments on a NodeSet succeeded, failed, or were deleted in between. I can add a NodeDeploymentHistory CR, managed per NodeSet by the nodeset controller, if that works for us; otherwise, this is the best we can do with the current design (i.e. update the status to Deployment not started if there are no deployments).
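
For illustration only, a state-keeping NodeDeploymentHistory CR along these lines could look roughly like the sketch below; the type and all field names are hypothetical and do not exist in dataplane-operator:

```go
// Hypothetical sketch of a state-only NodeDeploymentHistory CRD, as floated
// above. Nothing here exists in the operator; it only shows the shape of a
// per-NodeSet history object managed by the nodeset controller.
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// NodeDeploymentHistoryEntry records the outcome of one deployment run.
type NodeDeploymentHistoryEntry struct {
	// DeploymentName is the OpenStackDataPlaneDeployment that ran.
	DeploymentName string `json:"deploymentName"`
	// State could be e.g. "InProgress", "Completed", "Failed" or "Deleted".
	State string `json:"state"`
	// StartedTime and CompletionTime bracket the run.
	StartedTime    metav1.Time  `json:"startedTime"`
	CompletionTime *metav1.Time `json:"completionTime,omitempty"`
}

// NodeDeploymentHistorySpec keeps the entries in spec rather than status,
// which is what lets the record survive backup and restore.
type NodeDeploymentHistorySpec struct {
	NodeSetName string                       `json:"nodeSetName"`
	Entries     []NodeDeploymentHistoryEntry `json:"entries,omitempty"`
}

// NodeDeploymentHistory is the state-only CR; it is not exposed to end users.
type NodeDeploymentHistory struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec NodeDeploymentHistorySpec `json:"spec,omitempty"`
}
```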

Collaborator @slagle commented Dec 5, 2023:

NodeSet Status doesn't have to be backed up or restored. We could set a condition, or a field, to indicate it was deployed at one time. Whenever the NodeSet is reconciled, it can treat that as sticky, keeping the value as it's already set.

A new CRD doesn't really address this and isn't needed. It would have the same issue anyway, wouldn't it, if someone were to just delete those CRs? I guess I'm wondering under what circumstances we expect the Deployments to get deleted. Is this to protect against the user accidentally deleting them?
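
A rough sketch of that sticky behaviour (the field and helper below are hypothetical, just to make the idea concrete; they are not part of the NodeSet API):

```go
// Hypothetical sketch of the "sticky" flag suggested above.
package deployment

// nodeSetStatus stands in for the real NodeSet status here.
type nodeSetStatus struct {
	// Deployed records that at least one deployment completed successfully
	// at some point in the NodeSet's life.
	Deployed bool
}

// markDeployed is called on every reconcile. It only ever flips the flag
// from false to true, so later deletion of the Deployment CR does not
// clear it.
func markDeployed(status *nodeSetStatus, deploymentSucceeded bool) {
	if deploymentSucceeded {
		status.Deployed = true
	}
}
```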

Contributor Author:

> NodeSet Status doesn't have to be backed up or restored

I meant that in the context of storing deployment history in Status, which some people do, but it results in issues with backup and restore.

> it can treat that as sticky, keeping the value as it's already set.

That won't work in all scenarios: if a deployment is in progress and you delete the deployment for some reason (e.g. to abort it), the status would stay as Deployment In Progress. Also, if there is more than one deployment, with one in progress, the status would always be Deployment Ready. And keeping in mind that the NodeSets will go through update, upgrade, etc., we need to keep the history somewhere.

> A new CRD doesn't really address this and isn't needed

A new CRD to keep state (i.e. deployment history) is an established pattern and is used without an associated controller for these kinds of scenarios. We use that to keep IP Reservations [1] in infra-operator. That's not exposed to end users, and there is very little chance of it being deleted.

[1] https://github.com/openstack-k8s-operators/infra-operator/blob/main/apis/network/v1beta1/reservation_types.go#L55

Collaborator:

> A new CRD to keep state (i.e. deployment history) is an established pattern and is used without an associated controller for these kinds of scenarios. We use that to keep IP Reservations [1] in infra-operator. That's not exposed to end users, and there is very little chance of it being deleted.
>
> [1] https://github.com/openstack-k8s-operators/infra-operator/blob/main/apis/network/v1beta1/reservation_types.go#L55

OK, I get that use case, but I don't really see much difference between Deployment and DeploymentHistory. Either way, they shouldn't be deleted, except in cases where you are iterating on a deployment, make a mistake, and want to clean things up.

I am OK with this PR as-is, I suppose, as long as we document that our practice is that Deployments ought not to be deleted, or you risk losing the latest deployment status.

Contributor:

I think we can use a history field, right? There is already an established design around using a history field in status within OpenShift. For example:

❯ oc get clusterversion version -o yaml | yq .status.history
- completionTime: null
  image: registry.ci.openshift.org/origin/release@sha256:4510c50834c07e71521a68c021320b5e44978245c8031a858f74328e54e89e1c
  startedTime: "2023-12-08T06:02:39Z"
  state: Partial
  verified: true
  version: 4.14.0-0.okd-2023-12-01-225814
- completionTime: "2023-11-27T04:28:54Z"
  image: registry.ci.openshift.org/origin/release@sha256:72d40c51e7c4d1b9c31e9b0d276d045f1b2b93def5ecee49186df856d40bcb5c
  startedTime: "2023-11-26T22:50:18Z"
  state: Completed
  verified: true
  version: 4.14.0-0.okd-2023-11-14-101924
- completionTime: "2023-11-06T02:40:33Z"
  image: quay.io/openshift/okd@sha256:7a6200e347a1b857e47f2ab0735eb1303af7d796a847d79ef9706f217cd12f5c
  startedTime: "2023-11-06T00:52:13Z"
  state: Completed
  verified: false
  version: 4.14.0-0.okd-2023-10-28-073550

I think that would be a nice indicator for customers, and given that it's an already established design pattern, it would be reasonably intuitive.

Do you think that adding a history field to the NodeSet CR would work for the problem we're trying to solve here?
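
To sketch what that might look like on the NodeSet (hypothetical field and type names, modeled on ClusterVersion's status.history; nothing below is part of the dataplane-operator API):

```go
// Hypothetical sketch of a history field in the NodeSet status, mirroring
// ClusterVersion's status.history shown above. Names are illustrative only.
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// nodeSetStatusWithHistory stands in for the real NodeSet status; only the
// History field is the point of the sketch.
type nodeSetStatusWithHistory struct {
	// History lists recent deployment runs, newest first, with the same
	// started/completed/state shape as the ClusterVersion output above.
	History []deploymentRun `json:"history,omitempty"`
}

type deploymentRun struct {
	DeploymentName string       `json:"deploymentName"`
	State          string       `json:"state"` // e.g. "Partial", "Completed"
	StartedTime    metav1.Time  `json:"startedTime"`
	CompletionTime *metav1.Time `json:"completionTime,omitempty"`
}
```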

Contributor Author:

> Do you think that adding a history field to the NodeSet CR would work for the problem we're trying to solve here?

The Status field would be lost on backup and restore of the CRs; that's why it's not used to keep state. It is probably fine for clusterversion, as that's not expected to be backed up and restored.

openshift-ci bot commented Dec 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bshephar, rabi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rabi (Contributor Author) commented Dec 12, 2023

/test dataplane-operator-build-deploy-kuttl

openshift-merge-bot merged commit 4226649 into openstack-k8s-operators:main on Dec 12, 2023
2 checks passed