OCPCLOUD-2775: add cluster api autoscaler integration enhancement #1736
Force-pushed from 1c80b56 to 4554fb1
i'm not sure why it's barfing on the metadata
figured it out, needed quoting on the github handles
Force-pushed from 4554fb1 to 73bfea9
version and would allow us to drop some patches we are carrying. The Cluster API MachineSet sync controller will be updated to recognize when the Cluster Autoscaler has made a change to a Cluster API resource and then sync the change to the corresponding Machine API resource, regardless of which resource is authoritative.
Would be good to clarify exactly what kind of writes the CAS would be making, am I right in thinking that it's just the scale subresource?
yes, only the scale subresource.
So we could possibly gate scale subresource updates in a different way to other writes 🤔
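For example, whatever admission rule we end up with could match the scale subresource separately from full-object writes; a rough fragment (group/version strings are assumptions for the sketch):

```yaml
# Illustrative matchConstraints fragment only: writes to the scale subresource
# are matched separately from writes to the whole MachineSet object, so they can
# be validated under a different rule. Group/version strings are assumptions.
resourceRules:
  - apiGroups: ["cluster.x-k8s.io"]
    apiVersions: ["v1beta1"]
    operations: ["UPDATE"]
    resources: ["machinesets"]        # full-object updates
  - apiGroups: ["cluster.x-k8s.io"]
    apiVersions: ["v1beta1"]
    operations: ["UPDATE"]
    resources: ["machinesets/scale"]  # scale subresource updates (the CAS path)
```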
locate the resource. The Cluster API MachineSet sync controller will be updated to ensure that when the Cluster Autoscaler Operator adds the autoscaling annotations, they are copied to any related resources, regardless of which is authoritative.
I think in this case, since we own the CAO, we don't necessarily need an exception within the CAPI sync controller, and could handle this in CAO. I would expect CAO to look at a MAPI MachineSet, and check if it's authoritative, and then apply the annotations correctly
Will it still be annotations on the CAPI side?
the annotations are available on the CAPI side, we will need to migrate a few of them. eventually we will want the infrastructure templates to carry the capacity info in their status field.
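for example, with the upstream opt-in scale-from-zero support, the infrastructure template would report something along these lines (the template kind, API version, and values are illustrative):

```yaml
# Sketch of the opt-in upstream contract: the infrastructure machine template
# reports node capacity in its status, so the autoscaler no longer needs the
# capacity annotations. Kind, API version, and values here are illustrative.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
kind: AWSMachineTemplate
metadata:
  name: worker-template
  namespace: openshift-cluster-api
status:
  capacity:
    cpu: "4"
    memory: 16Gi
```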
Do we have an estimated timeline on having the scale information directly in the status?
no timeline has ever been proposed. this feature was added as "opt-in", providers are not required to make these changes.
The Cluster Autoscaler Operator will update the Machine API MachineSet resource, and then the MachineSet sync controller will sync the change to the Cluster API MachineSet resource. The sync controller will use the managed fields (i.e. `.metadata.managedFields`)
Can we make it an error for there to be a MachineAutoscaler that points to both MAPI and CAPI versions of the same MachineSet?
And then CAO can just update the correct, authoritative resource with whatever information it needs to, without having to worry about conflicting resources
I think if we do this, we need less special logic within the sync controllers
In theory we could validate this using a validating admission policy, but I don't think it's fool-proof
Can we make it an error for there to be a MachineAutoscaler that points to both MAPI and CAPI versions of the same MachineSet?
yes we can. the code currently checks to see if a MachineSet has an annotation indicating that it has a MachineAutoscaler, we could extend this search to cover the case where 2 MachineAutoscalers are effectively pointing at the same target.
And then CAO can just update the correct, authoritative resource with whatever information it needs to, without having to worry about conflicting resources
i want to make sure i understand this. as a user, i have a MAPI authoritative MachineSet but i create a MachineAutoscaler referencing the non-authoritative CAPI MachineSet. when the CAO sees this, it should locate both MAPI and CAPI MachineSets and then only update the authoritative one?
i want to make sure i understand this. as a user, i have a MAPI authoritative MachineSet but i create a MachineAutoscaler referencing the non-authoritative CAPI MachineSet. when the CAO sees this, it should locate both MAPI and CAPI MachineSets and then only update the authoritative one?
That is what I'm suggesting yes, do you think that is overly complex?
no, i don't think it's overly complex, and i like how it gives the user strong feedback. but, we still need to reconcile this behavior with your comment here.
essentially, what should happen if the authoritative resource changes, should the MachineAutoscaler reconcile again and change its status?
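For reference, the scenario being described is a MachineAutoscaler along these lines (names and replica bounds are made up):

```yaml
# Hypothetical example: a MachineAutoscaler whose scaleTargetRef points at the
# Cluster API MachineSet even though the Machine API MachineSet is authoritative.
# Names and replica bounds are made up for illustration.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 6
  scaleTargetRef:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineSet
    name: worker-us-east-1a   # the CAPI mirror of the MAPI MachineSet
```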
* The Cluster Autoscaler Operator has added the minimum and maximum size annotations, and the ownership annotation, to a record. If the sync controller sees an update to these annotations on a non-authoritative resource originating from the Cluster Autoscaler Operator, it will copy that change to the authoritative resource if no MachineAutoscaler is referencing the authoritative resource.
Since we own the CAO, I think implementing this within CAO is likely the easier option.
The sync logic already syncs annotations between MachineSets; we just need to add special logic to handle the well-known annotations if they somehow end up converting from annotations to structured status
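For reference, these are the annotations in question as they would appear on the Cluster API MachineSet (values illustrative; the CAO's ownership annotation key is omitted here):

```yaml
# Sketch: the min/max size annotations as the upstream clusterapi cloud provider
# expects them on a Cluster API MachineSet. Values are illustrative; the CAO also
# adds its own ownership annotation, which is not shown here.
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineSet
metadata:
  name: worker-us-east-1a
  namespace: openshift-cluster-api
  annotations:
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "6"
# spec omitted for brevity
```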
ack, i can update this
* A provider MachineSet controller has added the scale from zero annotations to a non-authoritative record. This occurs when the Cluster API resource is marked as authoritative but the Machine API resource is updated by the provider MachineSet controller. In these cases the scale from zero annotations will be copied to the Cluster API resource. The data from the MachineSet controller is currently only applied to Machine API resources.
Is there an equivalent of this controller in CAPI? Or, if not, is it on the roadmap? If it is on the roadmap, we will want to ensure these controllers follow the same pausing as the rest of the controllers
the last time i looked, most if not all of the providers also package MachineSet actuators, but we actually want to promote a different behavior in the upstream. we want upstream providers to implement infrastructure template controllers to add the capacity information to the status on the infrastructure template, not as annotations on the MachineSet or MachineDeployment.
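for comparison, the annotation-based form looks roughly like this on the MachineSet or MachineDeployment (keys as documented for the upstream clusterapi provider; values illustrative):

```yaml
# The annotation form of the scale-from-zero capacity hints, placed on the
# MachineSet/MachineDeployment rather than in the infrastructure template status.
# Keys follow the upstream clusterapi cloud provider documentation; values are
# illustrative.
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "16Gi"
    capacity.cluster-autoscaler.kubernetes.io/maxPods: "250"
```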
Do we have a timeline for seeing something like this directly in the upstream?
same as previous reply, a timeline was never proposed for this as it is "opt-in".
* `.spec.replicas` changes will be synced from the Cluster API MachineSet to the Machine API MachineSet regardless of which is authoritative when the change originates from the Cluster Autoscaler. As the Cluster Autoscaler will be configured to operate against Cluster API resources only, there will be a need to identify when the Cluster Autoscaler has updated a non-authoritative Cluster API resource so that the authoritative resource can be updated. This will only occur when the sync controller observes an update to the replicas field from the Cluster Autoscaler.
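For illustration, a replicas change applied by the Cluster Autoscaler might leave a managed-fields entry along these lines (the manager string depends on the field manager the autoscaler uses and is assumed here):

```yaml
# Sketch of how a Cluster Autoscaler-originated replicas update could appear in
# .metadata.managedFields on the Cluster API MachineSet. The manager name is an
# assumption; the sync controller would key off whatever manager the CAS uses.
managedFields:
  - manager: cluster-autoscaler
    operation: Update
    apiVersion: cluster.x-k8s.io/v1beta1
    time: "2025-01-01T00:00:00Z"
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:replicas: {}
```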
We intend to use a VAP to block updates to a non-authoritative resource. Have you investigated how we let the CAS through the VAP (what exception do we need in place) to make just this replicas field change?
i have not, do you happen to have a link where i could learn more?
Our intention is to use VAP (https://kubernetes.io/docs/reference/access-authn-authz/validating-admission-policy/) to block writes to the object unless they come specifically from sync controller, or, the resource itself is authoritative.
I think what would be interesting to explore would be if we could add an exception allowing another service account to write, but specifically only to the status subresource
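Roughly, something along these lines (service account names, group/version strings, and the CEL are all illustrative, and the "resource is authoritative" condition is left out of the sketch):

```yaml
# Sketch only: block UPDATEs to Cluster API MachineSets unless they come from the
# sync controller, with a narrow carve-out letting the autoscaler's service account
# write only the scale/status subresources. All names and the CEL are assumptions.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: restrict-machineset-writes
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["cluster.x-k8s.io"]
        apiVersions: ["v1beta1"]
        operations: ["UPDATE"]
        resources: ["machinesets", "machinesets/status", "machinesets/scale"]
  validations:
    - expression: >-
        request.userInfo.username == 'system:serviceaccount:openshift-cluster-api:machineset-sync-controller' ||
        (request.userInfo.username == 'system:serviceaccount:openshift-machine-api:cluster-autoscaler' &&
         request.subResource in ['scale', 'status'])
      message: only the sync controller may modify this MachineSet; the autoscaler is limited to the scale/status subresources
```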
yeah, that is interesting.
could create a race condition where updating the minimum and maximum size values will lead to an inaccurate update to both MachineSets.

To address the risk of possible race conditions on MachineAutoscalers we have a few
Third option, CAS will take no action when multiple MachineAutoscalers refer to the same MachineSet (MAPI and CAPI mirrors) and instead will update the status to show that one of the two should be removed
Fourth option, use some sort of admission time validation to try and prevent this
Three and four should be able to be done in conjunction with one another
Third option, CAS will take no action when multiple MachineAutoscalers refer to the same MachineSet (MAPI and CAPI mirrors) and instead will update the status to show that one of the two should be removed
this is similar to option 1 in some ways, also the CAS does not take action from MachineAutoscalers, only when it sees the min/max annotations on a MachineSet (CAPI in this case).
and yes, i want the status on a MachineAutoscaler to tell the user if they have the wrong record or whatever.
Fourth option, use some sort of admission time validation to try and prevent this
will add this as an option.
i think the best solution is probably for us to only allow selecting the authoritative MachineSet, and then putting a good error message in the status. it might be confusing at first, but will make the logic easy to implement and give users a clear indication of how to fix the problem.
i think the best solution is probably for us to only allow selecting the authoritative MachineSet
The authoritative machineset may change over time, so we could accept something, and it could later become invalid based on this
this seems like something we will need to address either way, it could happen to any MachineAutoscaler resource.
2. Will we want to remove MachineAutoscalers that reference Cluster API MachineSets during a downgrade?
We don't support downgrades, so we probably don't need to be concerned about this.
We won't automatically create these during an upgrade, so downgrading a failed upgrade shouldn't produce an issue
sounds good, i'll add an answer to that question.
This enhancement describes how we will integrate the cluster autoscaler, and related controllers, with the Cluster API machine management layer.