AddFinalizer clears Status subobject leading to delayed object readiness #791
Comments
I see 2 ways of fixing this:
The benefit of 1 is that it applies to everyone, no matter which finalizer implementation they use, while 2 only fixes that specific implementation. We should probably also check whether there are other places where this could happen. |
I remember that the quoted behavior requires providers to reconcile multiple times in order to apply a set of changes. The comment in the late initialization logic acknowledges the behavior: crossplane-runtime/pkg/reconciler/managed/reconciler.go Lines 1276 to 1290 in 19d95a6
|
You're right, that comment explains exactly the behaviour I'm seeing for the update call. However, it says that the update will implicitly requeue an immediate reconcile, and that is not what I'm seeing here, at least when the object is in Observe-only mode and the update is a call to add the finalizer. I think that for efficiency it would be better to handle these cases in a single reconcile to reduce load on provider pods, the provider's external API, and the Kubernetes API. Across thousands of objects, this can have a significant impact. Also, even if the requeue did happen immediately, the object could still end up behind hundreds of other objects in the queue. Again, for the Observe-only use case, where a single reconcile would be enough, I feel like we can optimize it. Are you against saving the Status sub-struct and restoring it after the update call? If so, what alternative do you suggest? |
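The save-and-restore idea proposed in this comment can be sketched as follows. This is a minimal, self-contained illustration and not crossplane-runtime code: `Resource`, `Status`, `updateFunc`, and `addFinalizerPreservingStatus` are hypothetical names, and the simulated update simply models a client call that overwrites the in-memory object with a copy that has no status.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for a managed resource's status.
type Status struct {
	Ready      bool
	AtProvider map[string]string
}

// Resource is a hypothetical, heavily simplified managed resource.
type Resource struct {
	Finalizers []string
	Status     Status
}

// updateFunc stands in for a client Update call that may overwrite obj
// with whatever the API server returns.
type updateFunc func(obj *Resource) error

// addFinalizerPreservingStatus saves the in-memory status, performs the
// update, and puts the saved status back afterwards.
func addFinalizerPreservingStatus(update updateFunc, obj *Resource, finalizer string) error {
	saved := obj.Status // shallow copy; real code would likely deep-copy
	obj.Finalizers = append(obj.Finalizers, finalizer)
	if err := update(obj); err != nil {
		return err
	}
	obj.Status = saved
	return nil
}

func main() {
	// Simulated update: the returned object carries no status.
	update := func(obj *Resource) error {
		obj.Status = Status{}
		return nil
	}

	// Status as it would look in memory after a successful Observe.
	obj := &Resource{Status: Status{
		Ready:      true,
		AtProvider: map[string]string{"id": "vnet-1"},
	}}

	if err := addFinalizerPreservingStatus(update, obj, "finalizer.example.org"); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Println(obj.Status.Ready, obj.Status.AtProvider["id"])
}
```

With the restore step in place, the status observed in the same reconcile survives the finalizer update, so no second reconcile is needed just to publish it.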
I would imagine that the comment assumes the typical scenario of full control by Crossplane which includes a subsequent Create call which will trigger the update, and in the Observe case there is no subsequent operation to trigger the status update. |
I should have clarified that I don't have any opinions on the matter — I don't know enough about the managed reconciler design. I just wanted to provide reference for the described behavior. |
I understand. Either way, this is a real bug and I want to propose a solution. Is there any maintainer that could help push this forward by suggesting an alternative or approving my suggestion so I can implement and propose a PR? |
Thanks for the thorough investigation and detailed report @gravufo, I see the problem and agree with the pain it creates. It is interesting that adding a finalizer doesn't cause the object to be re-queued. So, if I watch a resource with "kubectl get foo" and in another terminal I add a finalizer to it, no new watch event would be thrown? 🤔
This seems like a reasonable solution, and I can't think of a better way to handle this. So, feel free to come up with a PR. |
According to kedacore/keda#437 (comment) it would seem that adding finalizers should not update the generation, thus not triggering a reconcile. It makes sense. |
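To illustrate the reasoning in the linked comment: a controller that filters update events on `metadata.generation` will not reconcile after a metadata-only change such as adding a finalizer. The sketch below is a simplified model with hypothetical types (`ObjectMeta`, `generationChanged`); whether the managed reconciler is wired up with such a filter is an assumption here, not something this thread confirms.

```go
package main

import "fmt"

// ObjectMeta is a hypothetical, trimmed-down stand-in for Kubernetes
// object metadata.
type ObjectMeta struct {
	Generation int64
	Finalizers []string
}

// generationChanged mimics the idea behind a generation-based event
// filter: only updates that change the generation trigger a reconcile.
func generationChanged(oldMeta, newMeta ObjectMeta) bool {
	return oldMeta.Generation != newMeta.Generation
}

func main() {
	before := ObjectMeta{Generation: 1}

	// Adding a finalizer only touches metadata, so the generation is
	// assumed to stay the same.
	after := ObjectMeta{
		Generation: 1,
		Finalizers: []string{"finalizer.example.org"},
	}

	fmt.Println("reconcile triggered:", generationChanged(before, after))
}
```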
@turkenh It's not quite easy to modify the status sub-object since there are no helper methods for that and it feels awkward having to use The other option I can see is to move the AddFinalizer block before the Observe call. I'm not sure why it's so low in the reconcile loop, I feel like it should basically be the first thing being done. Unless we are trying to ensure we can handle the object before setting it? |
Yeah, I agree with
I believe it is good practice to set the finalizer just before you are about to make a change that you would need to clean up during resource deletion, e.g. creating or updating an external resource. As a practical example, there is no reason to have a finalizer if the initial observe fails since there is nothing to clean up yet. |
What happened?
We create managed resources using only the `Observe` management policy and the provider will initially set the `Synced` condition to `True` but will not set the `Ready` condition at all.
We can see in the provider logs that the object seems to be successfully reconciled and is requeued according to the configured `sync interval`.
Once the sync interval is hit, the resource becomes ready.
How can we reproduce it?
Create a managed resource with the `Observe` management policy. We can easily reproduce this with `provider-upjet-azure` in any resource, for instance `VirtualNetwork` with the following spec:
The resource becomes `Synced` almost immediately, but the `Ready` condition will stay `nil`. Also, the whole `atProvider` struct stays `nil`. Provider logs will show the following:
Once the sync interval is hit, the `Ready` condition will be set to `true`.
What environment did it happen in?
Crossplane version: 1.18.0
Additional information
After step-by-step debugging, we have found that the bug seems to come from the `AddFinalizer` function in the `APIFinalizer` struct. This struct is used by `provider-upjet-azure` and probably all `upjet` based providers (although I didn't validate).
The `AddFinalizer` function calls the kube client's `Update` function which will apply an object in the kube API, but also return the updated object returned by the API to the caller. This, in turn, nullifies any struct that is not applied by the `Update` function such as the `Status` sub-struct.
Here are the relevant parts of the code:
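The original snippets are not reproduced in this copy of the issue. As a stand-in, here is a simplified model of the behaviour described above. It is not the crossplane-runtime or controller-runtime code: `fakeAPI`, `Resource`, `Status`, and `addFinalizer` are hypothetical, and the fake `Update` only mimics the described effect, where the caller's object is overwritten with the server's copy and that copy has no status yet.

```go
package main

import "fmt"

// Status is a hypothetical stand-in for a managed resource's status.
type Status struct {
	Ready      bool
	AtProvider map[string]string
}

// Resource is a hypothetical, heavily simplified managed resource.
type Resource struct {
	Name       string
	Finalizers []string
	Status     Status
}

// fakeAPI models an API server for a type with a status subresource:
// Update persists metadata changes, ignores the caller's status, and
// overwrites the caller's object with the stored copy.
type fakeAPI struct {
	stored Resource
}

func (a *fakeAPI) Update(obj *Resource) {
	a.stored.Name = obj.Name
	a.stored.Finalizers = append([]string(nil), obj.Finalizers...)
	*obj = a.stored // in-memory status is replaced by the stored (empty) one
}

// addFinalizer mirrors the shape of the flow described in the issue:
// mutate metadata in memory, then call Update.
func addFinalizer(api *fakeAPI, obj *Resource, finalizer string) {
	obj.Finalizers = append(obj.Finalizers, finalizer)
	api.Update(obj)
}

func main() {
	obj := &Resource{Name: "vnet"}
	api := &fakeAPI{stored: *obj}

	// Observe has filled in the status in memory; it is not persisted yet.
	obj.Status = Status{Ready: true, AtProvider: map[string]string{"id": "vnet-1"}}

	addFinalizer(api, obj, "finalizer.example.org")

	// The finalizer is there, but the observed status is gone.
	fmt.Println(obj.Finalizers, obj.Status.Ready, obj.Status.AtProvider == nil)
}
```

In this model, any later status write in the same reconcile would persist the emptied status, which matches the symptom of `Ready` and `atProvider` staying unset until the next sync.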