-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: handle resource snapshot missing but work already synced and add cro/ro annotation to all the works #936
fix: handle resource snapshot missing but work already synced and add cro/ro annotation to all the works #936
Conversation
78db570
to
251b3a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a few comments. LGTM ;)
@@ -485,6 +503,32 @@ func (r *Reconciler) syncAllWork(ctx context.Context, resourceBinding *fleetv1be | |||
return true, updateAny.Load(), nil | |||
} | |||
|
|||
// areAllWorkSynced checks if all the works are synced with the resource binding. | |||
func areAllWorkSynced(existingWorks map[string]*fleetv1beta1.Work, resourceBinding *fleetv1beta1.ClusterResourceBinding, _, _ string) bool { | |||
syncedCondition := resourceBinding.GetCondition(string(fleetv1beta1.ResourceBindingWorkSynchronized)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ryan! For clusters that have been in the faulted state this condition might have been removed; of course we have instructed the impacted user to fix things manually and they are the only affected party we knew, so it might be alright
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ;)
c7d247c
to
a8225a9
Compare
a8225a9
to
f079f32
Compare
Eventually(crpStatusActual, eventuallyDuration, eventuallyInterval).Should(Succeed(), "Failed to update CRP status as expected") | ||
}) | ||
|
||
It("update work to trigger a work generator reconcile", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ryan! Is this step necessary? It's clearing the manifestConditions on the work objects, which will set the failedPlacements part in the resource binding status to nil, but this does not seem to be related to the test cause. Sorry if I missed anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And normally the work object should have already been marked as unavailable after image change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the work doesn't change, the already available binding won't get reconcile then its status will stay as "ready" then it won't hit this bug.
if err != nil { | ||
return err | ||
} | ||
testDeployment.Spec.Template.Spec.Containers[0].Image = "1.26.2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, Ryan, I think this would be still be an invalid image name? But it does align with the situation Infosys encountered (which is, even if the old snapshot has been removed and all clusters are failing, updates should still be processed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this image is valid
binding.Spec.ResourceSnapshotName = "next" | ||
Expect(k8sClient.Update(ctx, binding)).Should(Succeed()) | ||
updateRolloutStartedGeneration(&binding) | ||
// check the binding status that it should be marked as override succeed but not synced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Ryan! Just a nit: here it means rollout started but not overridden, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ;)
Description of your changes
Fixes #
I have:
make reviewable
to ensure this PR is ready for review.How has this code been tested
Special notes for your reviewer