BundleDeployment status error message improvement #2679

Merged
merged 11 commits into from
Sep 24, 2024

Conversation

Contributor

@p-se p-se commented Jul 25, 2024

When a resource of a BundleDeployment is removed after the Bundle has been created, the BundleDeployment reports it as missing. The same happens if the resource is not removed but changes its owner. In that case, a message saying the resource is missing is misleading. These changes adapt the message to say "<resource> is not owned by us" instead of "<resource> is missing".

Part of #2134
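
To illustrate the intended behavior, here is a minimal sketch of how the two flags could select the message. It is not the code from this PR: `modifiedStatus`, its `Kind` field, the `message` function, and the "is modified" fallback are hypothetical names and wording chosen for the example. Only `Name`, `Create`, `Exist`, and the two quoted messages come from this PR.

```go
package main

import "fmt"

// modifiedStatus mirrors the fields visible in this PR's diff.
// Kind is a hypothetical extra field, added only for a readable message.
type modifiedStatus struct {
	Kind   string
	Name   string
	Create bool // the dry-run plan would create the resource
	Exist  bool // the resource was nevertheless found in the cluster
}

// message picks the status text from the two flags.
// The "is modified" fallback is an assumption for this sketch.
func message(s modifiedStatus) string {
	res := s.Kind + " " + s.Name
	switch {
	case s.Create && s.Exist:
		// Present in the cluster, but the plan still wants to create it:
		// someone else owns it.
		return res + " is not owned by us"
	case s.Create:
		return res + " is missing"
	default:
		return res + " is modified"
	}
}

func main() {
	taken := modifiedStatus{Kind: "ConfigMap", Name: "cm1", Create: true, Exist: true}
	gone := modifiedStatus{Kind: "ConfigMap", Name: "cm1", Create: true}
	fmt.Println(message(taken))
	fmt.Println(message(gone))
}
```

Keeping `Exist` as a separate flag, rather than overloading `Create`, leaves the serialized `missing` field unchanged for existing consumers.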

@p-se p-se requested a review from a team as a code owner July 25, 2024 08:02
@p-se p-se force-pushed the 2134-missing-when-take-ownership branch from 6d386ed to 51d5ebf Compare July 25, 2024 13:35
@@ -407,6 +407,7 @@ type ModifiedStatus struct {
// +nullable
Name string `json:"name,omitempty"`
Create bool `json:"missing,omitempty"`
Exist bool `json:"exist,omitempty"`
Member

@manno manno Jul 26, 2024

Suggested change
Exist bool `json:"exist,omitempty"`
// Exist is true if the modified resource exists in the cluster. This can happen, when ...
Exist bool `json:"exist,omitempty"`

So this happens because we're missing annotations and the resource ends up in plan.Create instead of plan.Update?

This is probably the last usage of wrangler apply in the agent. We pass the helm resources.Objects into wrangler's DryRun and get back a plan with several lists (Create, Update, Delete, Objects).

plan, err := m.applied.DryRun(ns, applied.GetSetID(bd.Name, m.labelPrefix, m.labelSuffix), resources.Objects...)

Our Diff only takes resources with updates to existing fields into account; it only looks at plan.Update.

https://github.com/rancher/fleet/blob/main/internal/cmd/agent/deployer/monitor/updatestatus.go#L161-L170

If we wrote our own applied.DryRun, could it return lists that fit our use case better, so that we don't have to do an additional Get for every resource in plan.Create? I'm asking because we do want to remove wrangler where possible.
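
As a rough sketch of what such a replacement could return, the example below classifies desired objects into three lists in one pass. Everything in it is hypothetical: `getter`, `plan`, `dryRun`, and `fakeCluster` are made-up names, objects are reduced to string keys, and none of it is wrangler's or Fleet's actual API. It still performs one existence lookup per object that is not tracked as ours, but that lookup lives inside the dry-run instead of being a follow-up step over plan.Create.

```go
package main

import "fmt"

// getter is a hypothetical stand-in for a cluster client. It reports
// whether an object with the given key currently exists.
type getter interface {
	Exists(key string) (bool, error)
}

// plan is a hypothetical result type, not wrangler's plan.
type plan struct {
	Missing  []string // desired, absent from the cluster
	NotOwned []string // desired, present, but not tracked as ours
	Update   []string // desired and tracked as ours
}

// dryRun classifies the desired keys. owned holds the keys already
// tracked for this bundledeployment (an assumption of this sketch).
func dryRun(c getter, owned map[string]bool, desired []string) (plan, error) {
	var p plan
	for _, key := range desired {
		if owned[key] {
			p.Update = append(p.Update, key)
			continue
		}
		ok, err := c.Exists(key)
		if err != nil {
			return plan{}, fmt.Errorf("checking %s: %w", key, err)
		}
		if ok {
			p.NotOwned = append(p.NotOwned, key)
		} else {
			p.Missing = append(p.Missing, key)
		}
	}
	return p, nil
}

// fakeCluster is an in-memory getter used only for the example.
type fakeCluster map[string]bool

func (f fakeCluster) Exists(key string) (bool, error) { return f[key], nil }

func main() {
	cluster := fakeCluster{"ns/cm-a": true, "ns/cm-b": true}
	owned := map[string]bool{"ns/cm-a": true}
	p, err := dryRun(cluster, owned, []string{"ns/cm-a", "ns/cm-b", "ns/cm-c"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```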

Contributor Author

So this happens because we're missing annotations and the resource ends up in plan.Create instead of plan.Update?

It might not be accurate to say that we're missing annotations. So far I have only been able to reproduce the issue when two bundles fight for ownership. When one has it and the other takes it, the first says "missing" instead of "isn't owned by me". This PR adapts that message.

If we wrote our own applied.DryRun, could it return lists that fit our use case better, so that we don't have to do an additional Get for every resource in plan.Create? I'm asking because we do want to remove wrangler where possible.

Yes, I think we can do better when we write our own function to diff. However, I do not claim to have fully understood the diffing process.

Also, I was only able to reproduce the issue with two competing bundles, not with resources installed without a bundle and a bundle failing to properly adopt them. The newly created SURE might help to investigate that further.

Member

@manno manno left a comment

This adds 10 get requests per deployment if I'm right. Is that a performance problem?

Contributor Author

p-se commented Aug 29, 2024

This adds 10 get requests per deployment if I'm right. Is that a performance problem?

This adds up to 10 requests per bundledeployment: one request for every resource that the dry-run reported as missing (that is, one it would create). There is no default requeue period set in the reconciler, so those requests are only made once per attempted deployment to the downstream cluster (or when the bundledeployment changes). It does not look like a performance problem to me.
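
A minimal sketch of that bound, assuming the limit is a fixed cap on how many would-be-created resources are checked. The names `maxChecks`, `countingCluster`, and `existing` are hypothetical and only serve to count the lookups; the real code may enforce the limit differently.

```go
package main

import "fmt"

// maxChecks is the upper bound mentioned in this thread. Where the real
// limit comes from is not shown here, so treat it as an assumption.
const maxChecks = 10

// countingCluster is a hypothetical in-memory cluster that counts lookups.
type countingCluster struct {
	present map[string]bool
	gets    int
}

// Exists simulates one Get request against the cluster.
func (c *countingCluster) Exists(key string) bool {
	c.gets++
	return c.present[key]
}

// existing returns the would-be-created keys that already exist,
// issuing at most maxChecks lookups.
func existing(c *countingCluster, toCreate []string) []string {
	var found []string
	for i, key := range toCreate {
		if i >= maxChecks {
			break
		}
		if c.Exists(key) {
			found = append(found, key)
		}
	}
	return found
}

func main() {
	keys := make([]string, 0, 15)
	for i := 0; i < 15; i++ {
		keys = append(keys, fmt.Sprintf("ns/cm-%d", i))
	}
	c := &countingCluster{present: map[string]bool{"ns/cm-3": true, "ns/cm-12": true}}
	found := existing(c, keys)
	fmt.Println("found:", found, "lookups:", c.gets)
}
```

With 15 candidates, only the first 10 are looked up, so a resource beyond the cap that exists in the cluster would still be reported as missing under this assumption.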

@p-se p-se force-pushed the 2134-missing-when-take-ownership branch from 9fd2fbd to 25f73d0 Compare August 29, 2024 14:41
@p-se p-se force-pushed the 2134-missing-when-take-ownership branch from 25f73d0 to 53561b9 Compare September 12, 2024 06:55
@p-se p-se changed the title from "BundleDeployment error message improvement" to "BundleDeployment status error message improvement" Sep 12, 2024
@p-se p-se force-pushed the 2134-missing-when-take-ownership branch from 53561b9 to 7e4e9ed Compare September 12, 2024 07:24
@p-se p-se requested a review from manno September 12, 2024 08:55
@p-se p-se added the kind/bug label Sep 12, 2024
@p-se p-se requested a review from a team September 12, 2024 08:55
@p-se p-se force-pushed the 2134-missing-when-take-ownership branch from 7e4e9ed to 50b20be Compare September 13, 2024 13:05
weyfonk previously approved these changes Sep 17, 2024
Contributor

@weyfonk weyfonk left a comment

LGTM

When a resource of a BundleDeployment is removed after the Bundle has
been created, the BundleDeployment says it is missing. This is also the
case if the resource is not removed but changes its owner. In that case,
the message of a resource that is missing might be misleading. These
changes adapt the message to say "<resource> is not owned by us"
instead of saying that the "<resource> is missing".

Part of rancher#2134
as that is ineffective anyway. It returns true but does not remove them.
@p-se p-se enabled auto-merge (squash) September 24, 2024 08:46
@p-se p-se merged commit 54f46c8 into rancher:main Sep 24, 2024
12 checks passed
3 participants