Some resources remain even after downstream cluster was deleted #1674

Closed
kkaempf opened this issue Jul 25, 2023 · 4 comments

@kkaempf
Collaborator

kkaempf commented Jul 25, 2023

SURE-6578

Issue description:

Some resources remain even after the downstream cluster was deleted.
Is this a bug?
As a workaround, can the customer remove these resources manually?

Business impact:

100,000 resources remain and the etcd servers are under pressure.

Troubleshooting steps:
none.

Repro steps:

I could reproduce it in a local AWS environment.

 
[Rancher server]
OS: Ubuntu 22.04
Docker: 20.10.23
RKE: 1.4.6
Kubernetes: rancher/hyperkube:v1.26.4-rancher2
Rancher: 2.7.5
Fleet: 0.7.0

[Downstream cluster]
OS: Ubuntu 22.04
Docker: 20.10.23
Kubernetes: v1.26.4-rancher2

[Create custom Downstream cluster]
Create -> RKE1 -> Custom

[Delete the custom Downstream cluster]
From UI.

[Trailing resources remain]
 

$ kubectl -n fleet-default get clusterregistration.fleet.cattle.io
NAME            CLUSTER-NAME   LABELS
request-rj6t5   c-c88j8        {"management.cattle.io/cluster-display-name":"ds-rke1","management.cattle.io/cluster-name":"c-c88j8","objectset.rio.cattle.io/hash":"7e3568c9948eb7abcae51039a25f718ab2b8e53f","provider.cattle.io":"rke"} 

$ kubectl -n fleet-default get rolebinding.rbac.authorization.k8s.io
NAME            ROLE                 AGE
request-rj6t5   Role/request-rj6t5   5m8s 

$ kubectl -n fleet-default get role.rbac.authorization.k8s.io
NAME            CREATED AT
request-rj6t5   2023-07-03T01:44:37Z
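
The three listings above share the name request-rj6t5, and the registration carries a management.cattle.io/cluster-name label that points at the deleted cluster. Below is a minimal Python sketch of how such leftovers could be spotted. It assumes the registrations and the names of the still-existing clusters were already fetched (for example with kubectl ... -o json) and reduced to plain dicts. The function name, the dict shape and the cluster name c-other are hypothetical and not part of Fleet.

```python
CLUSTER_LABEL = "management.cattle.io/cluster-name"  # label seen in the listing above


def find_orphaned_registrations(registrations, existing_clusters):
    """Return names of registrations whose labelled cluster no longer exists.

    registrations: list of {"name": str, "labels": dict}
                   (hypothetical, simplified shape, not the full Kubernetes object)
    existing_clusters: iterable of cluster names that still exist
    """
    existing = set(existing_clusters)
    orphaned = []
    for reg in registrations:
        cluster = reg.get("labels", {}).get(CLUSTER_LABEL)
        # Registrations without the label are skipped, because we cannot
        # tell which cluster they belong to.
        if cluster is not None and cluster not in existing:
            orphaned.append(reg["name"])
    return orphaned


registrations = [
    {"name": "request-rj6t5", "labels": {CLUSTER_LABEL: "c-c88j8"}},
]
# c-c88j8 was deleted, so it is absent from the list of existing clusters.
print(find_orphaned_registrations(registrations, existing_clusters=["c-other"]))
# prints ['request-rj6t5']
```

In the listings above the Role and RoleBinding carry the same name as the registration, so the returned names could also be used to look for those objects.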

 

Workaround:

Is a workaround available and implemented? No

Actual behavior:

Some resources remain after cluster deletion.

Expected behavior:

No resource remains after cluster deletion.

@manno
Member

manno commented Aug 21, 2023

The fleet controller would clean them up when it sees the cluster resource being deleted. However, if the fleet controller misses the event, the resources will remain.

This should be fixed by the new hook implemented in #1690.
Every time Fleet is upgraded, the hook will remove orphaned cluster registrations. This can also be done manually with the Fleet CLI by running fleet cleanup --min 1ms --max 1ms

@kkaempf
Collaborator Author

kkaempf commented Aug 21, 2023

This should be QA validated

@manno
Member

manno commented Aug 24, 2023

Extensive testing has been done in #1690. However, it looks like we didn't explicitly test the cleanup for old clusters, because Rancher recreated them.

Testing

We know the hook script runs, so we just need to verify the cleanup part.

  • Add a cluster to Rancher
  • Stop the fleet-controller, e.g. by setting the replica count of the deployment to 0
  • Remove the cluster and check that the provisioning, management and fleet cluster resources are gone
  • Run the clean up command via the fleet CLI
  • Observe that the clusterregistration and service account are removed
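
The command-line parts of the steps above can be sketched as a small Python helper that only builds the commands, without running them. The cluster removal itself happens in between, e.g. from the UI. This is an illustration under assumptions: the deployment name fleet-controller, the namespace cattle-fleet-system and the service account namespace are guesses that may differ per installation, and the helper name is hypothetical. The fleet cleanup flags and the fleet-default namespace are taken from this thread.

```python
import shlex


def build_test_commands(kubeconfig,
                        controller_namespace="cattle-fleet-system",  # assumption
                        registration_namespace="fleet-default"):     # from this thread
    """Return the CLI commands for the test, grouped by when to run them."""
    kubectl = ["kubectl", "--kubeconfig", kubeconfig]
    return {
        # Run before removing the cluster: stop the controller so that it
        # misses the deletion event.
        "before_removal": [
            kubectl + ["-n", controller_namespace, "scale", "deployment",
                       "fleet-controller", "--replicas=0"],  # name is an assumption
        ],
        # Run after removing the cluster: clean up, then check what is left.
        "after_removal": [
            ["fleet", "-k", kubeconfig, "cleanup", "--min", "1ms", "--max", "1ms"],
            kubectl + ["-n", registration_namespace, "get",
                       "clusterregistration.fleet.cattle.io"],
            # Assumption: the service accounts live in the same namespace.
            kubectl + ["-n", registration_namespace, "get", "serviceaccount"],
        ],
    }


commands = build_test_commands("/path/to/kubeconfig")
for phase, argvs in commands.items():
    print(f"# {phase}")
    for argv in argvs:
        print(shlex.join(argv))
```

Remember to scale the controller deployment back up once the check is done.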

@sbulage
Contributor

sbulage commented Oct 6, 2023

I tried the above steps on 3-4 imported clusters and observed that a few cluster registrations and service accounts were not cleaned up.

After running command as follows:

satya@opensuse15:~> sudo ./fleet-linux-amd64 -k ~/.kube/config cleanup --min 1ms --max 1ms
Cleaning up outdated cluster registrations: cleanup.Options{Min:1000000, Max:1000000, Factor:1.05}
INFO[0000] Found 1 clusters and 4 cluster registrations 
INFO[0000] Deleting outdated, granted cluster registration fleet-default/request-cpzln, wait for 1ms 
INFO[0000] Deleting granted cluster registration without cluster fleet-default/request-cpzln 
INFO[0000] Deleting granted cluster registration without cluster fleet-default/request-fxxbq 

I don't see any other remaining resources that belong to the deleted cluster(s).
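
A note on reading the Options line in the log above: Min and Max are printed as plain integers, and 1000000 matches 1ms expressed in nanoseconds. The Python sketch below shows that conversion. It also shows one possible way a delay could grow by Factor up to Max. How Fleet actually applies Factor is not shown in this thread, so next_delay is purely an assumption for illustration, and both function names are hypothetical.

```python
UNITS_NS = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_nanoseconds(text):
    """Convert a simple duration such as '1ms' or '2s' to nanoseconds."""
    # Try longer suffixes first so that '1ms' is not mistaken for seconds.
    for unit in sorted(UNITS_NS, key=len, reverse=True):
        if text.endswith(unit):
            return int(float(text[: -len(unit)]) * UNITS_NS[unit])
    raise ValueError(f"unsupported duration: {text!r}")


def next_delay(current_ns, max_ns, factor):
    """Assumed behaviour: grow the delay by factor, but never beyond max_ns."""
    return min(int(current_ns * factor), max_ns)


min_ns = to_nanoseconds("1ms")
max_ns = to_nanoseconds("1ms")
print(min_ns, max_ns)                    # prints 1000000 1000000
print(next_delay(min_ns, max_ns, 1.05))  # prints 1000000
```

Under that assumption, setting --min and --max to the same value keeps the wait constant, which is consistent with the "wait for 1ms" message in the log.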
