Some elements are not deleted after gitops prune deletes cluster app #3603

Closed
carillan81 opened this issue Jul 29, 2024 · 6 comments
Labels
kind/bug team/honeybadger Team Honey Badger team/phoenix Team Phoenix


@carillan81

To reproduce the issue:

  • Using GitOps, create a kustomization that adds an org and a workload cluster (WC) and deploys several apps.
  • Everything is properly deployed and working.
  • Set prune: true and delete the kustomization. This should delete the org, the apps and the WC.
  • Some of the elements of the cluster are not deleted:
k tree cluster.cluster.x-k8s.io/presales-demo-gitops  -n org-presales-demo 
NAMESPACE          NAME                                                                READY  REASON          AGE
org-presales-demo  Cluster/presales-demo-gitops                                        True                   35m
org-presales-demo  ├─AWSCluster/presales-demo-gitops                                   False  DeletingFailed  35m
org-presales-demo  ├─KubeadmControlPlane/presales-demo-gitops                          False  Deleting        35m
org-presales-demo  │ ├─Machine/presales-demo-gitops-2n5tq                              False  Deleted         28m
org-presales-demo  │ │ └─AWSMachine/presales-demo-gitops-control-plane-93113e29-hxhdq  False  Deleted         28m
org-presales-demo  │ ├─Machine/presales-demo-gitops-kp6qp                              False  Deleted         32m
org-presales-demo  │ │ └─AWSMachine/presales-demo-gitops-control-plane-93113e29-mdgfp  False  Deleted         32m
org-presales-demo  │ └─Machine/presales-demo-gitops-sr2xq                              False  Deleted         26m
org-presales-demo  │   └─AWSMachine/presales-demo-gitops-control-plane-93113e29-7q9jt  False  Deleted         26m
org-presales-demo  └─MachinePool/presales-demo-gitops-nodepool0                        True                   35m

It looks like deleting the kustomization triggers deletion in an order that causes some of the elements to fail to delete.
From the kubeadm-control-plane-controller-manager log:

I0729 09:29:33.606915       1 controller.go:513] "Reconcile KubeadmControlPlane deletion" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" KubeadmControlPlane="org-presales-demo/presales-demo-gitops" namespace="org-presales-demo" name="presales-demo-gitops" reconcileID="baa03b9d-ae79-4c22-ae38-afe20c787159" Cluster="org-presales-demo/presales-demo-gitops"
I0729 09:29:33.606986       1 controller.go:524] "failed to reconcile conditions" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" KubeadmControlPlane="org-presales-demo/presales-demo-gitops" namespace="org-presales-demo" name="presales-demo-gitops" reconcileID="baa03b9d-ae79-4c22-ae38-afe20c787159" Cluster="org-presales-demo/presales-demo-gitops" error="cannot get remote client to workload cluster: org-presales-demo/presales-demo-gitops: failed to create cluster accessor: error fetching REST client config for remote cluster \"org-presales-demo/presales-demo-gitops\": failed to retrieve kubeconfig secret for Cluster org-presales-demo/presales-demo-gitops: Secret \"presales-demo-gitops-kubeconfig\" not found" 

The example is being deployed on golem from:
https://github.com/giantswarm/presales-demo-gitops/tree/main

@carillan81 carillan81 converted this from a draft issue Jul 29, 2024
@architectbot architectbot added the team/phoenix Team Phoenix label Jul 29, 2024
@fiunchinho
Member

It looks like the old bug we had before adding the keep helm annotation to the CRs, where the KubeadmControlPlane is deleted too early, before the CAPA deletion logic kicks in. As a result, the kubeconfig secret is deleted, which blocks the rest of the CAPA cleanup.

@carillan81
Author

Command used to deploy the kustomization:
k apply -f management-clusters/golem/golem.yaml
The SOPS secrets need to be created first.

@fiunchinho fiunchinho removed their assignment Jul 31, 2024
@carillan81
Author

After some tests these are the findings:

  • If you delete the cluster from the repo without removing the main kustomization, the cluster is properly deleted. This has been tested with prune: true (automatic deletion) and with prune: false followed by manually deleting the cluster app.
  • If you delete the main kustomization with prune: false and then manually delete the cluster app, the cluster is properly deleted.
  • If you delete the main kustomization with prune: true, Flux does an aggressive deletion (probably deleting other components, including the organization, at the same time as the cluster). In this case the deletion fails, and some elements get stuck and have to be removed by editing their finalizers.
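
The second finding can be written down as an explicit command sequence. The sketch below only builds the kubectl invocations and does not run them. It rests on several assumptions that are not confirmed in this thread: that the main kustomization is a Flux Kustomization that can be patched in place, that the cluster app is an App CR named after the cluster in the org namespace, and that the kustomization name and namespace used in the usage example ("golem", "default") are placeholders. The function name is made up for illustration.

```python
import json
import shlex


def cluster_first_deletion(kustomization, kustomization_ns, cluster, org_ns, timeout="30m"):
    """Return kubectl argv lists for a 'cluster first' deletion order.

    Nothing is executed here; the caller decides whether to run the commands.
    """
    # Merge patch that switches pruning off before the kustomization is removed.
    no_prune = json.dumps({"spec": {"prune": False}})
    return [
        # 1. Disable pruning so that deleting the kustomization leaves its objects alone.
        ["kubectl", "patch", "kustomizations.kustomize.toolkit.fluxcd.io", kustomization,
         "-n", kustomization_ns, "--type", "merge", "-p", no_prune],
        # 2. Delete the main kustomization itself.
        ["kubectl", "delete", "kustomizations.kustomize.toolkit.fluxcd.io", kustomization,
         "-n", kustomization_ns],
        # 3. Delete only the cluster app (assumed: an App CR named like the cluster).
        ["kubectl", "delete", "apps.application.giantswarm.io", cluster, "-n", org_ns],
        # 4. Block until the Cluster object is gone before touching the org.
        ["kubectl", "wait", "--for=delete", f"cluster.cluster.x-k8s.io/{cluster}",
         "-n", org_ns, f"--timeout={timeout}"],
    ]


if __name__ == "__main__":
    # "golem" and "default" are placeholder values, not taken from the repo.
    for argv in cluster_first_deletion("golem", "default",
                                       "presales-demo-gitops", "org-presales-demo"):
        print(shlex.join(argv))
```

The organization and the remaining apps would be removed only after the last command returns. Any child kustomizations that still reconcile from the repository are not handled by this sketch.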

@T-Kukawka
Contributor

👋 This seems like an issue for @giantswarm/team-honeybadger: deleting the kustomization that contains the organization seems to break deletion of the WCs within that organization, because the kubeconfig secret is deleted out of order.

@LutzLange

I'm hitting this issue as well.

I do need a way to reliably delete clusters. @giantswarm/team-honeybadger

@weatherhog weatherhog added the team/honeybadger Team Honey Badger label Sep 12, 2024
@ljakimczuk

The question of a reliable way of deleting a cluster has already been sort of answered by you, I think. Basically, to delete a cluster, the deletion operation should target the cluster itself, so please follow either the 1st or the 2nd scenario listed by @carillan81 here. When you do a bulk resource deletion that includes the namespace these resources reside in, you are not experiencing any special behaviour of Flux, such as a more aggressive cleanup, but rather a standard Kubernetes routine. You tell Kubernetes to delete a namespace, so it deletes the resources inside it, including the kubeconfig Secret the CAPI controllers rely upon for their operations. You would get exactly the same result with any tool, including kubectl, when performing a bulk deletion. When you do not want Kubernetes to remove something immediately, you do it with finalizers, hence if there is any problem at all here, it is a missing finalizer on the kubeconfig Secret.
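
The finalizer argument can be illustrated with a toy model. This is a minimal sketch of general Kubernetes finalizer semantics (an object marked for deletion stays around until its finalizer list is empty), not CAPI's actual controller code. Both finalizer names are hypothetical placeholders, and the single "reconcile" step is a deliberate simplification of what the controllers do.

```python
from dataclasses import dataclass, field

KUBECONFIG = "Secret/presales-demo-gitops-kubeconfig"
KCP = "KubeadmControlPlane/presales-demo-gitops"
KCP_FINALIZER = "example.invalid/control-plane-cleanup"   # hypothetical name
SECRET_FINALIZER = "example.invalid/kubeconfig-in-use"    # hypothetical name


@dataclass
class Obj:
    name: str
    finalizers: list = field(default_factory=list)
    deleting: bool = False  # stands in for metadata.deletionTimestamp


class Namespace:
    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.name] = obj

    def delete(self, name):
        # Deleting only marks the object; it is removed once no finalizers remain.
        self.objects[name].deleting = True
        self._collect(name)

    def remove_finalizer(self, name, finalizer):
        obj = self.objects[name]
        if finalizer in obj.finalizers:
            obj.finalizers.remove(finalizer)
        self._collect(name)

    def delete_all(self):
        # Bulk deletion, as happens when the namespace itself is deleted.
        for name in list(self.objects):
            self.delete(name)

    def _collect(self, name):
        obj = self.objects[name]
        if obj.deleting and not obj.finalizers:
            del self.objects[name]


def reconcile_control_plane_deletion(ns):
    """Simplified controller step: cleanup needs the kubeconfig Secret to exist."""
    if KUBECONFIG not in ns.objects:
        return False  # mirrors the "kubeconfig secret ... not found" error in the log
    ns.remove_finalizer(KCP, KCP_FINALIZER)
    ns.remove_finalizer(KUBECONFIG, SECRET_FINALIZER)
    return True


def simulate(protect_secret):
    ns = Namespace()
    ns.add(Obj(KUBECONFIG, [SECRET_FINALIZER] if protect_secret else []))
    ns.add(Obj(KCP, [KCP_FINALIZER]))
    ns.delete_all()
    reconcile_control_plane_deletion(ns)
    return sorted(ns.objects)  # names of objects that are still stuck
```

In this model, `simulate(False)` leaves the KubeadmControlPlane stuck because the Secret disappears immediately, while `simulate(True)` returns an empty list because the Secret outlives the bulk deletion long enough for the cleanup to finish.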

@github-project-automation github-project-automation bot moved this from Inbox 📥 to Done ✅ in Roadmap Sep 17, 2024

7 participants