Workflows of gpu-provisioner
- Execute the command to create a NodeClaim: `kubectl apply -f examples/v1-nodeclaim-gpu.yaml`
- NodeClaim Lifecycle controller: https://github.com/Azure/gpu-provisioner/blob/main/vendor/sigs.k8s.io/karpenter/pkg/controllers/nodeclaim/lifecycle/controller.go
- If the gpu-provisioner component is restarted during node launch, the launch is resumed once gpu-provisioner starts again (sketched below): https://github.com/rambohe-ch/gpu-provisioner/blob/c33136c46447449449f98d00b0ca15e73dacea28/pkg/providers/instance/instance.go#L105
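A minimal sketch of the idempotency that makes this resumption possible, assuming a hypothetical `InstanceAPI` interface and simplified types rather than the actual gpu-provisioner code (the real path is the `Create` flow in `pkg/providers/instance/instance.go` linked above): the launch first checks whether an instance already exists for the NodeClaim and only creates one if it does not, so re-running the launch after a restart picks up where it left off.

```go
package instance

import (
	"context"
	"errors"
	"fmt"
)

// Instance stands in for the cloud resource (e.g. an AKS agent pool / VM)
// backing a NodeClaim.
type Instance struct {
	Name string
}

// ErrNotFound signals that no instance exists yet for the NodeClaim.
var ErrNotFound = errors.New("instance not found")

// InstanceAPI is a hypothetical slice of the cloud-provider client surface,
// not the actual gpu-provisioner interface.
type InstanceAPI interface {
	Get(ctx context.Context, name string) (*Instance, error)
	Create(ctx context.Context, name string) (*Instance, error)
}

// launch is idempotent: it first looks for an instance that a previous,
// interrupted launch may already have created, and only creates a new one if
// nothing is found. Re-running it after a controller restart therefore
// resumes the in-flight launch instead of provisioning a duplicate.
func launch(ctx context.Context, api InstanceAPI, nodeClaimName string) (*Instance, error) {
	existing, err := api.Get(ctx, nodeClaimName)
	if err == nil {
		return existing, nil // resume: the instance already exists
	}
	if !errors.Is(err, ErrNotFound) {
		return nil, fmt.Errorf("checking for existing instance: %w", err)
	}
	return api.Create(ctx, nodeClaimName)
}
```

Because every step is safe to repeat, the controller does not need to persist any launch state across restarts; reconciling the NodeClaim again is enough.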
- Execute the command to delete the specified NodeClaim: `kubectl delete -f examples/v1-nodeclaim-gpu.yaml`
- If there is no related node for the NodeClaim, the NodeClaim finalizer is removed directly instead of blocking until the node is ready (see the sketch after this list).
- If a NodeClaim is deleted during node launch, the cloud provider instance and the node will be leaked. We added a new controller, the instance garbage collection controller, to clean up the leaked resources.
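The following is a simplified sketch of that finalizer decision, using stand-in types rather than the vendored karpenter termination controller; the finalizer name and the `NodeLister` helper are illustrative assumptions.

```go
package termination

import "context"

// NodeClaim is a reduced stand-in for the karpenter NodeClaim object.
type NodeClaim struct {
	Name       string
	ProviderID string
	Finalizers []string
}

// NodeLister is a hypothetical helper that reports whether a cluster Node
// is registered for the given provider ID.
type NodeLister interface {
	NodeExists(ctx context.Context, providerID string) (bool, error)
}

// terminationFinalizer is an illustrative name, not necessarily the one the
// vendored controller uses.
const terminationFinalizer = "karpenter.sh/termination"

// finalize decides how a deleted NodeClaim is released. If no node was ever
// registered for it, the finalizer is dropped immediately; otherwise the
// finalizer stays until node termination has completed, and the caller
// should requeue.
func finalize(ctx context.Context, nodes NodeLister, nc *NodeClaim) (requeue bool, err error) {
	exists, err := nodes.NodeExists(ctx, nc.ProviderID)
	if err != nil {
		return false, err
	}
	if !exists {
		removeFinalizer(nc, terminationFinalizer) // nothing to drain: release right away
		return false, nil
	}
	return true, nil // a node exists: wait for node termination first
}

// removeFinalizer filters the named finalizer out of the NodeClaim in place.
func removeFinalizer(nc *NodeClaim, name string) {
	kept := nc.Finalizers[:0]
	for _, f := range nc.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	nc.Finalizers = kept
}
```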
- Execute the command to delete the specified Node: `kubectl delete node {Node-Name}`
- Delete the specified NodePool on the AKS portal.
- Our expectation was that the NodeClaim would be leaked and that the nodeclaim.garbagecollection controller should garbage collect it; however, in AKS this controller does not take effect.
- When the backend NodePool is removed, the node in AKS is deleted, so the [node termination controller] is triggered, which in turn triggers the [nodeclaim termination controller]. As a result, no NodeClaims are leaked when backend NodePools are removed.
- Combining these three workflows, the resource removal sequence in gpu-provisioner is: CloudProvider Instance --> Node --> NodeClaim (a sketch of this ordering follows below).
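A small illustrative sketch of that ordering, with hypothetical teardown helpers standing in for the controllers that actually perform each step (instance garbage collection, node termination, nodeclaim termination):

```go
package teardown

import "context"

// Teardown bundles hypothetical delete helpers in the order gpu-provisioner
// releases resources; in the real system each step belongs to a different
// controller rather than a single call chain.
type Teardown interface {
	DeleteCloudInstance(ctx context.Context, providerID string) error
	DeleteNode(ctx context.Context, nodeName string) error
	ReleaseNodeClaim(ctx context.Context, nodeClaimName string) error
}

// removeAll walks the sequence CloudProvider Instance --> Node --> NodeClaim.
// A failure at any step leaves the later resources in place, so the NodeClaim
// finalizer is only dropped once the instance and the node are gone.
func removeAll(ctx context.Context, t Teardown, providerID, nodeName, nodeClaimName string) error {
	if err := t.DeleteCloudInstance(ctx, providerID); err != nil {
		return err
	}
	if err := t.DeleteNode(ctx, nodeName); err != nil {
		return err
	}
	return t.ReleaseNodeClaim(ctx, nodeClaimName)
}
```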
- Execute the command to create a NodeClaim: `kubectl apply -f examples/v1-nodeclaim-gpu.yaml`
- Then, within 1 minute, execute the command to delete the NodeClaim: `kubectl delete -f examples/v1-nodeclaim-gpu.yaml`
- Because the NodeClaim is deleted during node launch, the cloud provider instance and the node will be leaked.
- The instance garbage collection controller iterates over all cloud provider instances every 2 minutes and cleans up all leaked instances and nodes (see the sketch after this list).
- Code link: https://github.com/Azure/gpu-provisioner/blob/main/pkg/controllers/instance/garbagecollection/controller.go
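A rough sketch of what one garbage collection pass looks like, assuming hypothetical `CloudProvider` and `Cluster` interfaces (the authoritative logic is in the controller linked above): every instance that no live NodeClaim references is deleted, together with any node it registered.

```go
package garbagecollection

import (
	"context"
	"time"
)

// Instance is a reduced view of a cloud-provider instance created for a NodeClaim.
type Instance struct{ ProviderID string }

// CloudProvider is a hypothetical interface over the instances gpu-provisioner manages.
type CloudProvider interface {
	ListInstances(ctx context.Context) ([]Instance, error)
	DeleteInstance(ctx context.Context, providerID string) error
}

// Cluster is a hypothetical interface over the Kubernetes side: the set of
// provider IDs still referenced by NodeClaims, and node deletion.
type Cluster interface {
	NodeClaimProviderIDs(ctx context.Context) (map[string]bool, error)
	DeleteNodeByProviderID(ctx context.Context, providerID string) error
}

// pollInterval mirrors the 2-minute resync mentioned above.
const pollInterval = 2 * time.Minute

// collect deletes every instance (and the node it registered, if any) that no
// NodeClaim references anymore, which is exactly the state left behind when a
// NodeClaim is deleted mid-launch.
func collect(ctx context.Context, cloud CloudProvider, cluster Cluster) error {
	instances, err := cloud.ListInstances(ctx)
	if err != nil {
		return err
	}
	claimed, err := cluster.NodeClaimProviderIDs(ctx)
	if err != nil {
		return err
	}
	for _, inst := range instances {
		if claimed[inst.ProviderID] {
			continue // still owned by a live NodeClaim
		}
		if err := cloud.DeleteInstance(ctx, inst.ProviderID); err != nil {
			return err
		}
		if err := cluster.DeleteNodeByProviderID(ctx, inst.ProviderID); err != nil {
			return err
		}
	}
	return nil
}
```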
- Delete the specified node on the AKS portal. When the node becomes NotReady, the NodeClaim status also becomes not ready.
- If the node stays not ready for more than 10 minutes, the NodeClaim garbage collection controller deletes the related NodeClaim. In practice, however, it seems that the RP deletes the node in the AKS cluster before the NodeClaim garbage collection controller takes effect.
- Set the Ready condition to false: https://github.com/Azure/gpu-provisioner/blob/6899cbac6138e4c9480bad7ea880104e5d584525/vendor/sigs.k8s.io/karpenter/pkg/controllers/nodeclaim/lifecycle/nodeready.go#L58
- Garbage collect the NodeClaim when its not-ready duration exceeds 10 minutes (see the sketch below): https://github.com/Azure/gpu-provisioner/blob/6899cbac6138e4c9480bad7ea880104e5d584525/vendor/sigs.k8s.io/karpenter/pkg/controllers/nodeclaim/garbagecollection/controller.go#L102
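A minimal sketch of the 10-minute not-ready check, using simplified condition types rather than the vendored karpenter objects; the `Deleter` hook and the condition handling are illustrative assumptions, not the linked controller's actual API.

```go
package garbagecollection

import (
	"context"
	"time"
)

// Condition is a reduced stand-in for a NodeClaim status condition.
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	LastTransitionTime time.Time
}

// NodeClaim carries only what the sweep below needs.
type NodeClaim struct {
	Name       string
	Conditions []Condition
}

// Deleter is a hypothetical hook that issues the NodeClaim delete call.
type Deleter interface {
	DeleteNodeClaim(ctx context.Context, name string) error
}

// notReadyTTL mirrors the 10-minute threshold mentioned above.
const notReadyTTL = 10 * time.Minute

// sweep deletes every NodeClaim whose Ready condition has been False for
// longer than notReadyTTL, e.g. because its node was removed on the AKS side.
func sweep(ctx context.Context, now time.Time, claims []NodeClaim, d Deleter) error {
	for _, nc := range claims {
		for _, c := range nc.Conditions {
			if c.Type != "Ready" || c.Status != "False" {
				continue
			}
			if now.Sub(c.LastTransitionTime) > notReadyTTL {
				if err := d.DeleteNodeClaim(ctx, nc.Name); err != nil {
					return err
				}
			}
		}
	}
	return nil
}
```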