
GKE Subsetting recalculation only appears to occur when nodes are added, not removed #2650

Open
glasser opened this issue Aug 28, 2024 · 6 comments


glasser commented Aug 28, 2024

I use GCE ingresses via hosted GKE (not running my own ingress-gce).

We have a large-ish cluster with GKE subsetting and a lot of internal network pass-through load balancers.

We wanted to move all our pods to a new node pool, so we created the node pool, drained the old nodes gradually until they had no pods left (other than DaemonSets), and then scaled the original node pool down to 0 nodes.

This broke many of our load balancers because it removed all the GCE_VM_IP endpoints from their zonal NEGs.

After some experimentation, it appears that when you remove nodes from GKE, GCE_VM_IP endpoints can be removed from the zonal NEGs associated with internal network pass-through load balancers (with externalTrafficPolicy: Cluster), but the controller won't actively add endpoints for newer nodes to replace them.

Adding just one more node to the cluster after this seems to be sufficient to trigger recalculation and bring those NEGs back up to 25 endpoints. But if you don't do that, you can silently lose endpoints from your NEGs and eventually break them!
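For reference, the per-zone endpoint counts can be checked with something like the following (the NEG name and zone are placeholders for whatever the controller created for your Services):

```shell
# List the GCE_VM_IP NEGs that back the internal pass-through load balancers
gcloud compute network-endpoint-groups list \
    --filter="networkEndpointType=GCE_VM_IP"

# Count the endpoints in one zonal NEG; after scaling the old pool to 0,
# this can end up well below the expected 25 endpoints
gcloud compute network-endpoint-groups list-network-endpoints NEG_NAME \
    --zone=ZONE \
    --format="value(networkEndpoint.instance)" | wc -l
```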

@swetharepakula (Member)

Thanks @glasser for reporting!

This should be fixed with #2622.

We are currently qualifying the new release and hope to have the fix rolled out to GKE in the next couple of weeks.

@swetharepakula (Member)

We have also published release notes with possible mitigations for the time being: https://cloud.google.com/kubernetes-engine/docs/release-notes#August_14_2024

The easiest way to avoid downtime is to switch to externalTrafficPolicy: Local (xTP=Local).
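For a Service that can tolerate the change, that is roughly (the Service name is a placeholder):

```shell
# Switch the Service to externalTrafficPolicy: Local so traffic only goes
# to nodes that actually run serving pods (SERVICE_NAME is a placeholder)
kubectl patch service SERVICE_NAME \
    -p '{"spec":{"externalTrafficPolicy":"Local"}}'
```

Note that Local changes how traffic is distributed, so it's worth confirming the serving pods are spread across enough nodes first.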

@glasser (Author) commented Aug 28, 2024

Thanks, this sounds exactly like our issue. I guess I'm not quite up to date on reading our release notes!

Switching our production cluster over to Local today seems a bit less scary than scaling down bit by bit and triggering some sort of scale-up on another node pool after each step. (Though I don't know a better way to trigger a scale-up than to schedule something that doesn't fit... I don't want to just scale the pool normally because there's currently a fair amount of imbalance across zones.)

@glasser (Author) commented Aug 28, 2024

(Ah OK, we can just scale up a random other node pool.)
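Something like this, for anyone else in the same spot (cluster, pool, and count are placeholders):

```shell
# Resize a different node pool by one node to nudge the NEG controller
# into recalculating the subsets (all names and the count are placeholders)
gcloud container clusters resize CLUSTER_NAME \
    --node-pool OTHER_POOL_NAME \
    --num-nodes NEW_NODE_COUNT \
    --region REGION
```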

Will this be called out in the release notes when it is fixed?

@gauravkghildiyal (Member)

Hi @glasser. Not sure if you already noticed, but we recently sent out the release updates for this fix: https://cloud.google.com/kubernetes-engine/docs/release-notes#September_10_2024

@glasser (Author) commented Sep 26, 2024

@gauravkghildiyal Thanks! And the release notes are clear about what we need to do (control plane upgrade). Trying it in our dev cluster now!
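For reference, if anyone else lands here, the control-plane-only upgrade can be done with something along these lines (cluster name, version, and location are placeholders):

```shell
# Upgrade only the control plane (not the node pools) to a GKE version
# that includes the fix; name, version, and region are placeholders
gcloud container clusters upgrade CLUSTER_NAME \
    --master \
    --cluster-version VERSION \
    --region REGION
```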
