-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[refinery] Add option to run refinery as a statefulset #237
base: main
Are you sure you want to change the base?
Conversation
We've been running a statefulset on spot instances in GKE for a couple of days and it looks like it's working out ok. Main reason this is a draft: should I instead put the statefulset in a separate file and instead avoid repetition via a pod template helper + some mustMerge stuff? |
263406d
to
c429925
Compare
Ya lets have separate |
373880f
to
41e6685
Compare
@TylerHelmuth I finally got around to this. I had to debug some * I love text templating white-space–sensitive structured data 🙈 |
Using a statefulset allows giving the pods a stable network identity. Setting `setHostnameAsFQDN` means that this stable network identity is what `os.Hostname` reports. Together, they allow using the hostname in the peer list, so that the peer list remains stable even as pods are rescheduled. This improves trace routing stability during refinery upgrades and Kubernetes cluster operations (upgrades / scale-downs), and even makes it possible to run refinery with an affinity preference for spot instances.
41e6685
to
4d24e26
Compare
@TylerHelmuth fyi I just rebased this on top of the current main so we could upgrade to refinery 2.x. We've been running refinery as a statefulset for slightly over 3 months at this point with no real issues. |
@kamalmarhubi thats awesome! I'm glad it is working well. There is some other 2.x cleanup we are prioritizing right now but we'll get back to this PR after. |
@TylerHelmuth bumpty bump on this PR. We're upgrading refinery and I'm wondering if I should rebase this once more or not. It kind of sounds like the pub/sub changes in 2.7, specifically sharing peer membership, makes cluster membership changes less disruptive; is that right? I think I'd prefer stable pod identity, but I'd also like to stop maintaining a long-running fork of the official chart—whether by rebasing once more and merging, or dropping the statefulset option. Let me know which works best for you! |
@kamalmarhubi you're right, and 2.8 should expand on those features. There is working ongoing in refinery to make scaling it up and down smoother. Let's wait for that work to complete and then revisit if statefulset is still something we'll need to support. |
@TylerHelmuth sounds good. How does the cluster respond to non-resize membership changes under 2.7? More precisely, how disruptive is it if the membership changes as the deployment's pods are cycled and come up with new names? I recall this was fairly ungood in the past. |
@kamalmarhubi in 2.7 peer membership updates are now pub/sub, so it happens quicker, but the refinery instances still have to rebalance when that happens. |
Using a statefulset allows giving the pods a stable network identity. Setting
setHostnameAsFQDN
means that this stable network identity is whatos.Hostname
reports. Together, they allow using the hostname in the peer list, so that the peer list remains stable even as pods are rescheduled. This improves trace routing stability during refinery upgrades and Kubernetes cluster operations (upgrades / scale-downs), and even makes it possible to run refinery with an affinity preference for spot instances.