[refinery] Add option to run refinery as a statefulset #237

kamalmarhubi · 2023-04-19T17:57:48Z

Using a statefulset allows giving the pods a stable network identity. Setting setHostnameAsFQDN means that this stable network identity is what os.Hostname reports. Together, they allow using the hostname in the peer list, so that the peer list remains stable even as pods are rescheduled. This improves trace routing stability during refinery upgrades and Kubernetes cluster operations (upgrades / scale-downs), and even makes it possible to run refinery with an affinity preference for spot instances.

kamalmarhubi · 2023-04-19T18:01:44Z

We've been running a statefulset on spot instances in GKE for a couple of days and it looks like it's working out ok.

Main reason this is a draft: should I instead put the statefulset in a separate file and instead avoid repetition via a pod template helper + some mustMerge stuff?

TylerHelmuth · 2023-04-19T18:09:34Z

Main reason this is a draft: should I instead put the statefulset in a separate file and instead avoid repetition via a pod template helper + some mustMerge stuff?

Ya lets have separate deployment.yaml and statefulset.yaml and try to reuse what we can with helper template. A little bit of duplication is fine if it means simplicity..

kamalmarhubi · 2023-06-06T21:02:31Z

@TylerHelmuth I finally got around to this. I had to debug some nindent wrongness* in _pod.tpl. I think I got it right, but I only tried our setup, where it didn't result in any diff from the pre-split version (after rebasing onto current HEAD). I don't know if some other sets of values might have annoying nindent stuff wrong, but I hope not.

‌* I love text templating white-space–sensitive structured data 🙈

Using a statefulset allows giving the pods a stable network identity. Setting `setHostnameAsFQDN` means that this stable network identity is what `os.Hostname` reports. Together, they allow using the hostname in the peer list, so that the peer list remains stable even as pods are rescheduled. This improves trace routing stability during refinery upgrades and Kubernetes cluster operations (upgrades / scale-downs), and even makes it possible to run refinery with an affinity preference for spot instances.

kamalmarhubi · 2023-07-28T19:53:30Z

@TylerHelmuth fyi I just rebased this on top of the current main so we could upgrade to refinery 2.x. We've been running refinery as a statefulset for slightly over 3 months at this point with no real issues.

TylerHelmuth · 2023-07-28T19:57:18Z

@kamalmarhubi thats awesome! I'm glad it is working well. There is some other 2.x cleanup we are prioritizing right now but we'll get back to this PR after.

kamalmarhubi · 2024-08-12T17:26:46Z

@TylerHelmuth bumpty bump on this PR. We're upgrading refinery and I'm wondering if I should rebase this once more or not.

It kind of sounds like the pub/sub changes in 2.7, specifically sharing peer membership, makes cluster membership changes less disruptive; is that right? I think I'd prefer stable pod identity, but I'd also like to stop maintaining a long-running fork of the official chart—whether by rebasing once more and merging, or dropping the statefulset option.

Let me know which works best for you!

TylerHelmuth · 2024-08-12T18:34:01Z

@kamalmarhubi you're right, and 2.8 should expand on those features. There is working ongoing in refinery to make scaling it up and down smoother. Let's wait for that work to complete and then revisit if statefulset is still something we'll need to support.

kamalmarhubi · 2024-08-12T19:00:17Z

@TylerHelmuth sounds good. How does the cluster respond to non-resize membership changes under 2.7? More precisely, how disruptive is it if the membership changes as the deployment's pods are cycled and come up with new names? I recall this was fairly ungood in the past.

TylerHelmuth · 2024-08-12T19:17:18Z

@kamalmarhubi in 2.7 peer membership updates are now pub/sub, so it happens quicker, but the refinery instances still have to rebalance when that happens.

kamalmarhubi force-pushed the refinery-sts branch 2 times, most recently from 263406d to c429925 Compare April 19, 2023 18:05

kamalmarhubi force-pushed the refinery-sts branch 3 times, most recently from 373880f to 41e6685 Compare June 6, 2023 20:56

kamalmarhubi marked this pull request as ready for review June 6, 2023 21:02

kamalmarhubi requested a review from a team as a code owner June 6, 2023 21:02

kamalmarhubi force-pushed the refinery-sts branch from 41e6685 to 4d24e26 Compare July 28, 2023 19:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[refinery] Add option to run refinery as a statefulset #237

[refinery] Add option to run refinery as a statefulset #237

kamalmarhubi commented Apr 19, 2023 •

edited

Loading

kamalmarhubi commented Apr 19, 2023

TylerHelmuth commented Apr 19, 2023

kamalmarhubi commented Jun 6, 2023 •

edited

Loading

kamalmarhubi commented Jul 28, 2023

TylerHelmuth commented Jul 28, 2023

kamalmarhubi commented Aug 12, 2024

TylerHelmuth commented Aug 12, 2024

kamalmarhubi commented Aug 12, 2024

TylerHelmuth commented Aug 12, 2024

[refinery] Add option to run refinery as a statefulset #237

Are you sure you want to change the base?

[refinery] Add option to run refinery as a statefulset #237

Conversation

kamalmarhubi commented Apr 19, 2023 • edited Loading

kamalmarhubi commented Apr 19, 2023

TylerHelmuth commented Apr 19, 2023

kamalmarhubi commented Jun 6, 2023 • edited Loading

kamalmarhubi commented Jul 28, 2023

TylerHelmuth commented Jul 28, 2023

kamalmarhubi commented Aug 12, 2024

TylerHelmuth commented Aug 12, 2024

kamalmarhubi commented Aug 12, 2024

TylerHelmuth commented Aug 12, 2024

kamalmarhubi commented Apr 19, 2023 •

edited

Loading

kamalmarhubi commented Jun 6, 2023 •

edited

Loading