Scaling Fermyon to multiple nodes #63

radu-matei · 2022-06-18T19:08:15Z

Currently, the Terraform configuration that deploys Fermyon on AWS only creates one node — we should explore scaling the cluster beyond a single node.

ref #62

FrankYang0529 · 2022-08-22T15:06:40Z

I would like to take on this issue. My first thought is to use systemd to manage Consul, Nomad, and Vault on multiple nodes.

For Consul, use retry_join to connect the nodes to each other.
For Vault, use Consul as the storage backend.
For Nomad, use Consul to automatically join cluster nodes.
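
As a rough illustration of the three settings listed above, the fragments below show where each one would live. This is a sketch, not the configuration the installer ships: the IP addresses, paths, datacenter name, and server count are placeholder assumptions, and each fragment belongs in its own agent configuration file alongside the other settings that agent needs (listeners, TLS, ACLs, and so on).

```hcl
# --- consul.hcl (Consul server agent) ---
# All values are placeholders, not taken from the Fermyon installer.
datacenter       = "dc1"
data_dir         = "/opt/consul"
server           = true
bootstrap_expect = 3

# Keep retrying these peers until a join succeeds.
retry_join = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]

# --- vault.hcl (Vault server) ---
# Store Vault data in Consul's KV store via the local Consul agent.
storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}

# --- nomad.hcl (Nomad agent) ---
# Discover other Nomad servers and clients through the local Consul agent.
consul {
  address          = "127.0.0.1:8500"
  server_auto_join = true
  client_auto_join = true
}
```

On AWS, the static addresses in retry_join could instead be a cloud auto-join string keyed on instance tags, which avoids hard-coding IPs in the Terraform output.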

After deploying the full HashiCorp stack, we may first need to add scale-out capability to Bindle. If we can do all of this, the Fermyon platform can run on multiple nodes.

vdice · 2022-08-22T17:54:58Z

@FrankYang0529 Sounds like a great plan! Agreed, running the HashiCorp services under systemd is the first prerequisite, so that they withstand instance restarts and process terminations. The Consul, Vault and Nomad configuration updates you've mentioned sound right to me.

I'd say scaling Bindle can be an optional follow-up. Bindle doesn't necessarily need to run on every Nomad agent/node -- it can run as a service of count 1 and Nomad will just make sure it is scheduled appropriately. In this case, we could also utilize a host volume to at least make sure bindles are persisted at the host level, pending support for scaling the service out (or other persistence options).
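
A minimal sketch of that arrangement, assuming a host volume named bindle-data, a host path of /opt/bindle, and the exec task driver (all three are illustrative choices, not what the installer currently uses). The first fragment goes in the Nomad client's agent configuration on the node that holds the data; the second is the job file.

```hcl
# --- Nomad client agent config on the node holding the data ---
client {
  enabled = true

  # Expose a directory on this host as a named volume (hypothetical name/path).
  host_volume "bindle-data" {
    path      = "/opt/bindle"
    read_only = false
  }
}

# --- bindle.nomad (job file) ---
job "bindle" {
  datacenters = ["dc1"]
  type        = "service"

  group "bindle" {
    # A single instance; Nomad places it on a node that offers the volume.
    count = 1

    volume "bindle-data" {
      type      = "host"
      source    = "bindle-data"
      read_only = false
    }

    task "bindle" {
      # Assumption: a driver with filesystem isolation, so volume_mount applies.
      driver = "exec"

      volume_mount {
        volume      = "bindle-data"
        destination = "/bindle-data"
      }

      config {
        # Assumption: bindle-server is available to the task under this name.
        command = "bindle-server"
        args    = ["--directory", "/bindle-data"]
      }
    }
  }
}
```

Because only nodes that declare the host volume are eligible for placement, the job stays pinned to that node, so its loss still means losing the stored bindles.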

We'd naturally want to increase the hippo replica count (or convert it to a system job) for HA. Traefik should probably change to a system job so that it runs on each agent node, or be converted to a systemd service alongside Nomad/Consul/Vault, again so that it runs on each agent node/host.
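
For the system-job option, a sketch of the job shape might look like the following. The datacenter name, driver, port, and Traefik arguments are assumptions for illustration, and the provider configuration Traefik needs to discover services is omitted.

```hcl
# --- traefik.nomad (job file, illustrative only) ---
job "traefik" {
  datacenters = ["dc1"]

  # A system job runs one instance on every eligible client node,
  # so no count is set on the group.
  type = "system"

  group "traefik" {
    network {
      # Bind the same static port on every node.
      port "http" {
        static = 80
      }
    }

    task "traefik" {
      # Assumption: the traefik binary is installed on each host.
      driver = "raw_exec"

      config {
        command = "traefik"
        args    = ["--entrypoints.web.address=:80"]
      }
    }
  }
}
```

New nodes that join the cluster pick up the job automatically, which is the main advantage over raising a fixed count on a service job.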

FrankYang0529 · 2022-08-23T01:21:36Z

@vdice Thanks for your suggestion! It looks like a workable plan. For Bindle, I feel that we still need the ability to scale out: if we use a host volume, we can't afford to lose that node. We can do this step by step. Let me work on the HashiCorp stack first. 👍🏻

mreferre mentioned this issue Jun 24, 2022

Containerize the Fermyon compute unit? #70

Open

FrankYang0529 mentioned this issue Aug 31, 2022

feat(aws): support multiple nodes #100

Merged

FrankYang0529 mentioned this issue Oct 26, 2022

feat: push docker image deislabs/bindle#352

Merged
