Scaling Fermyon to multiple nodes #63

Open
radu-matei opened this issue Jun 18, 2022 · 3 comments

Comments

@radu-matei
Member

Currently, the Terraform configuration that deploys Fermyon on AWS creates only one node. We should explore scaling the cluster beyond a single node.

ref #62
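
As a rough illustration of what "more than one node" could look like in Terraform, here is a minimal sketch that parameterizes the instance count. The variable and resource names (`node_count`, `ami_id`, `aws_instance.node`) and the instance type are hypothetical and not taken from the existing configuration.

```hcl
# Hypothetical variable: how many EC2 instances to create.
variable "node_count" {
  description = "Number of nodes in the cluster"
  type        = number
  default     = 3
}

# Hypothetical variable: the AMI to boot each node from.
variable "ami_id" {
  type = string
}

# `count` creates node_count copies of the instance.
# The instance type is an assumption for illustration only.
resource "aws_instance" "node" {
  count         = var.node_count
  ami           = var.ami_id
  instance_type = "t3.small"

  tags = {
    Name = "fermyon-node-${count.index}"
  }
}
```

Creating the instances is only the first step. The nodes would still need to discover each other and join the same Consul and Nomad cluster.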

@FrankYang0529
Contributor

I would like to take this issue. My first thought is to use systemd to manage Consul, Nomad, and Vault on multiple nodes.

After deploying the full HashiCorp stack, we may first need to add scale-out capability to Bindle. Once all of that is done, the Fermyon platform can run on multiple nodes.

@vdice
Member

vdice commented Aug 22, 2022

@FrankYang0529 Sounds like a great plan! Agreed, converting the HashiCorp services to systemd is the first prerequisite, so that they withstand instance restarts and process terminations. The Consul, Vault and Nomad configuration updates you've mentioned sound right to me.
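
To make the systemd idea concrete, here is a minimal sketch of a unit file for the Nomad agent, held in a Terraform local so it could be written to each instance at boot. The local name, binary path, and config directory are assumptions. Consul and Vault would need analogous units.

```hcl
locals {
  # Hypothetical unit file for the Nomad agent.
  # /usr/local/bin/nomad and /etc/nomad.d are assumed locations.
  nomad_unit = <<-EOT
    [Unit]
    Description=Nomad agent
    Wants=network-online.target
    After=network-online.target

    [Service]
    ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d
    # Restart the agent if the process exits with a failure.
    Restart=on-failure
    RestartSec=2

    [Install]
    # Start the agent on every boot once the unit is enabled.
    WantedBy=multi-user.target
  EOT
}
```

`Restart=on-failure` addresses process terminations. Enabling the unit (`systemctl enable nomad`) addresses instance restarts.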

I'd say scaling Bindle can be an optional follow-up. Bindle doesn't necessarily need to run on every Nomad agent/node -- it can run as a service of count 1 and Nomad will just make sure it is scheduled appropriately. In this case, we could also utilize a host volume to at least make sure bindles are persisted at the host level, pending support for scaling the service out (or other persistence options).
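
A minimal sketch of that setup, assuming a host volume named `bindle-data` backed by `/opt/bindle` on the client, and the `exec` task driver (chosen here because it supports `volume_mount`). The volume name, paths, driver, datacenter name, and binary location are all assumptions. Authentication and listen-address flags for `bindle-server` are left out.

```hcl
# --- Nomad client agent configuration (on the node that hosts Bindle) ---
client {
  enabled = true

  # Expose a directory on the host as a named volume.
  host_volume "bindle-data" {
    path      = "/opt/bindle"
    read_only = false
  }
}

# --- Nomad job specification (a separate file) ---
job "bindle" {
  datacenters = ["dc1"]
  type        = "service"

  group "bindle" {
    # A single instance; Nomad places it on a node offering the volume.
    count = 1

    volume "bindle-data" {
      type      = "host"
      source    = "bindle-data"
      read_only = false
    }

    task "bindle" {
      driver = "exec"

      # Mount the host volume inside the task at /data.
      volume_mount {
        volume      = "bindle-data"
        destination = "/data"
      }

      config {
        command = "/usr/local/bin/bindle-server"
        args    = ["--directory", "/data"]
      }
    }
  }
}
```

Because the job requests the host volume, Nomad only schedules it on clients that declare `bindle-data`. The bindles survive task restarts, but not the loss of that host.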

We'd naturally want to increase the Hippo replica count (or convert it to a system job) for HA. Traefik should probably either change to a system job, so that it runs on each agent node, or be converted to a systemd service alongside Nomad/Consul/Vault, again so that it runs on each agent node/host.
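
For the system-job option, a minimal sketch of what the Traefik job could look like. The driver, datacenter name, port, and Traefik arguments are assumptions for illustration, not the project's current job file.

```hcl
job "traefik" {
  datacenters = ["dc1"]

  # A system job runs one instance on every eligible client node,
  # so no `count` is set.
  type = "system"

  group "traefik" {
    network {
      # Bind the same static port on each node.
      port "http" {
        static = 80
      }
    }

    task "traefik" {
      # raw_exec is an assumption and must be enabled on the clients.
      driver = "raw_exec"

      config {
        command = "traefik"
        args = [
          "--entrypoints.web.address=:${NOMAD_PORT_http}",
          "--providers.consulcatalog=true",
        ]
      }
    }
  }
}
```

New nodes that join the cluster would pick up a Traefik instance automatically, which is the main advantage over a service job with a fixed count.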

@FrankYang0529
Contributor

@vdice Thanks for your suggestion! It looks like a workable plan. For Bindle, I feel that we still need scale-out capability: if we use a host volume, we can't afford to lose that node. We can do this step by step. Let me work on the HashiCorp stack first. 👍🏻


3 participants