
RFC: 1-click deploy #68

xphoniex opened this issue Nov 17, 2021 · 24 comments

@xphoniex
Contributor

xphoniex commented Nov 17, 2021

Problem

We want to give interested parties a chance to try out Radicle without having to take care of their own infrastructure. The goal is to introduce a low-friction solution that is also reliable.

Proposal

This is not Radicle's core offering and we'd even encourage competition in this space. Thus our design should be transferable and as plug-and-play as possible.

We'll have an entrance contract that lists all contracts offering their service. Decisions would be made based on the number of subscribers to each contract and the price each one is asking.

Upon deciding to purchase, the user sends money to either topUp(address org) or registerOrgThenTopUp(). We might need a conversion from ether to a stablecoin here, to simplify financing for service providers who have obligations in fiat.

This will eventually emit a NewTopUp event containing the org address and probably more info, like the expiry block. (After talking with Alexis, we decided to keep accounting in block terms on the contracts.)
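To make the flow concrete, here's a minimal sketch of the buyer's side using ethers.js; the ABI fragment, function signatures and amount are illustrative, not a final interface:

```typescript
import { ethers } from "ethers";

// Hypothetical ABI fragment -- the real contract interface is still to be designed.
const hostingAbi = [
  "function topUp(address org) payable",
  "function registerOrgThenTopUp() payable",
  "event NewTopUp(address indexed org, uint256 expiryBlock)",
];

async function buyHosting(rpcUrl: string, contractAddr: string, orgAddr: string) {
  const provider = new ethers.providers.JsonRpcProvider(rpcUrl);
  const signer = new ethers.Wallet(process.env.PRIVATE_KEY!, provider);
  const hosting = new ethers.Contract(contractAddr, hostingAbi, signer);

  // Ether is sent along with the call; the contract would handle the
  // ether-to-stablecoin conversion and extend the org's expiry block.
  const tx = await hosting.topUp(orgAddr, { value: ethers.utils.parseEther("0.1") });
  const receipt = await tx.wait();
  console.log("top-up mined in block", receipt.blockNumber);
}
```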

Inside each k8s cluster, each of which ideally lives on a different cloud, we'll have a controller watching NewTopUp events for its respective contract. On a new event, we create a Deployment and Service for the new org, with the needed containers inside. If they already exist, we simply update the expiry block, without affecting anything else.
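A rough sketch of what such a controller might look like, assuming the illustrative NewTopUp(org, expiryBlock) event above and the @kubernetes/client-node library; the namespace, image and annotation key are placeholders:

```typescript
import { ethers } from "ethers";
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const apps = kc.makeApiClient(k8s.AppsV1Api);

// Create the org's Deployment if it doesn't exist yet; otherwise re-apply it,
// which carries the updated expiry annotation while the rest of the spec stays the same.
async function reconcileOrg(org: string, expiryBlock: string): Promise<void> {
  const name = `org-${org.toLowerCase()}`;
  const deployment: k8s.V1Deployment = {
    metadata: {
      name,
      annotations: { "radicle.xyz/expiry-block": expiryBlock }, // placeholder annotation key
    },
    spec: {
      replicas: 1,
      selector: { matchLabels: { app: name } },
      template: {
        metadata: { labels: { app: name } },
        spec: {
          // Hypothetical image name; in practice these would be the radicle-client-services containers.
          containers: [{ name: "org-node", image: "radicle-services/org-node:latest" }],
        },
      },
    },
  };

  try {
    await apps.createNamespacedDeployment("orgs", deployment);
  } catch (err: any) {
    if (err?.response?.statusCode === 409) {
      await apps.replaceNamespacedDeployment(name, "orgs", deployment);
    } else {
      throw err;
    }
  }
}

// Watch NewTopUp events emitted by the hosting contract this cluster serves.
const provider = new ethers.providers.JsonRpcProvider(process.env.ETH_RPC_URL);
const hosting = new ethers.Contract(
  process.env.CONTRACT_ADDR!,
  ["event NewTopUp(address indexed org, uint256 expiryBlock)"],
  provider
);
hosting.on("NewTopUp", (org: string, expiryBlock: ethers.BigNumber) => {
  reconcileOrg(org, expiryBlock.toString()).catch(console.error);
});
```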

We'll use IaC (infrastructure as code), with Terraform managing the cloud resources for us, so a potential third party can offer an alternative once they clone our infra code and fill in their own cloud keys.

[diagram: 1-click-deploy]

Issues

  1. We are relying on the major clouds (AWS, GCP and Azure), which are in the same jurisdiction. Others lack support in our automation tooling because of poor APIs or insufficient interest from the community/maintainers.

  2. GeoDNS. Our p2p system, as is, can't optimize for latency-based routing. I think this needs to be solved at the protocol level so we can have two machines representing the same org-node, ideally in a write-write capacity, but if not, write-read.

  3. High availability. Same as above.

  4. Durability. Data can get lost. While in the worst-case scenario data can be partially or fully recovered by connecting with users' p2p nodes, having an HA solution would make our system more robust.

@sebastinez
Member

sebastinez commented Nov 17, 2021

Hey @xphoniex, nice work 👍
I'm not the one who should give feedback on devops, but I wanted to ask if you could elaborate on which services you think should be spun up.

From the perspective of an org, I think the most interesting ones would certainly be the ones in the radicle-client-services repo.
EDIT: Sorry, my fault, I did not look at which repo the issue was located in.

I can imagine that there could be an essentials package, and then some optional ones.
@cloudhead Could there eventually be an issue with multiple services reading and eventually writing to the same monorepo? And if the services are distributed across different clusters, I imagine there could be replication issues.

@cloudhead
Contributor

Yeah, it's implied here that it would be radicle-client-services.

The issue with the monorepo state already exists currently, since the client services all read from the same state.
As long as there is a single writer, there should be no problem having multiple readers.

@cloudhead
Contributor

@xphoniex could you describe the topology of the cluster(s)? One small issue I could see is that we have both UDP and TCP services, and I know there is some limitation in k8 with having both on the same instance.

@cloudhead
Contributor

I'm also wondering where do we keep track of the mappings between org and physical instances, ie. IP addresses. Do we use DNS?

For instance, right now, each org points to a DNS name via its ENS records. This DNS name in turn points to a physical address. How do you imagine this could work in the above scenario?

@xphoniex
Contributor Author

xphoniex commented Nov 17, 2021

@xphoniex could you describe the topology of the cluster(s)? One small issue I could see is that we have both UDP and TCP services, and I know there is some limitation in k8 with having both on the same instance.

I think we're going to have to expose unique tcp/udp ports as a NodePort per deployment, which opens these ports on all nodes and limits the number of deployments. Not ideal, but should be okay.

HTTP traffic would be handled by nginx, routed according to subdomain.
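For illustration, a per-org NodePort Service could look roughly like this (again with @kubernetes/client-node; port numbers, names and namespace are placeholders, and HTTP stays behind nginx as described):

```typescript
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const core = kc.makeApiClient(k8s.CoreV1Api);

// One Service per org: a dedicated NodePort pair for the p2p protocol (TCP and UDP),
// opened on every node in the cluster. HTTP is not exposed here -- it goes through
// nginx and is routed by subdomain instead.
async function exposeOrg(name: string, nodePortTcp: number, nodePortUdp: number) {
  await core.createNamespacedService("orgs", {
    metadata: { name },
    spec: {
      type: "NodePort",
      selector: { app: name },
      ports: [
        { name: "p2p-tcp", protocol: "TCP", port: 8776, nodePort: nodePortTcp },
        { name: "p2p-udp", protocol: "UDP", port: 8776, nodePort: nodePortUdp },
      ],
    },
  });
}

// e.g. exposeOrg("org-x", 30500, 30501) -- the controller would hand out a unique pair per org.
```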

I'm also wondering where do we keep track of the mappings between org and physical instances, ie. IP addresses. Do we use DNS?

For instance, right now, each org points to a DNS name via its ENS records. This DNS name in turn points to a physical address. How do you imagine this could work in the above scenario?

E.g. we'll have a wildcard DNS record for *.aws.monadic.xyz pointing to our AWS LB, and org x.radicle.eth would point to x.aws.monadic.xyz. I suppose tcp/udp endpoints would also look like aws.monadic.xyz:5000 and aws.monadic.xyz:5000/5001.

Makes sense?
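For completeness, a hedged sketch of how a client might follow that chain of mappings, assuming the org's host name is published as an ENS text record; the "url" record key is only a placeholder:

```typescript
import { ethers } from "ethers";
import { promises as dns } from "dns";

// Follow the chain x.radicle.eth -> x.aws.monadic.xyz -> load balancer IP.
// The ENS text-record key ("url") is a placeholder for wherever the org's
// host name actually lives in its ENS records.
async function resolveOrgHost(ensName: string): Promise<string | null> {
  const provider = new ethers.providers.JsonRpcProvider(process.env.ETH_RPC_URL);

  const resolver = await provider.getResolver(ensName);
  const host = await resolver?.getText("url");
  if (!host) return null;

  const addrs = await dns.resolve4(host); // hits the *.aws.monadic.xyz wildcard record
  return addrs[0] ?? null;
}
```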

@adaszko
Contributor

adaszko commented Nov 17, 2021

Hi @xphoniex 👋. Nice work on the RFC.

I anticipate some problems with org-nodes living outside of Kubernetes trying to form a cluster with the ones living inside it due to NATs.
At least on GCE (I imagine it to be similar on other clouds), outbound Kubernetes connections are NAT'ed. The way to have inbound connections to the Kubernetes cluster is to set up a load balancer. The problem is the load balancer would need to (1) be QUIC-aware (is this supported by Kubernetes?) and (2) implement some sort of stickiness so that it routes requests concerning the same Peer ID to the same K8s pod and port.

@adaszko
Contributor

adaszko commented Nov 17, 2021

Regarding state storage, am I correct in assuming the plan is to use persistent volumes?

@xphoniex
Contributor Author

xphoniex commented Nov 17, 2021

Hi @adaszko 👋

QUIC uses UDP under the hood, but it's not listed here; it is, however, supported on the Google Load Balancer. We might need to test it quickly before proceeding, I guess. (@cloudhead)

Would NAT still be an issue if each org-node has its own separate port? E.g. org x would always hit x.aws.monadic.xyz:5000.

Regarding state storage, am I correct in assuming the plan is to use persistent volumes?

Yes.

@adaszko
Contributor

adaszko commented Nov 17, 2021

Hi @adaszko 👋

QUIC uses UDP under the hood, but it's not listed here; it is, however, supported on the Google Load Balancer. We might need to test it quickly before proceeding, I guess. (@cloudhead)

Such a test would be nice 👍

Would NAT still be an issue if each org-node has its own separate port? E.g. org x would always hit x.aws.monadic.xyz:5000.

Let's take the case of an outside org-node making 2 requests to an inside one. Even if we have distinct ports for every org-node, the load balancer can direct the 2nd request to a different node than the 1st one unless we set some session affinity (respective functionalities in other clouds need to be researched). Out of the session affinity types listed on the linked website, the most apt seems to be the one based on HTTP headers. The target PeerId value would have to be added to some preordained header like Radicle-PeerId: .... Even then, session affinity is still only best effort, according to the documentation.

It's not a trivial issue, unless of course the librad protocol implementation is so resilient that it can handle (1) violation of the assumption that 2 consecutive requests of any type addressed to the same (DNS name, port) pair actually reach the same node (that's the load balancer issue) and (2) IP addresses of nodes changing at unpredictable times due to Kubernetes rebalancing to different pods and/or self-healing (DNS names and ports stay the same, though). If the implementation is in fact so resilient that we have (1) and (2), we can basically go ahead full steam with the K8s deployment. These 2 are a pretty tall order, though, and we'll need to be aware that the whole cluster will perform worse (more errors, retries, higher latency) even if the implementation handles (1) and (2) 100% correctly.

@kim @FintanH I'm curious what you guys think, especially regarding the last paragraph from the protocol implementation standpoint.

@xphoniex
Contributor Author

xphoniex commented Nov 17, 2021

Let's take the case of an outside org-node making 2 requests to an inside one. Even if we have distinct ports for every org-node, the load balancer can direct the 2nd request to a different node than the 1st one unless we set some session affinity

I'm not so sure about this. We're gonna have a single deployment/service per org, thus even if it hits another node, the packet will always be redirected using iptables rules back to the intended node where the deployment lives. No?

@adaszko
Contributor

adaszko commented Nov 17, 2021

Let's take the case of an outside org-node making 2 requests to an inside one. Even if we have distinct ports for every org-node, the load balancer can direct the 2nd request to a different node than the 1st one unless we set some session affinity

I'm not so sure about this. We're gonna have a single deployment/service per org, thus even if it hits another node, the packet will always be redirected using iptables rules back to the intended node where the deployment lives. No?

I'm still talking about the case of org-nodes living outside of Kubernetes trying to connect to the ones living inside it. Radicle is a peer-to-peer app (i.e. not a SaaS), so I think it's fair to say the nodes can connect from anywhere.

@kim

kim commented Nov 17, 2021

It is a bit unclear what you're trying to achieve here. I am assuming that you are aware that radicle-link is a peer-to-peer protocol, and thus it is nonsensical to try and cluster / load-balance individual nodes (if not, I'm happy to explain).

You can surely use k8s to spin up singleton instances of a node (like a database, iirc StatefulSet is the thing to use for that). However, it is most likely the case that you would need to employ custom SDN, or else use NodePort and public IP addresses to make nodes be able to communicate.

If your goal is to only cluster the HTTP/git interfaces of an org-node for availability reasons, you may be able to do that by mounting a shared volume containing the state. Since network-attached storage is both slow and does not necessarily exhibit POSIX semantics (O_EXCL specifically), I would recommend mounting read-only.
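A minimal sketch of that shape, assuming @kubernetes/client-node; names, image and namespace are placeholders, and read-only HTTP/git replicas would mount the same network-attached claim with readOnly: true:

```typescript
import * as k8s from "@kubernetes/client-node";

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const apps = kc.makeApiClient(k8s.AppsV1Api);

// Single writer: a 1-replica StatefulSet owning the monorepo state. Read-only
// HTTP/git replicas would mount the same (network-attached) claim read-only.
async function deployOrgNode(name: string) {
  await apps.createNamespacedStatefulSet("orgs", {
    metadata: { name },
    spec: {
      serviceName: name,
      replicas: 1,
      selector: { matchLabels: { app: name } },
      template: {
        metadata: { labels: { app: name } },
        spec: {
          containers: [{
            name: "org-node",
            image: "radicle-services/org-node:latest", // hypothetical image
            volumeMounts: [{ name: "state", mountPath: "/var/lib/radicle" }],
          }],
          volumes: [{
            name: "state",
            persistentVolumeClaim: { claimName: `${name}-state` },
          }],
        },
      },
    },
  });
}
```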

@kim

kim commented Nov 17, 2021

For the latter to work, you would obviously need to be able to run those endpoints standalone, ie. without spinning up the p2p stack.

@cloudhead
Contributor

So as I understand the discussion, the LB is not really used to "balance load" between homogeneous backends, but rather just to route traffic? Or did I get this wrong?

In the simplest case, each org/user has 1 replica, and the cluster is heterogeneous, ie. the nodes are not interchangeable.

In the more advanced case, an org may want to deploy multiple replicas. Using a load balancer in front of those nodes would make sense in case we're worried that one of the nodes goes down, but I think we may find that having the clients directly connect to the individual replicas simplifies things, since this is supported by the protocol.

@xphoniex
Contributor Author

Just to be clear, the reason for choosing k8s here is that it standardizes our lifecycle management.
Since it complicates networking for us, we have two options:

  • Build and manage our own cluster from publicly facing nodes, so we get the benefits of the scheduler and controller without having to sacrifice networking (since each node will have a public IP)
  • Directly order a VPS from providers like DigitalOcean/Hetzner and do the initial setup using Ansible. Set DNS records (on our own domains, pointing the org to the new IP), and keep state in our own DB.

Sounds like the second option would be more appropriate at this point. Any comments?

Note: there are some limitations that apply to the VPS route; DigitalOcean, for example, doesn't allow more than 10 droplets unless you talk to support. AWS is at 20. We also need to be careful we don't hit a hard limit on our DNS provider, as we're setting a new record per server.

@kim

kim commented Nov 17, 2021

You can use k8s' LoadBalancer concept as a NAT device to translate to a 1-replica StatefulSet. Whether that's cost-efficient depends on your pain tolerance, and whether you are able to significantly overcommit (ie. most nodes are idle most of the time). Note that you need one external IP per node, unless clients can address using port numbers.

You can not expect any kind of transparent mapping of a single address to multiple, independent nodes to work as you'd expect. Even if you do that for just the HTTP endpoints and use session affinity, it will probably not yield the web experience you're after, because replication is by design asynchronous. I'm not sure how important this is, though, as long as the node gets restarted automatically if something goes wrong.

I get that what you want is essentially "virtual hosts", but I'm afraid this won't be possible on the p2p layer until HTTP/3 gets standardised. You could consider creating extra SRV or TXT records which would allow a p2p node to discover an IP:port pair (and possibly the peer id, too), but I don't think there's anything off-the-shelf which would automate this on k8s.
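If that route were taken, client-side discovery could look roughly like this; the record names and layout are made-up conventions, not something radicle-link or any tooling defines today:

```typescript
import { promises as dns } from "dns";

// Discover an org-node's p2p endpoint and peer id from DNS, assuming a
// hypothetical convention: an SRV record for host/port and a TXT record
// for the peer id.
async function discoverOrgNode(orgHost: string) {
  const [srv] = await dns.resolveSrv(`_radicle._udp.${orgHost}`);
  const txt = await dns.resolveTxt(`_radicle-peer.${orgHost}`);
  return {
    host: srv.name,
    port: srv.port,
    peerId: txt.flat().join(""), // TXT values come back as arrays of string chunks
  };
}
```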

@xphoniex
Contributor Author

We can bypass the LoadBalancer NAT; we just need to assign every node its own external IP. Still, I'm perfectly fine not taking the k8s route, and building it this way:

Directly order a VPS from providers like DigitalOcean/Hetzner and do the initial setup using Ansible. Set DNS records (on our own domains, pointing the org to the new IP), and keep state in our own DB.

The issue with this is that I'll end up writing some glue scripts to tie everything together, and we can't use the idle resources anymore, as kim mentioned.

If no one has any objection to the design, I can start prototyping with Pulumi.
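As a starting point, a Pulumi prototype of the VPS option could be as small as this (TypeScript with @pulumi/digitalocean; image, region, size and the DNS zone are placeholders):

```typescript
import * as digitalocean from "@pulumi/digitalocean";

// One droplet per org; Ansible would take over after boot for the actual setup.
export function orgServer(org: string) {
  const droplet = new digitalocean.Droplet(`org-${org}`, {
    image: "ubuntu-20-04-x64",
    region: "fra1",
    size: "s-1vcpu-1gb",
  });

  // Point <org>.seed.example.com at the new droplet (zone name is a placeholder).
  const record = new digitalocean.DnsRecord(`org-${org}-dns`, {
    domain: "seed.example.com",
    type: "A",
    name: org,
    value: droplet.ipv4Address,
  });

  return { droplet, record };
}
```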

@kim

kim commented Nov 18, 2021 via email

@xphoniex
Contributor Author

For that case we won't be using a fixed port; each instance/org will have a separate port.

@kim

kim commented Nov 18, 2021 via email

@xphoniex
Contributor Author

HTTP traffic is simple: it's not p2p and would come through the LB. We just need to update the ingress rules with our controller.

@kim

kim commented Nov 18, 2021 via email

@xphoniex
Contributor Author

Does this help?

```
                           ┌────────────────────────────┐
                           │ org-node 192.x.x.x:8776    │
                      ┌───►│                            │
                      │    └────────────────────────────┘
                      │
                      │
                      │
                      │ peering 192.x.x.x:8776 with 200.x.x.1:5000
                      │         (for org x)
                      │
                      │
       ┌──────────────┼────────────────────────────────────────────────────────────────────────────┐
       │              │                                                                            │
       │              │     Node#1 Public IP: 200.x.x.1           Node#2 Public IP: 200.x.x.2      │
       │              │                                                                            │
       │              │    ┌────────────────────────────┐        ┌────────────────────────────┐    │
       │              └───►│org-node :5000              │        │org-node :5002              │    │
       │                   ├────────────────────────────┤◄─────┐ ├────────────────────────────┤    │
       │                   │http-server: 5001        ▲  │      │ │http-server: 5003 ◄─────────┼┐   │
       │                   │                         │  │ ┌────┤►│                            ││   │
       │                   ├─────────────────────────┼──┤ │    │ ├────────────────────────────┼│   │
       │                   │nginx                    │  │ │    │ │nginx                       ││   │
┌──────┴─────┐             │x.radicle.eth -> x.svc:5001 │ │    └─┤x.radicle.eth -> x.svc:5001 ││   │
│LoadBalancer├───┬────────►│y.radicle.eth -> y.svc:5003 ├─┘      │y.radicle.eth -> y.svc:5003 ├┘   │
└──────┬─────┘   │         └────────────────────────────┘        └──┬────◄────────────────────┘    │
       │         │                                                  │                              │
       │         └──────────────────────────────────────────────────┘                              │
       │                                                                                           │
       └───────────────────────────────────────────────────────────────────────────────────────────┘
```

@kim

kim commented Nov 18, 2021

That makes sense, if the org-node port is unique per radicle.eth CNAME (assuming that's the p2p port). That is, x.radicle.eth has a different port than y.radicle.eth, and the client has a way to discover that.

There is no Host header on the p2p layer, and even if there was one you could not make use of it for routing unless the proxy servers know the private keys of every org-node behind them.
