Skip to content

Latest commit

 

History

History
137 lines (109 loc) · 5.76 KB

README.adoc

File metadata and controls

137 lines (109 loc) · 5.76 KB

GossipCluster

Goal: Implement a UDP-multicast clustering mechanism in elixir.

Disclaimer: This repository is for educational purposes only.

Following in the footsteps of bitwalker’s libcluster [1] library, this library uses a gossip (epidemic) protocol [2] to locate and connect to other nodes on the network.

Each node starts a UDP socket subscribed to a multicast address. Each node both sends "heartbeats" to that address and listens for heartbeats from other nodes on that address. When a node sends a message out, it encodes its own location in the packet. When a node receives a message, it decodes the peer node’s location and attempts to connect via Node.connect/1.

By default, each node listens to an unassigned multicast address in the ADHOC-III block [6]. This should be of little consequence in the scope of this exercise, but in any case is unlikely to collide with other applications running on the network.

This is surprisingly easy to achieve in Elixir. A naive implementation fits within 60 lines of code. There are many quality-of-life features of elixir and erlang which make this possible:

  • Elixir and Erlang define a GenServer ("generic server") module which we can use to define a wide variety of server processes. These define high-level APIs frequently driven with pattern-match syntax for handling arbitrary RPC messages, where messages are structured data in the process’s "mailbox", a stack-like structure.

  • When we start a UDP socket with :gen_udp.open/1 [3], it starts in "active" mode by default. When an "active" socket receives a message, it marshals that message into the owning process’s mailbox. Our GenServer need only implement a handle_info({:udp, socket, ip, port, binary_or_list_packet}, state) [4] callback to receive data over the network. This is very convenient.

  • Elixir and erlang implement strings as binaries, which are also bitstrings. Elixir and erlang also support high-level pattern-match syntax [5] for operating on bitstrings, which has the convenient effect that <<"heartbeat::" <> node_name::binary>> = packet is both a guard clause to only handle UDP packets starting with a "heartbeat::" UTF-8 sentinel and a pattern to extract the binary-encoded node name from the packet. Once we have the encoded name, we then parse the binary with :erlang.bytes_to_term(node_name) and simply Node.connect/1 to the result!

Run It

Each copy of our gossip application attempts to bind to port 8888 by default, and we need to run more than one in order to demonstrate their ability to gossip. As a result, we need to run copies of the gossip application on different hosts. Below we will explore a kubernetes solution with minikube and a manual solution using multiple computers (be they VMs or laptops).

Minikube

In this section we will run the application using a minikube cluster and kubernetes that hosts two copies of the gossip-cluster application.

By default, kubernetes pods cannot receive external multicast traffic. However, we can set the hostNetwork: true flag on each pod, which gives pods access to the host network. However, we will no longer be able to schedule multiple pods on the same host, since they could encounter port collisions. We will start minikube with two nodes to ensure both of our pods are scheduled.

Since our docker image is not pushed to a public repository, we need to load it into minikube’s image cache using minikube image load <tag>.

The whole process looks like this:

minikube start --nodes=2
docker build -t gossip .
minkube image load gossip
kubectl apply -f kubernetes/

To confirm that our containers are gossipping, inspect their logs:

kubectl logs gossip-a
kubectl logs gossip-b

When finished, delete the pods with kubectl delete -f kubernetes/, and stop minikube with minikube stop. You can delete the minikube nodes entirely with minikube delete.

Two Computers

In this section we will launch the application on two physical or virtual hosts on the same network. If you have two laptops available and minikube isn’t an option, this is an easy way to demonstrate the gossiping application.

Start the gossip container on one computer with RELEASE_NAME=a using docker, podman, or any other compatible alternative.

podman build --tag gossip .
podman run --net=host -e RELEASE_NAME=a gossip

On another computer, again start a container but using a different release name, such as RELEASE_NAME=b.

podman build --tag gossip .
podman run --net=host -e RELEASE_NAME=b gossip

As long as both containers started with different release names you should see a NODEUP messages within a few seconds on both computers as they locate one another over multicast. You main get an error like ** Cannot get connection id for node name@host. if you forget to change the release names between hosts.

If you do not or cannot use docker, podman, or equivalent, you can instead run the gossip application using iex assuming you have an elixir installation. Run the following command on each host, making sure to set the --sname parameter differently on each host.

iex --sname a --cookie cookie -S mix