> [!NOTE]
> I am relocating my physical servers, which my CI pipeline needs in order to test new PRs. PRs will therefore fail until the relocation is done, but the project is very much still alive.
This repo contains the deployment artifacts for Devantler's Homelab. The Homelab is a Kubernetes cluster that is highly automated through Flux GitOps, CI/CD with automated testing, and much more. Feel free to look around. You might find some inspiration.
Folder structure:
.
├── .github
│   └── workflows
├── .vscode
├── docs
│   └── images
├── k8s
│   ├── clusters
│   │   ├── homelab-local
│   │   │   ├── apps
│   │   │   ├── flux-system
│   │   │   ├── infrastructure
│   │   │   │   └── controllers
│   │   │   └── variables
│   │   └── homelab-prod
│   │       ├── apps
│   │       ├── flux-system
│   │       ├── infrastructure
│   │       │   ├── controllers
│   │       │   └── gha-runner-scale-sets
│   │       └── variables
│   ├── components
│   │   ├── flux-kustomization-post-build-variables-label
│   │   ├── flux-kustomization-sops-label
│   │   ├── helm-release-crds-label
│   │   ├── helm-release-remediation-label
│   │   └── network-policy-default-deny
│   ├── distributions
│   │   ├── k3s
│   │   │   ├── apps
│   │   │   ├── infrastructure
│   │   │   │   └── controllers
│   │   │   └── variables
│   │   └── talos
│   │       ├── apps
│   │       ├── infrastructure
│   │       │   └── controllers
│   │       │       ├── cilium
│   │       │       ├── kubelet-serving-cert-approver
│   │       │       └── longhorn
│   │       └── variables
│   └── shared
│       ├── apps
│       │   ├── fleetdm
│       │   ├── headlamp
│       │   ├── homepage
│       │   ├── open-webui
│       │   └── plantuml
│       ├── infrastructure
│       │   ├── cloudflared
│       │   ├── controllers
│       │   │   ├── capi-operator
│       │   │   ├── cert-manager
│       │   │   ├── gha-runner-scale-set-controller
│       │   │   ├── goldilocks
│       │   │   ├── k8sgpt-operator
│       │   │   ├── kyverno
│       │   │   ├── metrics-server
│       │   │   ├── reloader
│       │   │   ├── testkube
│       │   │   │   └── crds
│       │   │   ├── traefik
│       │   │   └── trivy-operator
│       │   ├── dex
│       │   ├── harbor
│       │   ├── helm-charts-oci-proxy
│       │   ├── kube-prometheus-stack
│       │   ├── middlewares
│       │   │   ├── basic-auth
│       │   │   └── forward-auth
│       │   ├── oauth2-proxy
│       │   ├── ollama
│       │   └── selfsigned-cluster-issuer
│       ├── tenants
│       └── variables
└── talos
    ├── hetzner
    └── patches
        ├── cluster
        └── nodes
79 directories
For development:
For production:
- A Talos Cluster
> [!NOTE]
> You can use other distributions as well, but the configuration is optimized for Talos, so it is not guaranteed to work with other distributions.
To run this cluster locally, simply run the following command:
ksail up homelab-local
> [!NOTE]
> Running this cluster on your own hardware would require access to my SOPS keys. That is of course not possible, so you need to create your own keys and replace the existing ones if you want to run my cluster configuration on your own hardware.
- The keys that `KSail` uses are stored in `~/.ksail/age`, where one Age key is stored for each cluster and named after the cluster. For example, `~/.ksail/age/homelab-local`.
- To update SOPS to work with `KSail`, update the `.sops.yaml` file in the root of the repository and replace the `age` keys with your own keys.
- To update the manifests to work with `KSail`, replace all `.sops.yaml` files with new ones that are encrypted with your own keys.
For the production cluster, you would need to do the same, but in addition to storing the keys in `~/.ksail/age`, you would also need to store the keys in GitHub Secrets so that the CI/CD pipeline can provision the keys to the cluster.
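
As a rough sketch of the key layout described above, the following shell functions create an Age key at the path KSail is said to read from. It assumes `age-keygen` is installed and that KSail accepts key files in the format `age-keygen` writes, which this README does not confirm. The function names `age_key_path` and `create_age_key` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical, and it is an assumption that
# KSail accepts key files in the format written by age-keygen.

# Print the key path for a cluster, following the ~/.ksail/age/<cluster-name>
# layout described above.
age_key_path() {
  printf '%s/.ksail/age/%s\n' "$HOME" "$1"
}

# Generate a new Age key for a cluster and print its public key, which is the
# value that would replace the existing `age` entries in .sops.yaml.
create_age_key() {
  local key_file
  key_file="$(age_key_path "$1")"
  mkdir -p "$(dirname "$key_file")"
  age-keygen -o "$key_file"   # fails if the key file already exists
  age-keygen -y "$key_file"   # prints the matching public key
}
```

Calling `create_age_key homelab-local` would write `~/.ksail/age/homelab-local`. The encrypted manifests still have to be recreated with the new key afterwards, since they cannot be decrypted without the original keys.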
The cluster uses Flux GitOps to reconcile the state of the cluster with a single source of truth stored in this repository and published as an OCI image. For development, the cluster is spun up by `KSail`, and for production, the cluster is provisioned by `Talos Omni`.
The cluster configuration is stored in the `k8s/*` directories, where the structure is as follows:
- `clusters/`: Contains the cluster-specific configuration for each environment.
- `components/`: Contains the reusable components that are used across the cluster.
- `distributions/`: Contains the distribution-specific configuration.
- `shared/`: Contains the shared configuration for all clusters.
  - `apps/`: Contains the application-specific manifests.
    - FleetDM - To provide device management for my devices. (currently not in use, as it does not support ARM64)
    - Headlamp - To provide a lightweight and extensible Kubernetes UI.
    - Homepage - To provide a dashboard for the cluster.
    - Open WebUI - To provide a web interface and a REST API for interacting with LLMs.
    - PlantUML - To provide a web interface and a REST API for generating PlantUML diagrams.
    - Traefik - To provide an ingress controller for the cluster.
  - `custom-resources/`: Contains the custom resources that are used across the cluster.
    - Middlewares - Contains the middlewares that are used by Traefik.
    - Selfsigned Cluster Issuer - Contains the self-signed cluster issuer that is used by Traefik.
  - `infrastructure/`: Contains the infrastructure-specific manifests.
    - Cert Manager - For managing certificates in the cluster.
    - Cloudflared - For tunneling traffic to the cluster.
    - Dex - For providing OIDC authentication for the cluster.
    - Cluster API Operator - For managing the lifecycle of Kubernetes clusters.
    - GitHub Actions Runner Scale Set Controller - To manage GitHub Actions Runner Scale Sets in the cluster.
    - GitHub Actions Runner Scale Sets - To run GitHub Actions in the cluster.
    - Goldilocks - To provide and apply resource recommendations for pods.
    - Harbor - To store and distribute container images.
    - K8sGPT Operator - To analyze the cluster for improvements, vulnerabilities or bugs.
    - Kube Prometheus Stack - To provide monitoring for the cluster. (Prometheus, Grafana, Alertmanager, etc.)
    - Kyverno - To enforce policies in the cluster.
    - Longhorn - To provide distributed storage for the cluster.
    - Metrics Server - To provide metrics for the cluster.
    - OAuth2 Proxy - To provide authentication for the cluster.
    - Ollama - To run LLMs on the cluster.
    - Reloader - To reload deployments when secrets or configmaps change.
    - Testkube - To provide a testing framework for the cluster.
    - Trivy Operator - To analyze the cluster for vulnerabilities.
  - `tenants/`: Contains Flux kustomizations to bootstrap and onboard tenants. (currently not in use)
  - `variables/`: Contains global variables that are the same for all clusters.
To support hooking into the kustomize flow for adding or modifying resources for a specific cluster, a specific distribution, or shared across all clusters, the following structure is used:
This means that for every root-level kustomization that is applied to the cluster, there should be a corresponding folder in either `clusters`, `distributions`, or `shared` that contains the resources that should be applied to the cluster at that scope. For example, for a root-level kustomization in `k8s/clusters/<cluster-name>/flux-system/infrastructure.yaml`, there should be a corresponding folder in:
- `k8s/clusters/<cluster-name>/infrastructure/`
- `k8s/distributions/<distribution-name>/infrastructure/`
- `k8s/shared/infrastructure/`
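
The mapping above can be sketched as a small shell helper that prints the three scope folders for a given root-level kustomization. The function name `scope_dirs` and its argument order are hypothetical, and the example pairs `homelab-prod` with the `talos` distribution because production is described as a Talos cluster.

```shell
#!/usr/bin/env bash
# Sketch only: scope_dirs is a hypothetical helper, not part of this repo.

# Print the cluster, distribution, and shared folders that correspond to one
# root-level kustomization.
# Arguments: <cluster-name> <distribution-name> <kustomization-name>
scope_dirs() {
  local cluster="$1" distribution="$2" name="$3"
  printf '%s\n' \
    "k8s/clusters/${cluster}/${name}/" \
    "k8s/distributions/${distribution}/${name}/" \
    "k8s/shared/${name}/"
}

# Example: the folders behind the infrastructure kustomization in production.
scope_dirs homelab-prod talos infrastructure
```

Listing the folders this way makes it easy to check that each scope has a place for its resources before adding a new root-level kustomization.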
- 1x Hetzner CAX21 node (QEMU ARM64, 4 CPU, 8 GB RAM, 160 GB SSD) serving as both control plane and worker node
- 2x Hetzner CAX41 nodes (QEMU ARM64, 16 CPU, 32 GB RAM, 320 GB SSD) serving as both control plane and worker nodes
- 1x Apple Hypervisor ARM64 VM (running on a Mac Mini M2 Pro with access to 32 GB RAM and 20 cores, overprovisioned 2/1) as a worker node
- Unifi Cloud Gateway - For networking and firewall.
- External Samsung T5/T7 SSD Disks - For distributed storage across the cluster.
- Unifi - For configuring a DMZ for my own nodes to run in, along with other security features.
- UTM - For running Kubernetes on Mac Mini via Apple Hypervisor.
- Talos Omni - For provisioning the production cluster, and managing nodes, updates, and the Talos configuration.
- Cloudflare - For etcd backups, DNS, and tunneling all traffic so my network stays private.
- Flux GitOps - For managing the Kubernetes applications and infrastructure declaratively.
- SOPS and Age - For encrypting secrets at rest, allowing me to store them in this repository with confidence.
- KSail - For developing the cluster locally, and for running the cluster in CI to ensure all changes are properly tested before being applied to the production cluster.
- K8sGPT - To analyze the cluster for improvements, vulnerabilities or bugs. It integrates with Trivy and Kyverno to also provide security and policy suggestions.
Item | No. | Per unit | Total |
---|---|---|---|
Hetzner CAX21 | 3 | 7,49€ | $24,90 |
Hetzner CAX41 | 1 | 29,99€ | $33,23 |
Talos Omni | 1 | $10 | $10 |
Cloudflare Domains | 2 | $0,87 | $1,74 |
Total | | | $69,87 |