This repository demonstrates a GitOps-based infrastructure management system using FluxCD. It showcases a complete setup for managing Kubernetes resources across multiple environments (staging and production) with automated image updates and deployment strategies.
The project is organized into several key directories:
- `apps/`: Contains application-specific configurations
  - `base/`: Base configurations for applications
  - `staging/`: Staging environment-specific configurations
  - `prod/`: Production environment-specific configurations
- `clusters/`: Cluster-specific configurations
- `infrastructure/`: Infrastructure-related resources
  - `image-update-automation/`: Image update automation configurations
  - `sources/`: Source configurations for FluxCD
- Multi-Environment Support: Separate configurations for staging and production environments.
- Automated Image Updates: Utilizes FluxCD's image automation controllers to automatically update image tags.
- Kustomize Integration: Leverages Kustomize for managing environment-specific configurations.
- GitOps Workflow: All changes to the infrastructure are made through Git, ensuring version control and auditability.
- Continuous Deployment: Automatic synchronization between Git repository and Kubernetes cluster.
- Automatically updates to the latest image tag matching the pattern `staging-[commit-hash]-[timestamp]`.
- Allows for rapid testing of new features and bug fixes.
- Configured with 1 replica for resource efficiency.
- Uses semantic versioning (SemVer) for image tags.
- Automatically updates to the latest patch version within the v1.x.x range.
- Ensures stability while allowing for minor updates and patches.
- Configured with 3 replicas for high availability.
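As an illustration of how these two behaviors could be expressed, the sketch below shows one possible pair of FluxCD `ImagePolicy` resources. The resource names, namespace, and tag-filter regex are assumptions for illustration, not values taken from this repository.

```yaml
# Hypothetical sketch: the staging policy picks the newest timestamped staging tag,
# while the production policy tracks the latest release within the v1.x.x range.
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: gls-python-helloworld-app-staging   # assumed name
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: gls-python-helloworld-app
  filterTags:
    pattern: '^staging-[a-f0-9]+-(?P<ts>[0-9]+)$'  # staging-[commit-hash]-[timestamp]
    extract: '$ts'
  policy:
    numerical:
      order: asc        # highest timestamp wins
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
  name: gls-python-helloworld-app-prod      # assumed name
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: gls-python-helloworld-app
  policy:
    semver:
      range: 1.x.x      # latest patch/minor within v1
```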
- Kubernetes cluster (version 1.20+)
- kubectl configured to communicate with your cluster
- GitHub account and personal access token with repo permissions
1. Install the Flux CLI:

   ```bash
   curl -s https://fluxcd.io/install.sh | sudo bash
   ```

2. Export your GitHub personal access token and username:

   ```bash
   export GITHUB_TOKEN=<your-token>
   export GITHUB_USER=<your-username>
   ```

3. Check your Kubernetes cluster:

   ```bash
   flux check --pre
   ```

4. Bootstrap Flux on your cluster:

   ```bash
   flux bootstrap github \
     --owner=$GITHUB_USER \
     --repository=gls-fleet-infra \
     --branch=master \
     --path=./clusters/prod \
     --personal \
     --components-extra=image-reflector-controller,image-automation-controller
   ```

5. Verify the installation:

   ```bash
   flux get all
   ```
After installation, Flux will automatically synchronize the cluster state with the Git repository. To make changes:
1. Clone the repository:

   ```bash
   git clone https://github.com/$GITHUB_USER/gls-fleet-infra.git
   ```

2. Make changes to the YAML files as needed.

3. Commit and push your changes:

   ```bash
   git add .
   git commit -m "Update configuration"
   git push
   ```

4. Flux will automatically detect and apply the changes to your cluster.
- Use `flux get all` to see the status of all Flux resources.
- Check logs with `flux logs -f --level debug`.
- For more detailed troubleshooting, use `kubectl` commands to inspect specific resources.
- Monitor image update automation: `flux get images all`
The image update automation is configured in the `infrastructure/image-update-automation/` directory. It includes:
- ImageRepository definition for the application
- ImagePolicies for both staging and production environments
- ImageUpdateAutomation configuration
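A minimal sketch of what the `ImageRepository` and `ImageUpdateAutomation` resources might look like is shown below. The image URL, intervals, author details, and update path are illustrative assumptions rather than the repository's actual values.

```yaml
# Hypothetical sketch; the registry URL and names are placeholders.
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
  name: gls-python-helloworld-app
  namespace: flux-system
spec:
  image: <registry>/<your-user>/gls-python-helloworld-app  # placeholder image URL
  interval: 1m
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageUpdateAutomation
metadata:
  name: gls-fleet-infra
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: master
    commit:
      author:
        name: fluxcdbot
        email: fluxcdbot@users.noreply.github.com
      messageTemplate: "chore: update image tags"
    push:
      branch: master
  update:
    path: ./apps          # assumed: update image tags in the apps overlays
    strategy: Setters
```

With this in place, the controllers scan the registry, evaluate the ImagePolicies, and commit updated tags back to the Git repository, which Flux then applies to the cluster.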
- Staging: `apps/staging/gls-python-helloworld-app/kustomization.yaml`
- Production: `apps/prod/gls-python-helloworld-app/kustomization.yaml`
These files define environment-specific settings like namespaces, replicas, and image tags.
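For example, the staging overlay could look roughly like the sketch below. The base path, image name, and image-policy marker are assumptions about how the overlay is wired up, not excerpts from the actual file.

```yaml
# Hypothetical sketch of apps/staging/gls-python-helloworld-app/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging                          # assumed namespace
resources:
  - ../../base/gls-python-helloworld-app    # assumed base path
replicas:
  - name: gls-python-helloworld-app
    count: 1
images:
  - name: gls-python-helloworld-app         # assumed image name
    # Flux rewrites newTag based on the staging ImagePolicy referenced in the marker comment
    newTag: staging-abc1234-1700000000      # {"$imagepolicy": "flux-system:gls-python-helloworld-app-staging:tag"}
```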
The application code for this infrastructure is maintained in a separate repository:
- Repository: gls-python-helloworld-app
- Description: This repository contains the source code for the Python Hello World application that is deployed and managed by this GitOps infrastructure.
The application repository is an integral part of the overall system: it contains the actual workload deployed through this GitOps setup. Note that changes to the application code in that repository trigger the image build and update process managed by FluxCD in this infrastructure repository.
While not currently implemented due to task constraints, a robust monitoring solution using the Prometheus stack is recommended for production environments. This setup would include:
- Prometheus: For metrics collection and storage.
- Alertmanager: For handling alerts and notifications.
- Grafana: For visualization and dashboarding.
This advanced monitoring setup would provide:
- Real-time visibility into cluster and application performance.
- Custom alerting based on predefined thresholds.
- Comprehensive dashboards for both infrastructure and application metrics.
- Long-term metrics storage for trend analysis and capacity planning.
Implementation of this monitoring stack would involve:
- Deploying the Prometheus Operator using Flux.
- Creating custom ServiceMonitors for Flux components and applications.
- Configuring Alertmanager for intelligent alert routing and aggregation.
- Designing Grafana dashboards for visualizing Flux, Kubernetes, and application-specific metrics.
This enhanced monitoring capability would significantly improve observability and incident response times in a production environment.
To implement basic monitoring for the application and cluster:
1. Deploy Prometheus and Grafana: Use the kube-prometheus-stack Helm chart via Flux:

   ```yaml
   apiVersion: helm.toolkit.fluxcd.io/v2beta1
   kind: HelmRelease
   metadata:
     name: kube-prometheus-stack
     namespace: monitoring
   spec:
     chart:
       spec:
         chart: kube-prometheus-stack
         sourceRef:
           kind: HelmRepository
           name: prometheus-community
         version: "39.x"
     interval: 1h
     values:
       grafana:
         enabled: true
       prometheus:
         enabled: true
   ```
2. Configure ServiceMonitors: Create ServiceMonitors for your application:

   ```yaml
   apiVersion: monitoring.coreos.com/v1
   kind: ServiceMonitor
   metadata:
     name: gls-python-helloworld-app
   spec:
     selector:
       matchLabels:
         app: gls-python-helloworld-app
     endpoints:
       - port: http
         path: /metrics
   ```
3. Set up dashboards: Create Grafana dashboards for visualizing metrics. You can import existing dashboards or create custom ones using Grafana's UI.
To set up alerting for critical issues:
1. Configure Alertmanager: Alertmanager is included in the kube-prometheus-stack. Configure it in the HelmRelease:

   ```yaml
   spec:
     values:
       alertmanager:
         config:
           global:
             resolve_timeout: 5m
           route:
             group_by: ['job']
             group_wait: 30s
             group_interval: 5m
             repeat_interval: 12h
             receiver: 'slack'
           receivers:
             - name: 'slack'
               slack_configs:
                 - api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
                   channel: '#alerts'
   ```
2. Define PrometheusRules: Create alert rules for your application:

   ```yaml
   apiVersion: monitoring.coreos.com/v1
   kind: PrometheusRule
   metadata:
     name: gls-python-helloworld-app-alerts
   spec:
     groups:
       - name: gls-python-helloworld-app
         rules:
           - alert: ApplicationDown
             expr: up{job="gls-python-helloworld-app"} == 0
             for: 5m
             labels:
               severity: critical
             annotations:
               summary: "Application is down"
               description: "gls-python-helloworld-app has been down for more than 5 minutes."
           - alert: HighCPUUsage
             expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
             for: 15m
             labels:
               severity: warning
             annotations:
               summary: "High CPU usage detected"
               description: "CPU usage is above 80% for more than 15 minutes."
   ```
3. Integrate with notification channels: Configure Alertmanager to send notifications to your preferred channels (e.g., Slack, email, PagerDuty).
The application deployment includes both liveness and readiness probes to ensure the container is healthy, responsive, and ready to serve traffic:
The liveness probe checks whether the application is running and responsive:
- Endpoint: `/` (returns a 200 status code)
- Initial delay: 10 seconds
- Check interval: every 10 seconds
- Timeout: 5 seconds
- Failure threshold: 3 consecutive failures
The liveness probe helps Kubernetes determine if the application is running correctly and restart the pod if necessary.
The readiness probe checks whether the application is ready to serve traffic:
- Endpoint: `/` (returns a 200 status code)
- Initial delay: 5 seconds
- Check interval: every 10 seconds
- Timeout: 2 seconds
- Success threshold: 1 successful check
- Failure threshold: 3 consecutive failures
The readiness probe ensures that traffic is only sent to pods that are ready to handle requests. This is particularly useful during deployments and when the application needs time to initialize.
These probes help maintain the overall health and reliability of the application in the Kubernetes cluster.
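For reference, a deployment snippet implementing probes with the parameters described above might look like the following sketch. The container name, image, and port are assumptions; the actual manifest under `apps/base/` may differ.

```yaml
# Hypothetical sketch of the probe configuration described above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gls-python-helloworld-app
spec:
  selector:
    matchLabels:
      app: gls-python-helloworld-app
  template:
    metadata:
      labels:
        app: gls-python-helloworld-app
    spec:
      containers:
        - name: gls-python-helloworld-app
          image: <registry>/gls-python-helloworld-app:staging-latest  # placeholder image
          ports:
            - containerPort: 8080           # assumed port
          livenessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
            timeoutSeconds: 2
            successThreshold: 1
            failureThreshold: 3
```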