Best Practices for Containerization

Table of Contents

  1. Introduction
  2. Why Containerize?
  3. Containers vs. Virtual Machines
  4. Why Kubernetes?
  5. Tools
  6. Repository Structure
  7. References

1. Introduction

Before discussing the benefits of containerization, it is important to understand a key foundation of modern, cloud-native applications: the 12-Factor App.

The 12-Factor App

The 12-Factor App is a set of best practices for building applications that are portable, scalable, and resilient. These principles guide the development of software-as-a-service applications that are easy to deploy and maintain. Key tenets include:

  • Configuration in the Environment: Keeping configuration in the environment rather than in code.
  • Stateless Processes: Ensuring that each instance of an application does not retain state, enabling seamless horizontal scaling and fault isolation.
  • Port Binding: Making the application self-contained and network-ready by binding directly to a port.
  • Explicit Dependencies: Declaring all external libraries and packages explicitly, ensuring that every deployment uses the required versions without relying on system-level packages.
  • Single Codebase: Maintaining a single repository for the entire application, simplifying version management and deployment across multiple environments.

Adhering to these principles bridges the gap between development and production environments, reducing deployment issues and scaling challenges.


2. Why Containerize?

Containerization packages an application along with all its dependencies into a lightweight, self-contained unit. This approach brings several advantages that complement the 12-Factor principles.

2.1 Declarative & Automated Setup

By defining environments and configurations declaratively (e.g., Dockerfiles, Helm charts), containerization standardizes the setup process. This minimizes onboarding time and ensures reproducibility across different environments.
No more "it works on my computer."

2.2 Clean Contract with the Operating System

Standardized Environments

Containers encapsulate applications and their dependencies while relying on a shared host kernel. This ensures portability across diverse execution environments without infrastructure-specific configurations.

Automation-Friendly

Declarative configurations enable seamless integration into CI/CD pipelines, automating builds, tests, and deployments to accelerate release cycles.

2.3 Lightweight

No Virtualization Overhead

Unlike traditional virtual machines, containers share the host’s kernel and use lightweight mechanisms—such as cgroups for resource limits. This results in minimal startup times and lower resource overhead while maintaining strong process isolation.
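For instance, in Kubernetes the resource boundaries below are enforced on the node through cgroups; the pod name, image, and figures are illustrative placeholders, not recommendations:

apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  containers:
    - name: worker
      image: registry.example.com/worker:2.0.1
      resources:
        requests:              # capacity the scheduler reserves for the container
          cpu: "250m"
          memory: "256Mi"
        limits:                # hard caps, enforced via cgroups by the runtime
          cpu: "500m"
          memory: "512Mi"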

2.4 Consistency

Minimized Environment Divergence

By bundling dependencies with the application, containers ensure consistency between development, testing, and production environments. This reduces bugs caused by discrepancies between environments.

Ephemeral and Immutable

Containers are designed to be ephemeral; each instance starts fresh with no residual state from previous runs. This immutability guarantees clean deployments and simplifies debugging.

2.5 Scalability

Effortless Horizontal Scaling

Containers are stateless and lightweight, making them ideal for scaling workloads. Orchestration tools such as Kubernetes facilitate automatic replication of containers to handle fluctuating traffic.

Instead of upgrading a single large server (vertical scaling), containerized applications scale by increasing the number of running instances (horizontal scaling). This ensures:

  • No single point of failure—if one container crashes, others continue processing requests.
  • On-demand scaling—applications dynamically scale up or down based on real-time load.
  • Efficient resource utilization—only necessary instances run at any given time.

Orchestration: The Magic Behind Scaling

Scaling manually is inefficient. Modern orchestration platforms automate scaling based on traffic, resource utilization, and predefined policies.

Key benefits include:

  • Auto-scaling: Adjusts the number of running containers dynamically.
  • Self-healing: Automatically restarts failed containers.
  • Load balancing: Distributes traffic evenly across instances.

These features enable applications to handle varying workloads efficiently without requiring manual intervention.
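As a sketch of how such a policy is expressed in Kubernetes, the hypothetical HorizontalPodAutoscaler below keeps a deployment between 2 and 10 replicas, targeting roughly 80% average CPU utilization (the workload name and thresholds are placeholders):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                      # the workload being scaled (placeholder name)
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80   # add replicas when average CPU use exceeds ~80%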


3. Containers vs. Virtual Machines

It is important to distinguish between containers and virtual machines:

  • Containers run directly on the host OS using features like cgroups (for resource isolation) and namespaces (for process isolation), avoiding the overhead of full virtualization.
  • While containers share the host OS kernel, they isolate everything else—such as file systems, libraries, and runtime dependencies—ensuring security and stability without duplicating entire operating systems.

Key Differences

| Feature | Virtual Machines | Containers |
| --- | --- | --- |
| OS | Each VM has its own OS kernel | Containers share the host OS kernel |
| Startup Time | Minutes | Seconds (or less) |
| Overhead | High (due to full OS) | Low (lightweight processes) |
| Isolation | Strong (full OS per VM) | Process-level isolation |
| Resource Efficiency | Lower (more duplication) | Higher (shared OS, less overhead) |

Virtual machines operate like separate apartments with individual utilities, while containers resemble rooms within a shared house—isolated but with significantly reduced overhead.


4. Why Kubernetes?

Kubernetes is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It abstracts away the complexity of infrastructure management, enabling teams to focus on building and running applications efficiently.

4.1 What Is Kubernetes?

At its core, Kubernetes provides a framework to run distributed systems resiliently. It handles tasks such as scaling, failover, deployment rollouts, and service discovery, ensuring applications remain highly available and performant. Kubernetes operates on a cluster of machines (nodes), managing containers grouped into pods—the smallest deployable units in Kubernetes. Key features include:

  • Self-healing: Automatically restarts failed containers, replaces unresponsive pods, and reschedules workloads when nodes go offline.
  • Declarative Configuration: Applications are defined through YAML or JSON manifests, enabling version-controlled infrastructure-as-code.
  • Service Discovery & Load Balancing: Kubernetes automatically exposes applications via DNS or IPs and balances traffic across healthy pods.
  • Multi-cloud Support: Kubernetes provides a consistent deployment experience across on-premises data centers and public clouds.
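In practice, "declarative" means you describe the desired state and let Kubernetes reconcile the cluster toward it. A minimal, purely illustrative manifest (names, labels, and image are placeholders) might declare a three-replica deployment and a service that load-balances across its healthy pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                      # desired state: three identical pods
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.4.2
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                       # routes traffic to healthy pods with this label
  ports:
    - port: 80
      targetPort: 8080

Applying such a file (for example with kubectl apply -f) is idempotent: Kubernetes continuously works to keep the cluster matching what the manifest declares.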

4.2 What’s the Cost?

While Kubernetes offers tremendous benefits, it comes with upfront costs that organizations must consider:

Learning Cost

Kubernetes introduces a steep learning curve due to its extensive ecosystem and concepts such as pods, services, deployments, ingress controllers, and networking policies. Teams must invest time in mastering these components as well as tools like Helm (for package management) and kubectl (for cluster interaction).

Project Setup Overhead

Setting up a project in Kubernetes involves creating configuration files (manifests) or Helm charts to define application deployments, services, secrets, and other resources. While this adds initial complexity, these configurations are reusable and modular. Once established, they significantly streamline future deployments and updates.

Operational Complexity

Managing a Kubernetes cluster requires additional expertise in areas such as monitoring (e.g., Prometheus), logging (e.g., Fluentd, OpenTelemetry), security policies, and resource optimization.

4.3 Why Use Kubernetes Then?

Despite the costs, the advantages of Kubernetes far outweigh its challenges:

Easy Scaling

Kubernetes makes horizontal scaling effortless by allowing you to add or remove container instances dynamically based on traffic or resource utilization. With auto-scaling features like the Horizontal Pod Autoscaler (HPA), applications can handle sudden spikes in demand without manual intervention.

Efficient Resource Utilization

Kubernetes optimizes resource allocation by scheduling workloads intelligently across nodes based on CPU and memory requirements. This ensures that hardware is used efficiently while avoiding over-provisioning or underutilization.

Reduced Dependence on System Administrators

By automating routine tasks such as rollouts, rollbacks, health checks, and failover mechanisms, Kubernetes reduces reliance on system administrators for day-to-day operations. Developers can deploy applications independently using declarative configurations without needing deep infrastructure knowledge.

Multi-Application Management

Kubernetes excels at managing multiple applications within a single cluster. Whether running microservices architectures or monolithic applications side-by-side, Kubernetes isolates workloads while maintaining centralized control over resources.

Ultimately, Kubernetes excels in environments where scalability, portability, and operational consistency are critical.


5. Tools

The container ecosystem is rich with tools designed to simplify development, deployment, debugging, and monitoring. While many options exist, this is a curated list of tools that stand out for their practicality and reliability in containerized workflows.

5.1 Skaffold

Skaffold is a container-first tool that automates and streamlines the process of building, tagging, and pushing container images, and deploying them to a cluster. It eliminates the need to write complex scripts for container image management, ensuring consistency across development and deployment environments.

Key Features

  • Container Build Automation: Supports multiple builders, including Docker, Buildpacks, Kaniko, Jib, and Bazel, allowing teams to choose the most efficient option.
  • Fast Iteration: Detects code changes and automatically rebuilds and redeploys containers, minimizing downtime during development.
  • Multi-Stage & Remote Builds: Enables secure and efficient builds, including cloud-based builds that avoid local resource constraints.
  • Pluggable CI/CD Integration: Works with GitOps workflows, allowing container images to be automatically pushed and versioned.

By automating the container build and deployment cycle, Skaffold eliminates repetitive tasks, enabling developers to focus on writing code rather than managing images manually.
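A minimal skaffold.yaml might look like the sketch below; the image name and manifest path are placeholders, and the exact field layout depends on the schema version your Skaffold release supports:

apiVersion: skaffold/v2beta29           # schema version varies by Skaffold release
kind: Config
build:
  artifacts:
    - image: registry.example.com/web   # image to build, tag, and push
      docker:
        dockerfile: Dockerfile
deploy:
  kubectl:
    manifests:
      - k8s/*.yaml                      # manifests to apply after the build

With a config like this in place, skaffold dev rebuilds and redeploys on every code change, while skaffold run performs a one-off build-and-deploy.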

5.2 Helm

Helm is the package manager for Kubernetes, providing a structured way to define, install, and manage applications in a cluster. It simplifies the deployment of complex applications by packaging them as Helm charts—reusable, versioned application definitions.

Key Features

  • Templating Engine: Uses YAML-based templates with values injection, ensuring flexibility across different environments.
  • Version Control: Manages application versions and rollbacks, making it easy to revert to a previous state if needed.
  • Dependency Management: Allows applications to define dependencies and automatically fetch required services.
  • Chart Repositories: Enables sharing and distribution of Helm charts via public or private repositories.

With Helm, deploying applications becomes declarative and reproducible, reducing configuration drift and manual setup overhead.
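The core idea is sketched below: values.yaml supplies defaults, and templates inject them at render time. The chart name, image, and values here are hypothetical:

# values.yaml (defaults, overridable per environment)
replicaCount: 2
image:
  repository: registry.example.com/web
  tag: "1.4.2"

# templates/deployment.yaml (abridged)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

At install or upgrade time, any of these values can be overridden with additional -f files or --set flags, which is what makes the same chart reusable across environments.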

5.3 Telepresence

Telepresence bridges the gap between local development environments and remote Kubernetes clusters, allowing developers to work with cloud-based services while writing and testing code locally.

Key Features

  • Real-time Development: Runs local services as if they were deployed in the cluster, enabling seamless interaction with cloud-based APIs and microservices.
  • Intercept Mode: Redirects traffic from the cluster to a local process, eliminating the need for full redeployments during debugging.
  • Hybrid Development: Mixes locally running services with cloud-hosted components, reducing latency and improving the development experience.
  • Security & Compliance: Avoids the need for direct Kubernetes cluster access, making it safer to develop against production-like environments.

By using Telepresence, teams can debug services in real-time without disrupting existing deployments, speeding up iteration cycles.

5.4 Stern

Stern is a multi-pod log tailing tool for Kubernetes, designed to make it easy to inspect logs from many pods simultaneously. It enhances observability, especially in distributed microservices architectures.

Key Features

  • Multi-Pod Log Streaming: Tails logs from multiple pods in real-time, filtering based on labels or names.
  • Color-Coded Output: Differentiates logs from various pods and containers for better readability.
  • Flexible Filtering: Supports regex-based filtering to display only relevant log messages.
  • Compatibility: Works seamlessly with Kubernetes and can be integrated into debugging workflows.

Stern provides a real-time view of application behavior, making it indispensable for troubleshooting and monitoring large-scale deployments.


6. Repository Structure

A well-organized repository structure ensures that your containerized application can be built, tested, and deployed consistently across various environments. An increasingly popular best practice is splitting your Helm charts into a separate public repository and placing environment-specific configurations (including overrides and sensitive data) in a private configuration repository. This approach leverages hierarchical Helm configurations, allowing you to manage multiple deployments from a single source of truth while keeping sensitive details out of public view.

6.1 Public Helm Chart Repository

This repository contains the minimal Helm chart needed to deploy your application. It includes:

  • Chart Metadata (Chart.yaml): Name, version, description, and dependencies of your chart.
  • Default Values (values.yaml): A basic configuration that works out of the box, suitable for demo or simple development setups.
  • Templates (in a templates folder): Kubernetes manifests that define your application deployments, services, config maps, etc.

Because this repo is publicly accessible, it must not contain sensitive information. Instead, any credentials or secret keys should be handled in your private configuration repo.

Example Structure:

mychart-repo/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── ...
├── charts/
│   └── optional-subchart/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
└── README.md
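For orientation, the Chart.yaml in such a repository can stay very small; the name, versions, and description below are placeholders:

apiVersion: v2                 # Helm 3 chart API version
name: mychart
description: A minimal chart for deploying the application
type: application
version: 0.1.0                 # version of the chart itself
appVersion: "1.4.2"            # version of the application being packaged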

6.2 Private Configuration Repository

Your configuration repository stores environment-specific overrides and sensitive data that should never be committed to a public repo. This includes:

  • Environment Overrides (e.g., dev, staging, prod)
    Each environment has its own values.yaml or multiple override files referencing the public chart. Here, you define resource limits, replica counts, domain names, and other environment-specific settings.
  • Secrets & Credentials
    Keep secret data out of source control via methods like Sealed Secrets or external secret management services (e.g., HashiCorp Vault). If necessary, store encrypted secrets or secure them in a private Git repository with restricted access.

Because Helm supports merging multiple values files, you can stack environment overrides easily. For example, you might have a values-dev.yaml file and a secrets-dev.yaml file for development, while production uses values-prod.yaml and secrets-prod.yaml.

Example Structure:

config-repo/
├── overlays/
│   ├── dev/
│   │   ├── values-dev.yaml
│   │   └── secrets-dev.yaml
│   ├── staging/
│   │   ├── values-staging.yaml
│   │   └── secrets-staging.yaml
│   └── prod/
│       ├── values-prod.yaml
│       └── secrets-prod.yaml
├── common.yaml
└── ...
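An environment override only needs to state what differs from the chart's defaults. As a sketch (all keys and values are hypothetical), a values-dev.yaml might contain:

# overlays/dev/values-dev.yaml: dev-specific overrides of the chart defaults
replicaCount: 1
image:
  tag: "1.5.0-rc1"             # dev tracks a pre-release build
ingress:
  host: dev.example.com        # environment-specific domain
resources:
  limits:
    cpu: "250m"
    memory: "256Mi"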

6.3 Putting It All Together

  1. Store Your Helm Chart Publicly
    Publish the minimal, non-sensitive chart in the public repository. This chart defines the core logic (e.g., deployments, services) and a default set of values. Any subcharts you need can live in the charts/ directory of this repo.

  2. Maintain Sensitive Configurations Privately
    Use a separate, private repository for environment overrides and secrets. Helm’s hierarchical values approach allows you to combine multiple YAML files at deployment time.

  3. Deploy Using Layered Values
    When deploying to a specific environment (e.g., dev), you reference both the public chart and your private overrides. A typical Helm command might look like:

    helm install myapp \
      ./mychart-repo \
      -f ./config-repo/common.yaml \
      -f ./config-repo/overlays/dev/values-dev.yaml \
      -f ./config-repo/overlays/dev/secrets-dev.yaml \
      --namespace dev
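    Helm merges the -f files in the order given, with later files taking precedence on conflicting keys. A tiny hypothetical illustration:

    # config-repo/common.yaml (shared defaults)
    replicaCount: 3
    logLevel: info

    # config-repo/overlays/dev/values-dev.yaml
    replicaCount: 1

    # Effective values for the dev release after merging:
    #   replicaCount: 1    (taken from values-dev.yaml, the later -f file)
    #   logLevel: info     (inherited from common.yaml)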
    

7. References

  1. Adam Wiggins, The Twelve-Factor App, 2011. Available at: https://12factor.net
  2. Brendan Burns and Joe Beda, Kubernetes Up & Running, O'Reilly Media, 2019.
  3. Docker Inc., Docker Overview, Available at: https://www.docker.com/
  4. Kubernetes Documentation, Concepts Overview, Available at: https://kubernetes.io/docs/concepts/
  5. Google Site Reliability Engineering, The Case for Containers, Available at: https://sre.google/sre-book/
  6. NIST, Application Container Security Guide, Special Publication 800-190, Available at: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-190.pdf
  7. Skaffold Documentation, Available at: https://skaffold.dev/docs/
  8. Helm Documentation, Available at: https://helm.sh/docs/
  9. Telepresence Documentation, Available at: https://www.telepresence.io/docs/quick-start
  10. Stern GitHub Repository, Available at: https://github.com/stern/stern