financial-data-api

This repo is a demo project for Dockerized Flask applications (REST API). This simplified API exposes GET endpoints that let you pull stock prices and trading indicators. What is covered in this repo:

Application code:

  • GitHub Actions CICD:
    • Static analysis: flake8, pydocstyle
    • Image misconfiguration/vulnerability scanning (Trivy), passing artifacts between jobs
    • Testing patterns with pytest (unit / integration)
    • Docker image multi-stage build and distribution pattern
  • Docker PostgreSQL DB setup for local testing
  • Services configuration with Docker Compose
  • Makefile template
  • Flask blueprints
  • Flask-SQLAlchemy implementation
  • Nginx (reverse proxy) and Gunicorn (WSGI) implementation
  • Dependency injection (sketched below)
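
As a taste of the blueprint and dependency-injection patterns listed above, here is a minimal, hypothetical sketch (names such as `create_stocks_blueprint` and `repository.fetch_prices` are illustrative, not the repo's actual code):

```python
# Hypothetical sketch: a Flask blueprint that receives its data-access
# dependency at construction time instead of importing it globally.
from flask import Blueprint, Flask, jsonify


def create_stocks_blueprint(repository):
    """Build the blueprint around an injected repository object."""
    bp = Blueprint("stocks", __name__, url_prefix="/stocks")

    @bp.route("/<symbol>")
    def get_stock(symbol):
        # The route only knows the repository's interface, which makes it
        # trivial to swap in a fake repository in unit tests.
        return jsonify(repository.fetch_prices(symbol))

    return bp


def create_app(repository):
    app = Flask(__name__)
    app.register_blueprint(create_stocks_blueprint(repository))
    return app
```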

Infrastructure code:

  • Multi AZ serverless architecture:
    • AWS Organizations (multi-account strategy for dev & prod) with SCPs
    • VPC, Security-groups
    • RDS DB, S3, Route53, ALB, API Gateway, AWS PrivateLink
    • IAM configuration (RBAC)
    • AWS Secrets Manager
    • ECS with Fargate (Blue/Green deployment)
  • GitHub Actions CICD:
    • Security scanner (tfsec)
    • Static analysis to enforce best practices (tflint, validate, fmt)
    • Automated infrastructure cost estimation (with Infracost)
  • Terragrunt patterns to keep the code DRY across environments
  • Automated architecture diagrams from Terraform code
  • Terraform remote backend bootstrap

1 - Architecture

A - App CICD architecture

B - Cloud architecture (AWS)

(image drawn on Cloudcraft)

Basic 3-tier application:

  • Application layer
  • Business logic layer
  • Data access layer

2 - Prerequisites

The API doesn't require Python installed on your machine; everything runs inside Docker containers (you only need Docker and Make).

3 - Quickstart

A - Run local stack

Run the following commands to:

  • Build the Docker images
  • Run the Nginx, Localstack, app server and PostgreSQL DB containers locally
  • Populate the db credentials secret in AWS Secrets Manager (localstack)
  • Populate DB with TSLA and AMZN stock prices
cd app && make build-app build-nginx up
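
For reference, reading the seeded DB-credentials secret back from Localstack might look like the sketch below (the endpoint URL is Localstack's default edge port; the secret name is an assumption, not necessarily the one this repo's Makefile uses):

```python
# Hypothetical sketch: fetch the DB credentials seeded into the
# Localstack-backed Secrets Manager.
import json

import boto3

client = boto3.client(
    "secretsmanager",
    endpoint_url="http://localhost:4566",  # Localstack's default edge port
    region_name="us-east-1",
    aws_access_key_id="test",              # Localstack accepts dummy credentials
    aws_secret_access_key="test",
)
secret = json.loads(
    client.get_secret_value(SecretId="db-credentials")["SecretString"]
)
print(secret.keys())
```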

Verify the API is running:

curl -I http://localhost/_healthcheck

Get resampled data

$ curl -G -d 'interval=1' -d 'frequency=Annual' http://localhost/stocks/time-series/AMZN
[
  {
    "close": 92.392,
    "high": 101.79,
    "low": 84.253,
    "open": 95.455,
    "period_start": "2019-01-01",
    "symbol": "AMZN",
    "volume": 8154332000
  },
  {
    "close": 162.8465,
    "high": 177.6125,
    "low": 81.3015,
    "open": 93.75,
    "period_start": "2020-01-01",
    "symbol": "AMZN",
    "volume": 24950814000
  },
  {
    "close": 166.717,
    "high": 188.654,
    "low": 144.05,
    "open": 163.5,
    "period_start": "2021-01-01",
    "symbol": "AMZN",
    "volume": 17076362000
  },
  {
    "close": 116.46,
    "high": 171.4,
    "low": 101.26,
    "open": 167.55,
    "period_start": "2022-01-01",
    "symbol": "AMZN",
    "volume": 10032250600
  }
]
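
For context, this kind of annual OHLCV resampling boils down to a pandas resample/aggregate; a minimal sketch, assuming a date-indexed DataFrame (the repo's actual helper may differ):

```python
# Illustrative only: resample daily OHLCV bars into annual bars, keeping
# the first open, max high, min low, last close, and summed volume.
import pandas as pd


def resample_ohlcv(daily: pd.DataFrame, rule: str = "YS") -> pd.DataFrame:
    """daily must have a DatetimeIndex and open/high/low/close/volume columns."""
    return daily.resample(rule).agg(  # "YS" = year-start frequency
        {
            "open": "first",
            "high": "max",
            "low": "min",
            "close": "last",
            "volume": "sum",
        }
    )
```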

B - Deploy the infrastructure on AWS

A step-by-step guide to the financial-data-api IaC is available in infrastructure/README.md

4 - Project file structure

Best practice is for infrastructure and application code to sit in different repos; however, I wanted to keep this demo project self-contained.

.
├── .github
│   └── workflows
│       ├── app_code_cicd.yml
│       └── infra_code_cicd.yml
├── app
├── docs
├── infrastructure
├── .gitignore
├── Makefile
└── README.md

In ./app

.
├── config
│   ├── dev
│   │   └── config.yaml
│   ├── local
│   │   └── config.yaml
│   ├── prod
│   │   └── config.yaml
│   ├── test
│   │   └── config.yaml
│   └── gunicorn.py
├── docker
│   ├── app
│   │   └── Dockerfile
│   ├── nginx
│   │   ├── Dockerfile
│   │   ├── nginx.ecs.conf
│   │   └── nginx.local.conf
│   └── docker-compose.yaml
├── src
│   ├── __init__.py
│   ├── app.py
│   ├── blueprints
│   │   ├── healthcheck.py
│   │   └── stocks.py
│   ├── helpers.py
│   └── models.py
├── tests
│   ├── __init__.py
│   ├── conftest.py
│   ├── integration
│   │   ├── test_data
│   │   │   └── stocks_ohlcv.csv
│   │   ├── __init__.py
│   │   ├── test_app.py
│   │   └── test_stocks.py
│   └── unit
│       ├── __init__.py
│       └── test_helpers.py
├── .dockerignore
├── .yamllint
├── Makefile
├── requirements.in
└── requirements.txt

In ./infrastructure

.
├── aws-organizations
│   ├── templates
│   │   ├── dynamodb.yml
│   │   └── s3.yml
│   ├── organization-tasks.yml
│   └── organization.yml
├── terraform
│   ├── live
│   │   ├── _envcommon
│   │   │   └── <resource>.hcl
│   │   ├── <environment>
│   │   │   ├── env.hcl
│   │   │   └── <resource>
│   │   │       ├── main.tf
│   │   │       ├── README.md
│   │   │       └── terragrunt.hcl
│   │   ├── .tflint.hcl
│   │   └── infracost.yml
│   └── modules
├── Makefile
└── README.md

aws-organizations is the directory that contains the account baseline, while terraform is used to define the workload.

Account baseline contains resources that aren't directly related to the workload but help with risk reduction, security, compliance, and bootstrapping. A baseline may include cross-account CloudTrail logging, GuardDuty detectors, and similar services. A workload, on the other hand, is a collection of resources and code that delivers business value, such as a customer-facing application or a backend process.

<resource> can be "vpc" or "security-groups" for instance.

The live and modules folders should sit in two separate git repos: live contains the currently deployed infrastructure, while modules contains user-defined modules. In this repo I only reuse existing Terraform modules, so the live and modules folders are just placeholders. The idea behind separate live and modules repos is that you can point at a versioned module in dev/stage/prod and reduce the risk of impacting prod. Note that for simplicity only dev is implemented in this demo.

5 - Gitops

A - App CICD workflow

  • yamllint: Lints yaml files in the repo
  • flake8: Lints .py files in the repo
  • pydocstyle: Checks compliance with Python docstring conventions
  • safety: Python package vulnerability scanner
  • image-misconfiguration: Detect configuration issues in the app Dockerfile (Trivy)
  • build: Build the app Docker image and push it to the pipeline artifacts
  • image-vulnerabilities: App image vulnerability scanner (Trivy)
  • unit-tests: Test the smallest pieces of code (functions) that can be isolated
  • integration-tests: Series of tests which call the API
  • push-app-image-to-registry: Push the application server Docker image to Docker Hub
  • push-nginx-image-to-registry: Push the custom Nginx Docker image to Docker Hub

Note that the push-to-registry jobs should be skipped when running the pipeline locally. This is ensured using if: ${{ !env.ACT }} in those jobs. Pushing from a local run would otherwise create a conflicting image tag when the GitHub Actions CICD pipeline tries to push the same tag a second time.

B - Infra CICD workflow

  • format: Check if all Terraform configuration files are in a canonical format
  • validate: Verify whether a configuration is syntactically valid and internally consistent
  • tflint:
    • Find possible errors (like illegal instance types)
    • Warn about deprecated syntax, unused declarations
    • Enforce best practices, naming conventions
  • tfsec: Static analysis of Terraform templates to spot potential security issues
  • infracost: Infracost shows cloud cost estimates for Terraform

Example of infracost automated PR comment:

One best practice is to always deploy from a single branch to avoid conflicting deployments.

You can automatically generate the terragrunt README.md files using this:

cd infrastructure && make terraform-docs DIR_PATH=live/dev/s3/README.md

C - Running the CICD pipeline locally

Install act to run the jobs on your local machine.

Example:

make app-cicd    # Run the full app CICD pipeline without pushing to Docker Hub
make infra-cicd  # Run the full infrastructure CICD pipeline without applying changes

These commands require a secrets.txt file with this content:

GITHUB_TOKEN=<YOUR_PAT_TOKEN>
DOCKERHUB_USERNAME=<YOUR_DOCKERHUB_USERNAME>
DOCKERHUB_TOKEN=<YOUR_DOCKERHUB_TOKEN>
INFRACOST_API_KEY=<YOUR_INFRACOST_API_KEY>

Optionally you could also run pipeline jobs using the Makefile directly.

Example:

make pydocstyle
make tests

Some jobs such as image-vulnerabilities can be run in isolation using act -j <job-name> (for example, act -j image-vulnerabilities).

6 - Docker image build pattern

The requirements are:

  • A dev image should be pushed to Docker Hub every time a git push is made. That allows end-to-end testing in the dev environment. I chose Docker Hub over AWS as Docker Hub is still the best choice for distributing software publicly.

  • Leverage pipeline artifacts to avoid rebuilding the image from scratch across jobs. Also pass image tag variables between jobs/steps using the output functionality to keep the code DRY.

  • The image tag should follow the SemVer specification: MAJOR.MINOR.PATCH-<BRANCH NAME>.dev.<COMMIT SHA> for dev versions and MAJOR.MINOR.PATCH for production use.

A - SemVer2

| Branch    | Commit # | Image Version | Image Tag |
|-----------|----------|---------------|-----------|
| feature-1 | 1        | 1.0.0         | 1.0.0-feature-1.dev.b1d7ba7fa0c6a14041caaaf4025f6cebb924cb0f |
| feature-1 | 2        | 1.0.0         | 1.0.0-feature-1.dev.256e60e615e89332c5f602939463500c1be5d90a |
| main      | 5        | 1.0.0         | 1.0.0 |

The docker/metadata-action@v4 action can automate this, but it relies on git tags, which are a bit cumbersome since a tag would need updating for each commit. So I preferred reimplementing something straightforward that uses the git branch name and commit SHA to form the image tag.
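
The tag construction itself is simple; below is a sketch of the scheme in plain Python (how the actual workflow obtains the branch and SHA is not shown here, and the function name is illustrative):

```python
# Illustrative only: derive the image tag described in the table above
# from the current branch and commit SHA.
import subprocess


def image_tag(base_version: str) -> str:
    branch = subprocess.check_output(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
    ).strip()
    sha = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    # main gets the plain SemVer tag, any other branch gets a dev pre-release
    return base_version if branch == "main" else f"{base_version}-{branch}.dev.{sha}"
```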

B - Version bump

Each PR should bump APP_IMAGE_VERSION and NGINX_IMAGE_VERSION in .github/workflows/app_code_cicd.yml

7 - Testing framework

A - GIVEN-WHEN-THEN (Martin Fowler)

GIVEN - Describes the state of the world before you begin the behavior you're specifying in this scenario. You can think of it as the pre-conditions to the test.

WHEN - Behavior that you're specifying.

THEN - Changes you expect due to the specified behavior.
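
A hypothetical pytest translation of that template (self-contained and runnable, but not taken from the repo's test suite):

```python
# Illustrative GIVEN-WHEN-THEN layout with pytest and pandas.
import pandas as pd


def resample_close(prices: pd.Series) -> pd.Series:
    return prices.resample("YS").last()  # "YS" = year-start frequency


def test_annual_resampling_keeps_last_close():
    # GIVEN: two daily closes in the same year (the state of the world)
    prices = pd.Series(
        [100.0, 110.0],
        index=pd.to_datetime(["2022-01-03", "2022-12-30"]),
    )

    # WHEN: the series is resampled to an annual frequency (the behaviour)
    annual = resample_close(prices)

    # THEN: a single bar remains, holding the last close (the expected change)
    assert list(annual) == [110.0]
```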

B - Four-Phase Test (Gerard Meszaros)

(image from Four-Phase Test)

For integration testing, the Setup phase consists of truncating and repopulating the market_data DB (cf. db_fixture)
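
A hypothetical sketch of what such a four-phase fixture could look like (the db_session fixture and seed_market_data helper are assumptions, not the repo's actual code):

```python
# Illustrative only: a pytest fixture following the four-phase pattern.
import pytest
from sqlalchemy import text


@pytest.fixture
def db_fixture(db_session):
    # Phase 1 - Setup: reset market_data to a known state
    db_session.execute(text("TRUNCATE TABLE market_data"))
    seed_market_data(db_session)  # hypothetical helper loading stocks_ohlcv.csv
    db_session.commit()
    # Phases 2 and 3 - Exercise and Verify - happen in the test body
    yield db_session
    # Phase 4 - Teardown: leave the table empty for the next test
    db_session.execute(text("TRUNCATE TABLE market_data"))
    db_session.commit()
```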

C - Debugging the code with VS Code remote-container extension

For debugging the code from within a Docker container you can use VS Code with the following config:

in .devcontainer/devcontainer.json

{
  "name": "Existing Dockerfile",
  "context": "../app",
  "dockerFile": "../app/docker/app/Dockerfile",

  "runArgs": [ "--network=host"],

  "remoteUser": "root",
  "remoteEnv": {
    "ENVIRONMENT": "test",
    "AWS_ACCESS_KEY_ID": "test",
    "AWS_SECRET_ACCESS_KEY": "test",
    "AWS_DEFAULT_REGION": "us-east-1"
  },
  "customizations": {
    "vscode": {
      "extensions": [
        "ms-python.python"
      ]
    }
  }
}

in .vscode/launch.json

{
    "version": "0.2.0",
    "configurations": [
        {
            // Testing extensions are very unstable in the remote-container extension
            // Hence it's preferable to run the tests from launch.json
            "name": "test_time_series",
            "type": "python",
            "request": "launch",
            "module": "pytest",
            "args": ["tests/integration/test_stocks.py::test_time_series"],
            "cwd": "${workspaceFolder}/app",
            "justMyCode": false, // Debug only user-written code
        }
    ]
}

In addition, I have also written separate documentation for the remote-container extension that can be quite handy.

8 - Gunicorn application server and Nginx reverse proxy

From the Flask documentation:

"While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well."

Hence we need a more robust web server than Flask's built-in one, and the answer is Gunicorn and Nginx. Gunicorn is a Python WSGI HTTP server for UNIX built on a pre-fork worker model. It is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy.

With Gunicorn as an application server our app is now more robust and scalable; however, we need a way to balance the load across the Gunicorn workers, which is where Nginx comes in. Nginx is also a web server but is more commonly used as a reverse proxy. Therefore Gunicorn acts as the application server whilst Nginx behaves as the reverse proxy.
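
For illustration, a pre-fork Gunicorn setup needs only a handful of settings; a sketch of a config/gunicorn.py (the values are examples, not necessarily this repo's actual settings):

```python
# Example Gunicorn settings for a pre-fork worker model.
import multiprocessing

bind = "0.0.0.0:8000"                          # the address Nginx proxies to
workers = multiprocessing.cpu_count() * 2 + 1  # common pre-fork sizing rule
timeout = 30                                   # restart workers stuck > 30s
accesslog = "-"                                # access logs to stdout for Docker
```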

The terminology is well defined in this article:

  • Flask is a web framework. It lets you build the core web application that powers the actual content on the site. It handles HTML rendering, authentication, administration, and backend logic.

  • Gunicorn is an application server. It translates HTTP requests into something Python can understand. Gunicorn implements the Web Server Gateway Interface (WSGI), a standard interface between web server software and web applications (a minimal WSGI app is sketched after this list).

  • Nginx is a web server. It’s the public handler, more formally called the reverse proxy, for incoming requests and scales to thousands of simultaneous connections.
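
The WSGI interface Gunicorn implements is tiny; the sketch below is a complete WSGI application that any WSGI server could serve (e.g. gunicorn hello:application, assuming the file is named hello.py):

```python
# A minimal WSGI application: the server calls this once per request.
def application(environ, start_response):
    body = b"hello"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]
```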

If you still struggle to understand what Nginx can achieve, check out this repo from AWS Labs.

Here’s a diagram illustrating how Nginx fits into a Flask web application:

(image from How to Configure NGINX for a Flask Web Application)

When deployed to AWS, our app will look similar to the illustration below, with many servers, each running an Nginx web server and many Gunicorn workers.

(image from A guide to deploying Machine/Deep Learning model(s) in Production)

9 - Deployment to AWS with Terraform

IMPORTANT: Following these instructions will deploy code into your AWS account. All of this qualifies for the AWS Free Tier, but if you've already used up your credits, running this code may cost you money. Also, this repo is meant to be deployed to a sandbox environment.

Terraform is an infrastructure as code (IaC) tool that allows you to build, change, and version infrastructure safely and efficiently. This includes both low-level components like compute instances, storage, and networking, as well as high-level components like DNS entries and SaaS features. If you are new to Terraform, I recommend reading A Comprehensive Guide to Terraform first.

Also check why you should choose Terraform over other configuration management and provisioning tools. TL;DR: Terraform is an open source, cloud-agnostic provisioning tool that supports immutable infrastructure, a declarative language, and a client-only architecture.

A - Keep your code DRY with Terragrunt

Terragrunt is a thin wrapper for Terraform that provides extra tools for working with multiple Terraform modules. https://www.gruntwork.io

Sample for reference: https://github.com/gruntwork-io/terragrunt-infrastructure-live-example

Terragrunt-generated files start with the prefix "terragrunt_" and are ignored via .gitignore to prevent them from being accidentally committed.

B - Best practices

I strongly recommend going through the Terraform best practices before exploring this repo.

This hands-on complies with the six pillars of the AWS Well-Architected Framework:

Operational Excellence – The ability to run and monitor systems to deliver business value and to continually improve supporting processes and procedures.

Security – The ability to protect information, systems, and assets while delivering business value through risk assessments and mitigation strategies.

Reliability – The ability of a system to recover from infrastructure or service disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions such as misconfigurations or transient network issues.

Performance Efficiency – The ability to use computing resources efficiently to meet system requirements, and to maintain that efficiency as demand changes and technologies evolve.

Cost Optimization – The ability to run systems to deliver business value at the lowest price point.

Sustainability – The ability to continually reduce the environmental impact of running workloads, for example by maximizing utilization of provisioned resources.

These pillars aren't trade-offs; they should be synergies. For example, better sustainability means better performance efficiency, and operational excellence means better cost optimization.

A useful tool when it comes to enforcing best practices across your cloud infrastructure is AWS Trusted Advisor, accessible in the AWS Management Console; note that only a limited set of checks is available in the free tier.

A good architecture design can be facilitated by following these AWS General design principles:

  • Stop guessing your capacity needs
  • Test systems at production scale
  • Automate to make architectural experimentation easier
  • Allow for evolutionary architectures
  • Drive architectures using data
  • Improve through game days

The DevOps checklist:

10 - Improvements

Taking a Flask app from development to production is a demanding but rewarding process. There are a number of areas that I have omitted but that would need to be addressed in a real production environment, such as:

  • User management and authentication for the backend API (AWS Cognito)
  • Adding monitoring/tracing tools (with Prometheus and Grafana for instance)
  • Protection from common web exploits (Web Application Firewall)
  • Network protections for all of your Amazon Virtual Private Clouds (VPCs) from layer 3 to layer 7 (AWS Network Firewall)
  • VPC interface endpoints to avoid exposing data to the internet (AWS PrivateLink)
  • ML-powered anomaly detection in VPC flow logs / CloudTrail logs / DNS logs / EKS audit logs (Amazon GuardDuty)
  • Storage autoscaling for the RDS DB
  • Automatically rotate the DB password with AWS Secrets Manager and AWS Lambda
  • AWS Control Tower (overkill for this demo, cf reddit thread)

11 - Useful resources