
AI Stack Lite

Background

In the dynamic field of artificial intelligence, the step of deploying trained models to make predictions on fresh data, known as inference, is crucial, particularly for real-time operations. Monitoring these inferences for accuracy, anomalies, and performance in real time is essential to ensure the AI system remains efficient and reliable. This ongoing surveillance provides valuable insights that can be acted upon promptly. To assist with this, visualization tools offer an intuitive graphical display of the AI's performance metrics, simplifying complex data into an accessible format for stakeholders to analyze and make informed decisions.

The strategy for developing such robust systems is designed to complement cloud deployments by emphasizing the versatility of containerized environments. Instead of replicating cloud services, the focus is on crafting a system that's inherently flexible and ready for deployment in any environment, whether locally during the development and testing stages or scaled up to the cloud when the situation demands.

Container technologies like Docker have become the cornerstone of this flexible approach, allowing the encapsulation of AI components, monitoring tools, and visualization dashboards within isolated containers. This modularity not only ensures consistency across various development stages but also guarantees that the system is scalable and portable, making it cloud-ready when the time comes.

Orchestration tools such as Kubernetes streamline this process by automating the deployment, scaling, and operation management of these containers. This enables a smooth transition between local and cloud environments, providing developers with a powerful and efficient toolset for AI application development that is both effective in a local setup and primed for cloud deployment.

Problem Statement

The challenge is to develop a flexible, scalable artificial intelligence inference system using containerization, optimized for real-time monitoring and visualization. It must deploy efficiently in local environments during the development phase and transition seamlessly to cloud environments for broader scalability and distribution.

Problem

The challenge lies in developing a localized, containerized solution that:

  1. Runs AI Inference: Incorporates a standalone AI module adept at processing data and generating time-series results.
  2. Stores and Retrieves Results: Provides an efficient mechanism to store time-series AI outputs and ensure swift data retrieval (see the sketch after this list).
  3. Visualizes Data: Offers a dynamic visualization tool with real-time insights into the AI's performance, aiding stakeholders in decision-making.
  4. Monitors System Health: Provides a robust monitoring system with a holistic view of all components, from AI processing to data storage.
  5. Orchestrates Workloads: Utilizes orchestration tools to manage, scale, and automate tasks, ensuring the local environment closely simulates cloud deployments.
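
As a minimal sketch of item 2, the snippet below writes one time-series sample of AI output to InfluxDB. It assumes the influxdb-client Python package; the URL, token, org, bucket, and measurement names are placeholders, not values from this repository.

# Minimal sketch: persist one detection-count sample to InfluxDB.
# Assumes the influxdb-client package; connection details are placeholders.
from datetime import datetime, timezone

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="<token>", org="<org>")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("traffic_counts")                 # measurement name (illustrative)
    .tag("camera_location", "Townhall")
    .field("person", 12)
    .field("car", 7)
    .time(datetime.now(timezone.utc))
)
write_api.write(bucket="<bucket>", record=point)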

Objective

To design and implement a containerized AI inference system that enables real-time monitoring and visualization, ensuring it is optimized for both local and cloud environments. This system will be built for efficient scalability and portability to support seamless transitions from development to production, thus providing a robust solution for AI model deployment that meets the demands of diverse operational scenarios.

Local Simulation

The solution should be deployable on local machines using containerization tools like Docker, orchestrated with platforms like Docker-Compose or Kubernetes. This setup aims to:

  • Facilitate rapid prototyping and testing in a controlled environment.
  • Simulate real-world cloud scenarios and workloads.
  • Offer efficient debugging and troubleshooting capabilities.

Constraints

  • The AI module should remain modular and independent for straightforward updates and modifications.
  • Data persistence must be ensured, even in the event of container failures.
  • Real-time visualization capabilities should support analysis over specific time intervals.
  • Comprehensive monitoring should cover all components, offering timely alerts for any anomalies.

Architecture

This repository presents our proposed architecture designed to streamline monitoring and orchestration processes for containerized applications. The architecture is divided into four primary layers: Visualization, Databases, Modules, and Orchestration. It integrates powerful tools like Grafana, Prometheus, and InfluxDB for efficient data visualization and storage. On the module front, it incorporates cAdvisor, Node Exporter, and specialized AI inference modules for comprehensive data collection and processing. For orchestration, we propose a flexible approach, allowing users to choose between Docker Compose and Kubernetes, all running on the robust Ubuntu operating system. This architecture ensures efficient data flow from raw metrics collection to insightful visualization, supporting optimal performance and observability of your applications.

Architecture Diagram
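
As a small illustration of that metrics flow, the sketch below pulls a cAdvisor-collected container CPU metric through Prometheus's HTTP query API. It assumes Prometheus is reachable on localhost:9090 and already scrapes cAdvisor; the query itself is illustrative.

# Sketch: query Prometheus for per-container CPU usage (collected by cAdvisor).
# Assumes Prometheus at localhost:9090; requires the requests package.
import requests

resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": "rate(container_cpu_usage_seconds_total[1m])"},
    timeout=5,
)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    # "name" is the container-name label cAdvisor attaches to its metrics.
    print(result["metric"].get("name", "<host>"), result["value"][1])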

TODOs

  • Local Inference Script Update
  • Live Inference Script Update - Fix Output stream Dimensions
  • Influx DB setup
  • Influx Python Local Inserter
  • Live Count Update on Live RTSP
  • Live Inference Pictures with Data Insert
  • C++ Algorithm
  • GPU Device Monitoring
  • Live Inference Speed / Accuracies Monitoring
  • Mojo vs Python Test
  • Kubernetes Setup
  • Alert Manager Setup
  • Live Dashboard Update
  • Grafana Live Streams (RTSP)
  • Cloud Deployment - Integration with Azure and AWS
  • ReadMe Documentation - Grafana/Influx/cAdvisor/NodeExporter/AlertManager
  • ❗ Issues: After kubectl installation, Wi-Fi is removed on the host

Learning Goals

  • Docker
  • Grafana
  • Prometheus
  • Kubernetes

Setup

Base Image Building

Currently there is only a single base image. First, build the base image:

docker build -f ./build/base-env.dockerfile -t ai-stack-lite-base-1 .

After building the base image, build the run image:

docker build -f run-env.dockerfile -t ai-stack-lite-run-1 .

To simulate a real-world scenario, a camera stream is needed; MediaMTX is used to provide it. Build the MediaMTX image:

docker build -f ./mediamtx/emulator-env.dockerfile -t mediamtx-env-1 .

Modular Token Key

Start by creating an env file in the build directory:

/build/build.env

Ensure the following variables are available in build.env:

MOJO_KEY=<Your Key>

Execution - Docker/Docker-Compose

For debugging purposes, you can run a single module interactively:

docker run -it --gpus all ai-stack-lite-base-1:latest

Using Docker-Compose:

docker-compose -f docker-compose.yml up

YOLOv8 Class Names

The following class names are provided by coco.names:

0-4: person, bicycle, car, motorbike, aeroplane
5-9: bus, train, truck, boat, traffic light
10-14: fire hydrant, stop sign, parking meter, bench, bird
15-19: cat, dog, horse, sheep, cow
20-24: elephant, bear, zebra, giraffe, backpack
25-29: umbrella, handbag, tie, suitcase, frisbee
30-34: skis, snowboard, sports ball, kite, baseball bat
35-39: baseball glove, skateboard, surfboard, tennis racket, bottle
40-44: wine glass, cup, fork, knife, spoon
45-49: bowl, banana, apple, sandwich, orange
50-54: broccoli, carrot, hot dog, pizza, donut
55-59: cake, chair, sofa, pottedplant, bed
60-64: diningtable, toilet, tvmonitor, laptop, mouse
65-69: remote, keyboard, cell phone, microwave, oven
70-74: toaster, sink, refrigerator, book, clock
75-79: vase, scissors, teddy bear, hair drier, toothbrush

Please use the right IDs under "CLASS_IDS"; the IDs map to the class names in the order displayed above.
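
As a hedged sketch, the snippet below shows one way a module could translate CLASS_IDS into class names, assuming coco.names holds one name per line in the ID order listed above.

# Sketch: map the CLASS_IDS environment variable to coco.names labels.
import os

with open("coco.names") as f:
    names = [line.strip() for line in f if line.strip()]

class_ids = [int(i) for i in os.environ.get("CLASS_IDS", "0").split(",")]
print({i: names[i] for i in class_ids})
# e.g. CLASS_IDS=0,2,5 -> {0: 'person', 2: 'car', 5: 'bus'}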

Sample AI Module - Python

The following shows an example Docker Compose configuration for a Python module:

python-module-1:
  image: ai-stack-lite-run-1:latest
  ports:
    - 8001:5000/tcp
  environment:
    - RUN_TYPE=python
    - RUN_SCRIPT_PATH=apps/python/live-gpu-inference-traffic-mt.py
    - MODEL_PATH=yolov8x.pt
    - CAMERA_LOCATION=Townhall
    - RTSP_INPUT=rtsp://emulator-module:8554/sample-1
    - RTSP_OUTPUT=rtsp://emulator-module:8554/live-1
    - CLASS_IDS=0,1,16,2,3,5,7
    - INTEREST_LINE_COORDINATES=960,0
    - TRAFFIC_LINE_COORDINATES=960,0
    - SCALE_PERCENT=50
    - DEFAULT_LINE_SIZE=2
    - DEFAULT_FONT_SCALE=1
    - DEFAULT_OFFSET=2
  # Deploy on GPU
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
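
For orientation, this is how a module script might read that configuration at startup; the variable names mirror the sample above, while the defaults are illustrative only.

# Sketch: consume the compose environment inside a module script.
import os

MODEL_PATH = os.environ.get("MODEL_PATH", "yolov8x.pt")
RTSP_INPUT = os.environ["RTSP_INPUT"]
RTSP_OUTPUT = os.environ["RTSP_OUTPUT"]
SCALE_PERCENT = int(os.environ.get("SCALE_PERCENT", "100"))
# "960,0" -> (960, 0)
INTEREST_LINE = tuple(int(v) for v in os.environ["INTEREST_LINE_COORDINATES"].split(","))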

Sample AI Module - C++

TBA

Execution - Kubernetes

Ensure minikube is installed:

curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start

Install kubectl:

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client

Visualization and AI Inference

The emulator module provides 5 main streams: 2 replay sample footage (MOT1608raw.mp4 and MOT1602raw.mp4) on a loop, and 3 are live stream paths that remain open, waiting for a publisher. This section mainly describes the visualization dashboard and AI inference.
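
To feed one of the waiting live paths manually, a clip can be pushed to MediaMTX with ffmpeg, as in the sketch below; it assumes ffmpeg is on PATH, MediaMTX listens on port 8554, and the path name is illustrative.

# Sketch: publish a looping sample clip to a MediaMTX live path.
import subprocess

subprocess.run([
    "ffmpeg", "-re", "-stream_loop", "-1",  # replay at native rate, forever
    "-i", "MOT1608raw.mp4",
    "-c", "copy",                           # no re-encoding
    "-f", "rtsp", "rtsp://localhost:8554/live-1",
])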

Dummy Data Visualization

Output Grafana Visualization (Dummy Data):

Sample Grafana Visualization

Traffic - Pedestrian and Vehicle Detection

The side-by-side outcome is shown as follows (left: stream emulator video; right: inference video): Pedestrian Traffic Counter

The visualization dashboard is as follows:

  • TBA

Traffic - Vehicle Detection

The side-by-side outcome is shown as follows (left: stream emulator video; right: inference video): Traffic Counter

The visualization dashboard is as follows:

  • TBA

Pedestrian - Person Detection

The side-by-side outcome is shown as follows (left: stream emulator video; right: inference video): Pedestrian Traffic Counter

The visualization dashboard is as follows:

  • TBA

Pedestrian - Key Point Detection

Each Python module takes in the given $RTSP_INPUT and publishes to $RTSP_OUTPUT based on the given configuration:

As an example, you will see a similar input and output to the following:

Raw Video | Inferenced Video
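
A condensed sketch of that input-to-output loop follows; it assumes the opencv-python and ultralytics packages, and omits the publishing step back to MediaMTX (e.g. via an ffmpeg subprocess).

# Sketch: read RTSP input, run YOLOv8 on configured classes, annotate frames.
import os

import cv2
from ultralytics import YOLO

model = YOLO(os.environ.get("MODEL_PATH", "yolov8x.pt"))
class_ids = [int(i) for i in os.environ.get("CLASS_IDS", "0").split(",")]
cap = cv2.VideoCapture(os.environ["RTSP_INPUT"])

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, classes=class_ids, verbose=False)
    annotated = results[0].plot()  # frame with boxes/keypoints drawn
    # ...push `annotated` to RTSP_OUTPUT here...
cap.release()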

The visualization dashboard is as follows:

  • TBA

Live Harbour Bridge

As an interesting experiment, let's use a public live camera stream from Sydney:

The side-by-side outcome is shown as follows (left: stream emulator video; right: inference video): Live Harbour Bridge Counter

Monitoring

Off-the-shelf modules (Grafana, Prometheus, node-exporter, cAdvisor) are set up to monitor the host and Docker environments:

Node Exporter
cAdvisor

Performance Comparison

Input Stream Specification:

  • Frame Size
  • FPS

Metrics | Python | C++ | Modular
RAM     | TBD    | TBD | TBD
CPU     | TBD    | TBD | TBD
GPU     | TBD    | TBD | TBD
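
To populate the table, per-frame latency can be timed around the inference call, as in this sketch; the model callable matches the loop shown earlier, and the resulting numbers are machine-specific.

# Sketch: measure average inference FPS over a batch of frames.
import time

def measure_fps(model, frames):
    start = time.perf_counter()
    for frame in frames:
        model(frame, verbose=False)
    return len(frames) / (time.perf_counter() - start)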

Cloud Deployments

TBD

