In the dynamic field of artificial intelligence, inference, the step of deploying trained models to make predictions on fresh data, is crucial, particularly for real-time operations. Monitoring these inferences for accuracy, anomalies, and performance in real time is essential to keep the AI system efficient and reliable, and this continuous monitoring yields insights that can be acted upon promptly. Visualization tools assist by presenting the AI's performance metrics graphically, distilling complex data into an accessible format that stakeholders can analyze to make informed decisions.
The strategy for developing such robust systems is designed to complement cloud deployments by emphasizing the versatility of containerized environments. Instead of replicating cloud services, the focus is on crafting a system that's inherently flexible and ready for deployment in any environment, whether locally during the development and testing stages or scaled up to the cloud when the situation demands.
Container technologies like Docker have become the cornerstone of this flexible approach, allowing the encapsulation of AI components, monitoring tools, and visualization dashboards within isolated containers. This modularity not only ensures consistency across various development stages but also guarantees that the system is scalable and portable, making it cloud-ready when the time comes.
Orchestration tools such as Kubernetes streamline this process by automating the deployment, scaling, and operation management of these containers. This enables a smooth transition between local and cloud environments, providing developers with a powerful and efficient toolset for AI application development that is both effective in a local setup and primed for cloud deployment.
The challenge is to develop a flexible, scalable, containerized AI inference system optimized for real-time monitoring and visualization: one that can be deployed efficiently in local environments during development and transitioned seamlessly to cloud environments for broader scalability and distribution.
Concretely, this means a localized, containerized solution that:
- Runs AI Inference: incorporates a standalone AI module adept at processing data and generating time-series results.
- Stores and Retrieves Results: provides an efficient mechanism to store time-series AI outputs and ensure swift data retrieval.
- Visualizes Data: offers a dynamic visualization tool with real-time insight into the AI's performance, aiding stakeholders in decision-making.
- Monitors System Health: provides a robust monitoring system with a holistic view of all components, from AI processing to data storage.
- Orchestrates Workloads: uses orchestration tools to manage, scale, and automate tasks, so that the local environment closely simulates cloud deployments.
The objective is to design and implement a containerized AI inference system with real-time monitoring and visualization, optimized for both local and cloud environments. Built for efficient scalability and portability, the system supports a seamless transition from development to production, providing a robust solution for AI model deployment across diverse operational scenarios.
The solution should be deployable on local machines using containerization tools like Docker and orchestrated with platforms like Docker Compose or Kubernetes. This setup aims to:
- Facilitate rapid prototyping and testing in a controlled environment.
- Simulate real-world cloud scenarios and workloads.
- Offer efficient debugging and troubleshooting capabilities.
In addition, the solution must satisfy the following requirements:
- The AI module should remain modular and independent for straightforward updates and modifications.
- Data persistence must be ensured, even in the event of container failures.
- Real-time visualization capabilities should allow for specific time interval analyses.
- Comprehensive monitoring should cover all components, offering timely alerts for any anomalies.
This repository presents our proposed architecture, designed to streamline monitoring and orchestration for containerized applications. The architecture is divided into four primary layers: Visualization, Databases, Modules, and Orchestration. It integrates powerful tools like Grafana, Prometheus, and InfluxDB for efficient data visualization and storage. On the module front, it incorporates cAdvisor, Node Exporter, and specialized AI inference modules for comprehensive data collection and processing. For orchestration, we propose a flexible approach, allowing users to choose between Docker Compose and Kubernetes, all running on the robust Ubuntu operating system. The architecture supports an efficient data flow, from raw metrics collection to insightful visualization, for optimal performance and observability of your applications.
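To make the data flow concrete, the following is a minimal sketch of how an inference module might push time-series results into InfluxDB, assuming the influxdb-client Python package and InfluxDB 2.x; the URL, token, org, bucket, and measurement names are placeholders, not the repository's actual configuration.

# Minimal sketch: write one inference result as a time-series point to InfluxDB.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Connection details are placeholders; in this stack they would normally
# arrive via environment variables like the other module settings.
client = InfluxDBClient(url="http://influxdb:8086", token="<token>", org="<org>")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One data point per inference pass: tagged by camera, with counts as fields.
point = (
    Point("object_counts")
    .tag("camera", "Townhall")
    .field("person", 4)
    .field("car", 7)
)
write_api.write(bucket="ai-stack", record=point)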
- Local Inference Script Update
- Live Inference Script Update - Fix Output Stream Dimensions
- InfluxDB Setup
- InfluxDB Python Local Inserter
- Live Count Update on Live RTSP
- Live Inference Pictures with Data Insert
- C++ Algorithm
- GPU Device Monitoring
- Live Inference Speed / Accuracies Monitoring
- Mojo vs Python Test
- Kubernetes Setup
- Alert Manager Setup
- Live Dashboard Update
- Grafana Live Streams (RTSP)
- Cloud Deployment - Integration with Azure and AWS
- ReadMe Documentation - Grafana/Influx/cAdvisor/NodeExporter/AlertManager
- ❗ Issue: after kubectl installation, Wi-Fi connectivity is lost on the host
- Docker
- Grafana
- Prometheus
- Kubernetes
Currently there is only a single base image. First, build the base image:
docker build -f ./build/base-env.dockerfile -t ai-stack-lite-base-1 .
After building the base image, build the run image:
docker build -f run-env.dockerfile -t ai-stack-lite-run-1 .
To simulate a real-world scenario, a camera stream is needed; MediaMTX is used to provide it. Build the MediaMTX image:
docker build -f ./mediamtx/emulator-env.dockerfile -t mediamtx-env-1 .
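Once the emulator container is running, a quick check from Python can confirm the stream is being published. This is a sketch assuming OpenCV (cv2) is installed and the emulator exposes port 8554 on localhost; the sample-1 path matches the stream used later in the compose file.

# Sanity check: grab one frame from the emulated RTSP stream.
import cv2

cap = cv2.VideoCapture("rtsp://localhost:8554/sample-1")
ok, frame = cap.read()
if ok:
    fps = cap.get(cv2.CAP_PROP_FPS)
    print(f"stream alive: {frame.shape[1]}x{frame.shape[0]} @ {fps} FPS")
else:
    print("no frame received; is the emulator container up?")
cap.release()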
Start by creating env files in the build directory:
/build/build.env
Ensure the following variables are available in build.env:
MOJO_KEY=<Your Key>
For debugging purposes, you can run a single module interactively:
docker run -it --gpus all ai-stack-lite-base-1:latest
Using Docker Compose:
docker-compose -f docker-compose.yml up
The following classes are provided in coco.names:
0-4: person, bicycle, car, motorbike, aeroplane
5-9: bus, train, truck, boat, traffic light
10-14: fire hydrant, stop sign, parking meter, bench, bird
15-19: cat, dog, horse, sheep, cow
20-24: elephant, bear, zebra, giraffe, backpack
25-29: umbrella, handbag, tie, suitcase, frisbee
30-34: skis, snowboard, sports ball, kite, baseball bat
35-39: baseball glove, skateboard, surfboard, tennis racket, bottle
40-44: wine glass, cup, fork, knife, spoon
45-49: bowl, banana, apple, sandwich, orange
50-54: broccoli, carrot, hot dog, pizza, donut
55-59: cake, chair, sofa, pottedplant, bed
60-64: diningtable, toilet, tvmonitor, laptop, mouse
65-69: remote, keyboard, cell phone, microwave, oven
70-74: toaster, sink, refrigerator, book, clock
75-79: vase, scissors, teddy bear, hair drier, toothbrush
Use the correct IDs in "CLASS_IDS"; the list above is ordered by ID.
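If you prefer not to count through the list by hand, a small helper can derive the IDs from a local copy of coco.names. This is an illustrative snippet; the class_ids_for helper is not part of the repository.

# Translate class names into the numeric IDs expected by CLASS_IDS.
with open("coco.names") as f:
    names = [line.strip() for line in f if line.strip()]

def class_ids_for(*wanted):
    return ",".join(str(names.index(w)) for w in wanted)

print(class_ids_for("person", "bicycle", "car"))  # -> "0,1,2"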
The following shows an example configuration for a Python module:
python-module-1:
image: ai-stack-lite-run-1:latest
ports:
- 8001:5000/tcp
environment:
- RUN_TYPE=python
- RUN_SCRIPT_PATH=apps/python/live-gpu-inference-traffic-mt.py
- MODEL_PATH=yolov8x.pt
- CAMERA_LOCATION=Townhall
- RTSP_INPUT=rtsp://emulator-module:8554/sample-1
- RTSP_OUTPUT=rtsp://emulator-module:8554/live-1
- CLASS_IDS=0,1,16,2,3,5,7
- INTEREST_LINE_COORDINATES=960,0
- TRAFFIC_LINE_COORDINATES=960,0
- SCALE_PERCENT=50
- DEFAULT_LINE_SIZE=2
- DEFAULT_FONT_SCALE=1
- DEFAULT_OFFSET=2
# Deploy on GPU
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
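With the GPU reservation in place, a one-off check inside the container can confirm that the device is actually visible. This sketch assumes PyTorch is present in the run image (Ultralytics YOLO depends on it).

# Verify the GPU reservation took effect inside the container.
import torch

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No GPU visible; check the deploy.resources section.")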
TBA
Ensure minikube is installed:
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start
Install kubectl via the commands below (see the official kubectl installation guide for details):
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
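As an optional check, the cluster can also be queried from Python; this assumes the kubernetes client package (pip install kubernetes), which is not part of the stack itself.

# Confirm the minikube cluster is reachable from Python.
from kubernetes import client, config

config.load_kube_config()  # reads the kubeconfig that minikube created
v1 = client.CoreV1Api()
for node in v1.list_node().items:
    print("node:", node.metadata.name)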
The emulator module provides 5 main streams: 2 replay sample footage (MOT1608raw.mp4 and MOT1602raw.mp4) on a loop, and 3 are live stream paths that remain open, waiting for a publisher. This section mainly describes the visualization dashboard and AI inference.
Output Grafana visualization (dummy data):
The side-by-side outcome is shown below (left: emulator stream video; right: inference video):
The visualization dashboard is shown below:
- TBA
Each Python module takes the given $RTSP_INPUT, runs inference, and publishes to $RTSP_OUTPUT according to the supplied configuration, as sketched below.
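The following is a minimal sketch of that read-infer-publish loop, assuming the Ultralytics YOLO API, OpenCV, and an ffmpeg binary inside the image; the actual scripts under apps/python/ are more elaborate (scaling, counting lines, multithreading).

# Minimal read-infer-publish loop driven by the compose environment variables.
import os
import subprocess
import cv2
from ultralytics import YOLO

rtsp_in = os.environ["RTSP_INPUT"]
rtsp_out = os.environ["RTSP_OUTPUT"]
model = YOLO(os.environ.get("MODEL_PATH", "yolov8x.pt"))
class_ids = [int(i) for i in os.environ.get("CLASS_IDS", "0").split(",")]

cap = cv2.VideoCapture(rtsp_in)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS) or 25

# Pipe raw BGR frames into ffmpeg, which re-publishes them as RTSP.
ffmpeg = subprocess.Popen(
    ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "bgr24",
     "-s", f"{width}x{height}", "-r", str(fps), "-i", "-",
     "-c:v", "libx264", "-preset", "ultrafast", "-f", "rtsp", rtsp_out],
    stdin=subprocess.PIPE,
)

while True:
    ok, frame = cap.read()
    if not ok:
        continue  # transient stream hiccup: try the next frame
    results = model(frame, classes=class_ids, verbose=False)
    ffmpeg.stdin.write(results[0].plot().tobytes())  # annotated BGR frame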
As an example, you will see a similar input and output to the following:
Raw Video | Inference Video |
---|---|
The visualization dashboard is shown below:
- TBA
As an interesting experiment, let's use a public live camera stream from Sydney:
The side-by-side outcome is shown below (left: emulator stream video; right: inference video):
Off-the-shelf modules (Grafana, Prometheus, node-exporter, cAdvisor) are used to monitor the host and Docker environments:
Node Exporter |
---|
cAdvisor |
---|
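Node Exporter and cAdvisor cover host and container metrics out of the box; application-level numbers (per-frame latency, live counts) have to be exported by the module itself. Below is a minimal sketch using the prometheus_client package; the metric names and port are illustrative, not the stack's actual configuration.

# Expose custom inference metrics for Prometheus to scrape.
import random
import time
from prometheus_client import Gauge, Histogram, start_http_server

INFER_LATENCY = Histogram("inference_latency_seconds", "Per-frame inference latency")
LIVE_COUNT = Gauge("live_object_count", "Objects detected in the latest frame")

start_http_server(9200)  # add this port as a scrape target in prometheus.yml

while True:
    with INFER_LATENCY.time():
        time.sleep(0.02)                   # stand-in for model(frame)
    LIVE_COUNT.set(random.randint(0, 10))  # stand-in for len(detections)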
Input Stream Specification:
- Frame Size
- FPS
Metrics | Python | C++ | Modular |
---|---|---|---|
RAM | | | |
CPU | | | |
GPU | | | |
TBD
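The numbers above are still to be collected. One way to gather the Python column is sketched below, assuming the psutil and pynvml packages; neither is required by the stack itself.

# Sample process RAM/CPU and GPU memory for the comparison table.
import os
import psutil
import pynvml

proc = psutil.Process(os.getpid())
print("RAM (MB):", round(proc.memory_info().rss / 1e6, 1))
print("CPU (%):", proc.cpu_percent(interval=1.0))

pynvml.nvmlInit()
mem = pynvml.nvmlDeviceGetMemoryInfo(pynvml.nvmlDeviceGetHandleByIndex(0))
print("GPU memory (MB):", round(mem.used / 1e6, 1))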
Reference links:
- Enhancing AI Development with Mojo: Code Examples and Best Practices
- Get started with Mojo🔥
- How to install Mojo🔥locally using Docker
- Object-tracking-and-counting-using-YOLOV8
- grafana-livecamera-rtsp-webrtc
- go2rtc
- Yolo V8 - Vehicles Detecting \ Counting
- Tracking and counting of object using YOLO v8
- Open CV RTSP camera buffer lag
- opencv read error:[h264 @ 0x8f915e0] error while decoding MB 53 20, bytestream -7
- Object Detection using YOLOv5 OpenCV DNN in C++ and Python
- How to install OpenCV 4.5.2 with CUDA 11.2 and CUDNN 8.2 in Ubuntu 20.04