This repository contains a Dockerized service for serving a regression model and monitoring its performance in production. The core components include:
- A Flask REST API that serves model predictions and exposes Prometheus metrics (sketched below).
- A monitoring system that generates new sample data and logs data drift and concept drift metrics.
- A Grafana dashboard for visualizing these metrics in real-time.
- An alert pipeline that triggers notifications to a Discord channel whenever monitored metrics exceed a threshold value.
- A containerized application (Docker and Docker Compose) for consistent deployment.
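For context, here is a minimal sketch of how the Flask app can expose the two drift metrics with prometheus_client. The endpoint, model path, and gauge descriptions are assumptions; only the `data_drift` and `concept_drift` metric names come from the dashboard setup below.

```python
# Hypothetical sketch of src/app.py: serve predictions and expose /metrics.
import joblib
import numpy as np
from flask import Flask, jsonify, request
from prometheus_client import Gauge, make_wsgi_app
from werkzeug.middleware.dispatcher import DispatcherMiddleware

app = Flask(__name__)
# Mount the Prometheus WSGI app so the metrics are served at /metrics.
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {"/metrics": make_wsgi_app()})

# The two gauges scraped by Prometheus and plotted in Grafana.
DATA_DRIFT = Gauge("data_drift", "Drift score of live features vs. training data")
CONCEPT_DRIFT = Gauge("concept_drift", "Drift score of predictions vs. observed targets")
# A monitoring loop elsewhere would periodically generate new sample data,
# compute the two drift scores, and push them with DATA_DRIFT.set(...).

model = joblib.load("models/model.joblib")  # hypothetical path

@app.route("/predict", methods=["POST"])  # hypothetical endpoint
def predict():
    features = np.array(request.get_json()["features"]).reshape(1, -1)
    return jsonify({"prediction": float(model.predict(features)[0])})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```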
Option 1: run the app with Docker.
- Clone the repository
git clone https://github.com/PierreExeter/ML-Performance-Monitoring-with-Grafana.git
- Build the image
docker compose build
- Launch the Docker containers
docker compose up -d
Three services should now be running at the following URLs:
- Flask application: http://localhost:5000/metrics
- Prometheus client: http://localhost:9090
- Grafana server: http://localhost:3000/
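For reference, a docker-compose.yml wiring these three services together might look roughly like this. The service names, images, and volumes are assumptions (check the file shipped in the repository); the ports and the `prometheus` hostname used later as the Grafana data-source URL come from this guide.

```yaml
# Hypothetical sketch of docker-compose.yml; see the repository for the real file.
services:
  app:
    build: .
    ports:
      - "5000:5000"
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
```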
- Close the app
docker compose down
Option 2: run the app locally, without Docker.
- Clone the repository
git clone https://github.com/PierreExeter/ML-Performance-Monitoring-with-Grafana.git
- Install the dependencies
conda create -n grafana-env python=3.13 -y
conda activate grafana-env
pip install -U -r requirements.txt
- Install and start the Grafana server
Install Grafana for your platform (a Debian/Ubuntu example is sketched after this step), then start and enable the service:
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
Open browser to http://localhost:3000/
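As an example, Grafana can be installed on Debian/Ubuntu from its APT repository; the repository details below follow the official guide but may change, so check the Grafana docs for your platform.

```bash
# Debian/Ubuntu example; see the official Grafana installation guide for other platforms.
sudo apt-get install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update && sudo apt-get install -y grafana
```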
- Set up Prometheus
- Download Prometheus from the official downloads page
- Unpack the downloaded tarball
tar xvfz prometheus-*.tar.gz
- Start the Prometheus instance
cd prometheus-*
./prometheus --config.file=./prometheus.yml
Open browser to http://localhost:9090
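For Prometheus to pick up the drift metrics, its configuration needs a scrape job pointing at the Flask app. A minimal prometheus.yml might look like this; the job name and scrape interval are assumptions, while the target is the Flask URL used below.

```yaml
# Minimal sketch of a prometheus.yml that scrapes the Flask app.
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: "ml-model"          # job name is an assumption
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:5000"]
```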
- Train the ML model
python src/train.py
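What train.py does exactly is repository-specific; as a rough sketch, a regression training script of this shape would fit (the dataset, model class, and save path are all assumptions):

```python
# Hypothetical sketch of src/train.py: fit a regression model and save it.
import os

import joblib
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data as a stand-in for the real training set.
X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")

os.makedirs("models", exist_ok=True)
joblib.dump(model, "models/model.joblib")  # save path is an assumption
```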
- Run the Flask application
python src/app.py
Open browser to http://localhost:5000/metrics. This should display a plain-text page listing the metrics that Prometheus scrapes.
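The page uses the Prometheus text exposition format, so the drift gauges should appear as entries of this shape (values and help strings are illustrative):

```
# HELP data_drift Drift score of live features vs. training data
# TYPE data_drift gauge
data_drift 0.0183
# HELP concept_drift Drift score of predictions vs. observed targets
# TYPE concept_drift gauge
concept_drift 0.42
```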
- Close the app
- Stop the Flask app with CTRL + C
- Stop the Prometheus instance with CTRL + C
- Stop the Grafana server:
sudo systemctl stop grafana-server
Next, configure the Grafana dashboard and connect it to Prometheus.
- Head to the Grafana UI and log in (by default, username: admin, password: admin)
- Connect to Prometheus
- On the left side, go to Connections > Data Sources > Prometheus
- Enter the Prometheus server URL in the connection field: http://prometheus:9090
- Click Save and Test
- Create a Grafana Dashboard and visualisations
- Click on the "+" > New dashboard
- Click on "Add visualisation"
- Select Metric "data_drift"
- Click "run queries"
- On the right-hand panel, scroll down to "Thresholds" and add a threshold at 0.026.
- Set "Show thresholds" to "As lines (dashed)"
- Repeat for the concept_drift metric:
- Click on "Add visualisation"
- Select Metric "concept_drift"
- Click "run queries"
- On the right-hand panel, scroll down to "Thresholds" and add a percentage threshold at 80%.
- Set "Show thresholds" to "As lines (dashed)"
- Name your dashboard "ML-model-monitoring"
- Click on "save dashboard"
Set up a Discord alert that triggers notifications when the drift metrics exceed a threshold value.
- Create a new Discord server
- Log in to Discord
- In the bottom left corner, click on "Create server"
- Choose "Create my own" and "For me and my friends" options
- Name the server as "Grafana alerts"
- Create a new Discord channel and webhook
- In the top left, click on the "+" (next to "Text channels")
- Name the channel "grafana-alerts"
- Click on the blue "Edit channel" button
- Switch to the "Integrations" tab and click on "Create Webhook"
- Click on the webhook and copy its URL
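Before wiring it into Grafana, you can sanity-check the webhook from the terminal; the placeholder URL below stands for the one you just copied.

```bash
# Post a test message to the Discord channel through the webhook.
curl -H "Content-Type: application/json" \
     -d '{"content": "Webhook test"}' \
     "https://discord.com/api/webhooks/<your-webhook-id>/<your-webhook-token>"
```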
- Create a contact point in Grafana that fires alerts to Discord
- In the Grafana UI, go to Home > Alerting > Contact points
- Click on "+ Add contact point"
- Name it "Discord alerts"
- Choose "Discord" as the integration type
- Paste the Discord webhook URL
- Click on "Test" to send a test alert to the Discord channel
- Check that the alert was received by the Discord channel
- Go back to Grafana and click on "Save contact point"
- Create alerting rules for the data drift metric
- Go to "Home > Dashboards > ML-model-monitoring"
- Click "Edit" on the data drift visualisaton
- Switch to the "Alert" tab and click on "New alert rule"
- Name the new rule "Data Drift Detection Alert"
- Under "2. Define query and alert condition", click on "Code" and then "Run queries". This will choose the "data_drift" metric as the first input to the alert.
- Set the alert condition: WHEN QUERY IS ABOVE 0.026. The alert fires when the drift score exceeds this value.
- Click "Preview alert rule condition"
- Under "3. Add folder and labels", create a new folder called "data_drift_alerts"
- Under "4. Set evaluation behavior", create a new evalutation group called evaluation-data-drift with a 1m interval
- Set “Pending period" to "None". This causes alerts to fire immediately when the condition is met.
- Under "5. Configure notifications", select "Discord alerts" as the contact point.
- Under "6. Configure notification message", add a summary that will be displayed in the alert message (eg. "Your model is drifting !")
- Click on "Save rule and exit".
- Repeat these steps for the concept drift metric (with the 80% threshold used in the dashboard)
- Click on "Save dashboard"