A Python-based service that runs on an NVIDIA GPU-enabled machine, collects GPU metrics, and pushes them to InfluxDB.
The following metrics are collected:
- memory.used [MiB]
- utilization.gpu [%]
- temperature.gpu [°C]
- power.draw [W]
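All four fields are standard `nvidia-smi --query-gpu` properties, so one way to read them is to call nvidia-smi with `--format=csv,noheader,nounits` and parse its output. The following is a minimal sketch under that assumption, not necessarily how this collector does it; the helper names are hypothetical, and readings a GPU does not support (reported as `[N/A]`) are mapped to `None`.

```python
from __future__ import annotations

import subprocess
from typing import Optional

# The four metrics listed above, in the order nvidia-smi returns them.
QUERY_FIELDS = ["memory.used", "utilization.gpu", "temperature.gpu", "power.draw"]


def _to_float(value: str) -> Optional[float]:
    """Convert one CSV cell; unsupported readings such as "[N/A]" become None."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_nvidia_smi_csv(output: str) -> list[dict[str, Optional[float]]]:
    """Parse `--format=csv,noheader,nounits` output (one line per GPU)."""
    gpus = []
    for line in output.strip().splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        gpus.append({field: _to_float(cell) for field, cell in zip(QUERY_FIELDS, cells)})
    return gpus


def query_gpus() -> list[dict[str, Optional[float]]]:
    """Run nvidia-smi once and return the parsed metrics for every visible GPU."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_nvidia_smi_csv(result.stdout)
```

For a single GPU, `parse_nvidia_smi_csv("1024, 37, 52, 71.25")` returns `[{'memory.used': 1024.0, 'utilization.gpu': 37.0, 'temperature.gpu': 52.0, 'power.draw': 71.25}]`. Keeping parsing separate from the subprocess call lets the parser be tested on machines without a GPU.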
Additional host configuration is needed to make nvidia-smi available inside the container (Docker's --gpus flag relies on the NVIDIA Container Toolkit). Follow this guide if you haven't already.
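To run the collector with Docker:
- Clone this repository.
- Configure the .env file in the root of the repository.
- Build and run the Docker container:
docker build -t gpu-metrics-collector-img .
docker run --name gpu-metrics-collector-cont -h=`hostname` --env-file=./.env --gpus=all gpu-metrics-collector-img
The -h=`hostname` option gives the container the same hostname as the host machine, and --gpus=all exposes all of the host's GPUs to it.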
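Passing the host's hostname into the container is useful if each data point is tagged with the machine it came from. The sketch below shows one way such a point could be formatted as InfluxDB line protocol and sent to the InfluxDB v2 `/api/v2/write` endpoint using only the standard library. The measurement name `gpu_metrics`, the `host`/`gpu` tags, and the function names are hypothetical; the actual collector may use a client library such as influxdb-client instead.

```python
import socket
import time
import urllib.parse
import urllib.request


def _escape(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(measurement: str, tags: dict, fields: dict, timestamp_s: int) -> str:
    """Format one point as line protocol with a seconds-precision timestamp."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(f",{_escape(k)}={_escape(str(v))}" for k, v in sorted(tags.items()))
    # Fields with no reading (None) are skipped; every value is written as a float.
    field_part = ",".join(f"{_escape(k)}={float(v)}" for k, v in fields.items() if v is not None)
    if not field_part:
        raise ValueError("a point needs at least one field")
    return f"{name}{tag_part} {field_part} {timestamp_s}"


def build_write_request(base_url: str, org: str, bucket: str, token: str, body: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the InfluxDB v2 write endpoint."""
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data=body.encode("utf-8"),
        headers={"Authorization": f"Token {token}", "Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def push_metrics(base_url: str, org: str, bucket: str, token: str, gpu_metrics: list) -> int:
    """Write one point per GPU, tagged with this machine's hostname."""
    now = int(time.time())
    host = socket.gethostname()  # inside the container, this is the value passed via -h
    lines = [
        to_line_protocol("gpu_metrics", {"host": host, "gpu": index}, metrics, now)
        for index, metrics in enumerate(gpu_metrics)
    ]
    request = build_write_request(base_url, org, bucket, token, "\n".join(lines))
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status  # InfluxDB answers 204 No Content on success
```

For example, `to_line_protocol("gpu_metrics", {"host": "node1", "gpu": 0}, {"memory.used": 1024, "power.draw": 71.25}, 1700000000)` yields `gpu_metrics,gpu=0,host=node1 memory.used=1024.0,power.draw=71.25 1700000000`.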
Alternatively, spin up the entire stack (including InfluxDB) with docker-compose:
docker-compose up -d
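After `docker-compose up -d`, it can help to confirm that InfluxDB is accepting connections before expecting data to arrive. InfluxDB exposes a `/health` endpoint whose JSON body reports `"status": "pass"` when it is ready. The sketch below assumes the compose file publishes InfluxDB on `localhost:8086`; the function names are hypothetical.

```python
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    """InfluxDB's /health endpoint reports "status": "pass" when it is ready."""
    return payload.get("status") == "pass"


def check_influxdb(base_url: str = "http://localhost:8086", timeout: float = 5.0) -> bool:
    """Return True if the InfluxDB instance at base_url reports itself healthy."""
    try:
        with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors such as 503, and non-JSON bodies.
        return False
```

Calling `check_influxdb()` from the host returns `True` once the container is up; any failure is reported as `False` rather than raised, which suits a simple wait-and-retry loop.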
To run the collector directly on the host instead of in Docker:
- Create a Python virtual environment:
python -m venv .venv
source .venv/bin/activate
- Install the dependencies:
pip install -r requirements.txt
- Configure the .env file.
- Run the script:
cd collector
python main.py
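Outside Docker there is no `--env-file` to inject the settings, so the script has to obtain them from the .env file some other way (python-dotenv is a common choice; how main.py actually does this is not documented here). The following is a minimal standard-library sketch of such a loader. The default path and the key names in the example (`INFLUXDB_URL`, `INFLUXDB_TOKEN`) are hypothetical; check the repository's .env template for the real keys.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments and lines without '='."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Read a .env file and export its values into os.environ."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        # By default, variables already set in the shell take precedence.
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Only the first `=` splits key from value, so a line such as `A=1=2` yields the value `1=2`.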
All of the data can be visualized with Grafana. The Grafana dashboard template is in grafana-template/template.json. Upload this JSON to set up the dashboard. Note: you’ll need to manually add variable definitions for the dashboard buttons.
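To find out which variable definitions are missing, one option is to compare the variables a dashboard JSON declares under `templating.list` with the `$name` / `${name}` references used inside its panels. The sketch below does that; the function names are hypothetical, Grafana's built-in variables (those starting with `__`) are ignored, and datasource placeholders such as `${DS_...}` in exported templates may also be reported.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# Matches $name and ${name} references inside the serialized panels.
_VARIABLE_REF = re.compile(r"\$\{?([A-Za-z_]\w*)")


def defined_variables(dashboard: dict) -> set[str]:
    """Names declared under the dashboard's templating.list."""
    return {v.get("name", "") for v in dashboard.get("templating", {}).get("list", [])}


def referenced_variables(dashboard: dict) -> set[str]:
    """Names used as $var or ${var} anywhere in the panels (queries, titles, links)."""
    text = json.dumps(dashboard.get("panels", []))
    return {name for name in _VARIABLE_REF.findall(text) if not name.startswith("__")}


def undefined_variables(dashboard: dict) -> list[str]:
    """Variables referenced by panels but not declared in templating.list."""
    return sorted(referenced_variables(dashboard) - defined_variables(dashboard))


def load_dashboard(path: str = "grafana-template/template.json") -> dict:
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Running `undefined_variables(load_dashboard())` from the repository root lists the variable names to create under the dashboard's settings before the buttons work.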
Steps:
- Add InfluxDB as a data source in Grafana:
  - Set the type to InfluxDB.
  - Set the query language to Flux.
  - Set the URL (use http://influxdb:8086 if deployed with docker-compose).
  - Set the user, password, organization, and token according to your configuration.
  - Set the default bucket to gpu-usage.
- Import grafana-template/template.json as a template to set up the dashboard.
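The data source settings from the steps above can also be created through Grafana's HTTP API (`POST /api/datasources`) instead of the UI. The sketch below only builds the request; it assumes Grafana is reachable at `http://localhost:3000` (Grafana's default port) and that you have a Grafana API or service-account token. The data source name `InfluxDB` and the function names are hypothetical, and only token authentication to InfluxDB is shown.

```python
import json
import urllib.request


def influxdb_flux_datasource(url: str, org: str, token: str, bucket: str = "gpu-usage") -> dict:
    """Payload for an InfluxDB data source that uses the Flux query language."""
    return {
        "name": "InfluxDB",
        "type": "influxdb",
        "access": "proxy",
        "url": url,
        "jsonData": {"version": "Flux", "organization": org, "defaultBucket": bucket},
        "secureJsonData": {"token": token},
    }


def build_datasource_request(grafana_url: str, api_token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to Grafana's /api/datasources endpoint."""
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_datasource_request("http://localhost:3000", grafana_token, influxdb_flux_datasource("http://influxdb:8086", org, influx_token)))` creates the data source in one call, which is convenient when the stack is recreated often.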
This project is licensed under the MIT License. See the LICENSE file for details.