Example Prometheus Monitoring

Goal

Set up monitoring with Prometheus and Grafana.

Steps

Run the sample server: npm install, then node server
Run Prometheus: see the Run section below
Visit your running Prometheus and run queries
Run Grafana: see the Grafana section below
Add a Prometheus data source (URL: http://localhost:9090, Access: direct)
Import the grafana-dashboard.json dashboard
Create your own dashboard from the Prometheus queries

Requirements

Docker

Run

Modify /prometheus-data/prometheus.yml: replace 192.168.0.10 with your host machine's IP address.
To find the host machine's IP address (this only matches addresses that start with 192): ifconfig | grep 'inet 192' | awk '{ print $2 }'
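
The ifconfig pipeline above only finds addresses that begin with 192. As a rough alternative, the following Node.js sketch lists every non-internal IPv4 address of the machine. The function name hostIPv4Addresses is made up for this example; pick the address that your Docker containers can reach.

```javascript
const os = require("os");

// Collect the non-internal (non-loopback) IPv4 addresses of this machine.
function hostIPv4Addresses() {
  const addresses = [];
  for (const entries of Object.values(os.networkInterfaces())) {
    for (const entry of entries || []) {
      // Most Node versions report family as "IPv4"; a few report the number 4.
      const isIPv4 = entry.family === "IPv4" || entry.family === 4;
      if (isIPv4 && !entry.internal) addresses.push(entry.address);
    }
  }
  return addresses;
}

console.log(hostIPv4Addresses());
```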

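docker run -p 9090:9090 -v "$(pwd)/prometheus-data":/prometheus-data prom/prometheus --config.file=/prometheus-data/prometheus.yml

Prometheus 2.0 and later require the double-dash flag form shown here; Prometheus 1.x used -config.file.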

Open Prometheus: http://localhost:9090

Example Queries

Throughput

Error rate

Range [0, 1]: number of 5xx requests divided by the total number of requests.

sum(increase(http_request_duration_ms_count{code=~"^5..$"}[1m])) /  sum(increase(http_request_duration_ms_count[1m]))
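
To make the arithmetic behind this query concrete, here is a small sketch that computes the same ratio from per-status-code request counts. The counts are invented for illustration and the function name errorRate is hypothetical; Prometheus itself does this work inside the query engine.

```javascript
// Hypothetical per-status-code request increases over the last minute.
const increaseByCode = { "200": 940, "404": 40, "500": 15, "503": 5 };

// Share of requests whose status code matches 5xx, in the range [0, 1].
function errorRate(increases) {
  let errors = 0;
  let total = 0;
  for (const [code, count] of Object.entries(increases)) {
    total += count;
    if (/^5..$/.test(code)) errors += count;
  }
  // With no traffic the ratio is 0 / 0, which is not a number.
  return total === 0 ? NaN : errors / total;
}

console.log(errorRate(increaseByCode)); // 0.02
```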

Requests Per Minute

sum(rate(http_request_duration_ms_count[1m])) by (service, route, method, code)  * 60
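
rate() returns a per-second value, which is why the query multiplies by 60. The sketch below is a simplified stand-in for rate(), not the real implementation: it handles counter resets but skips the extrapolation to the window edges that Prometheus performs. The sample data and the name simpleRate are assumptions for this example.

```javascript
// samples: counter readings sorted by time, as { t: seconds, value: count }.
function simpleRate(samples) {
  if (samples.length < 2) return NaN; // not enough points to compute a rate
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    // A drop means the counter was reset; count the new value as the increase.
    increase += delta >= 0 ? delta : samples[i].value;
  }
  const seconds = samples[samples.length - 1].t - samples[0].t;
  return increase / seconds;
}

const samples = [
  { t: 0, value: 100 },
  { t: 30, value: 160 },
  { t: 60, value: 220 },
];
console.log(simpleRate(samples) * 60); // 120 requests per minute
```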

Response Time

Apdex

Apdex score approximation with a 100ms target and a 300ms tolerated response time:

(
  sum(rate(http_request_duration_ms_bucket{le="100"}[1m])) by (service)
+
  sum(rate(http_request_duration_ms_bucket{le="300"}[1m])) by (service)
) / 2 / sum(rate(http_request_duration_ms_count[1m])) by (service)

Note that we divide the sum of both buckets. The reason is that the histogram buckets are cumulative. The le="100" bucket is also contained in the le="300" bucket; dividing it by 2 corrects for that. - Prometheus docs
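
A worked example may help show why halving the sum of the two cumulative buckets matches the usual Apdex definition (satisfied plus half of tolerating, divided by total). The request counts are invented and the function names are hypothetical.

```javascript
// Form used in the query above: both arguments are cumulative bucket counts.
function apdexFromBuckets(le100, le300, total) {
  return (le100 + le300) / 2 / total;
}

// Textbook form: satisfied requests plus half of the tolerating ones.
function apdexClassic(satisfied, tolerating, total) {
  return (satisfied + tolerating / 2) / total;
}

// 100 requests: 60 finished within 100ms, 90 within 300ms (cumulative).
console.log(apdexFromBuckets(60, 90, 100)); // 0.75
// The same data: 60 satisfied, 90 - 60 = 30 tolerating.
console.log(apdexClassic(60, 30, 100)); // 0.75
```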

95th Percentile Response Time

histogram_quantile(0.95, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method))

Median Response Time

histogram_quantile(0.5, sum(rate(http_request_duration_ms_bucket[1m])) by (le, service, route, method))
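
histogram_quantile estimates a quantile by finding the bucket that contains the target rank and interpolating linearly inside it. The sketch below illustrates that idea under simplifying assumptions: buckets are sorted, there are at least two of them, the last has le = Infinity, and 0 < q <= 1. The bucket data is invented and this is not the Prometheus source code.

```javascript
// buckets: cumulative counts sorted by upper bound, e.g. { le: 100, count: 60 }.
function histogramQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);
  // Rank falls into the +Inf bucket: report the highest finite upper bound.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;
  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const inBucket = buckets[i].count - countBelow;
  // Assume observations are spread evenly inside the bucket.
  return lowerBound + (buckets[i].le - lowerBound) * ((rank - countBelow) / inBucket);
}

const buckets = [
  { le: 100, count: 60 },
  { le: 300, count: 90 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.75, buckets)); // 200
console.log(histogramQuantile(0.95, buckets)); // 300
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the real latencies.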

Average Response Time

avg(rate(http_request_duration_ms_sum[1m]) / rate(http_request_duration_ms_count[1m])) by (service, route, method, code)
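
Dividing the rate of _sum by the rate of _count gives the mean duration in the window, because the window length cancels out. A minimal sketch with invented readings (the name averageDurationMs is hypothetical):

```javascript
// first / last: readings of the histogram's _sum and _count at the window edges.
function averageDurationMs(first, last) {
  const deltaSum = last.sum - first.sum; // total milliseconds spent in the window
  const deltaCount = last.count - first.count; // requests served in the window
  return deltaCount === 0 ? NaN : deltaSum / deltaCount;
}

console.log(
  averageDurationMs({ sum: 12000, count: 100 }, { sum: 18000, count: 150 })
); // 120
```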

Memory Usage

Average Memory Usage

In megabytes.

avg(nodejs_external_memory_bytes / 1024 / 1024) by (service)

Reload config

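Necessary after you modify files in prometheus-data. On Prometheus 2.0 and later, the reload endpoint is only enabled when the server is started with --web.enable-lifecycle.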

curl -X POST http://localhost:9090/-/reload
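
The same reload can be triggered from Node.js, for example from a deploy script. This is a sketch using only the built-in http module; the function names are made up for this example, and the call at the bottom is commented out because it needs a running Prometheus.

```javascript
const http = require("http");

// Build the request options for POST <baseUrl>/-/reload.
function reloadRequestOptions(baseUrl) {
  const url = new URL("/-/reload", baseUrl);
  return { method: "POST", hostname: url.hostname, port: url.port, path: url.pathname };
}

// Send the reload request and resolve with the HTTP status code.
function reloadPrometheus(baseUrl) {
  return new Promise((resolve, reject) => {
    const req = http.request(reloadRequestOptions(baseUrl), (res) => {
      res.resume(); // discard the response body
      res.on("end", () => resolve(res.statusCode));
    });
    req.on("error", reject);
    req.end();
  });
}

// Usage (requires a running Prometheus):
// reloadPrometheus("http://localhost:9090").then(console.log, console.error);
```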

Prometheus Data


Prometheus Alerts

States of active alerts: pending, firing
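
As an illustration of how the two states relate: an alert whose condition is true stays pending until the condition has held for the rule's "for" duration, then becomes firing. The sketch below models that transition; the function and parameter names are hypothetical, and "inactive" stands for an alert whose condition is currently false.

```javascript
// activeSinceSec: time the condition first became true, or null if it is false now.
// forSec: the rule's "for" duration in seconds.
function alertState(activeSinceSec, nowSec, forSec) {
  if (activeSinceSec === null) return "inactive";
  return nowSec - activeSinceSec >= forSec ? "firing" : "pending";
}

console.log(alertState(null, 100, 60)); // inactive
console.log(alertState(50, 100, 60)); // pending (true for 50s, needs 60s)
console.log(alertState(30, 100, 60)); // firing (true for 70s)
```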

Grafana

Run

docker run -i -p 3000:3000 grafana/grafana

Open Grafana: http://localhost:3000

Username: admin
Password: admin

Grafana Dashboard to import: /grafana-dashboard.json
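
The Prometheus data source can also be added through Grafana's HTTP API (POST /api/datasources) instead of the UI. The sketch below only builds the request; it assumes the default admin/admin credentials and the URLs used in this README, and the data source name "Prometheus" is an arbitrary choice. Newer Grafana versions may not accept the "direct" access mode, in which case "proxy" is the alternative.

```javascript
// Build the options and JSON body for creating a Prometheus data source.
function dataSourceRequest(grafanaUrl, user, password) {
  const url = new URL("/api/datasources", grafanaUrl);
  const body = JSON.stringify({
    name: "Prometheus", // arbitrary display name (assumption)
    type: "prometheus",
    url: "http://localhost:9090",
    access: "direct",
  });
  return {
    options: {
      method: "POST",
      hostname: url.hostname,
      port: url.port,
      path: url.pathname,
      headers: {
        "Content-Type": "application/json",
        "Content-Length": Buffer.byteLength(body),
        Authorization: "Basic " + Buffer.from(`${user}:${password}`).toString("base64"),
      },
    },
    body,
  };
}

const request = dataSourceRequest("http://localhost:3000", "admin", "admin");
console.log(request.options.method, request.options.path);
// To send it: require("http").request(request.options).end(request.body);
```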

Grafana Dashboard

Acknowledgements

This example is sponsored by Trace by RisingStack.
