Monitoring a cluster using round robin DNS
Prometheus PVE exporter is designed to return identical metrics no matter which node in a cluster is scraped, so for simple setups it is enough to scrape a single cluster node. In production deployments, however, it is desirable to have metrics available even when a cluster node is down.
The simplest way to implement a fallback mechanism for when Prometheus PVE exporter or a whole cluster node is down is to implement round-robin DNS.
We assume the following cluster configuration with three PVE nodes:
- pve-a.example.org: 2001:db8::a
- pve-b.example.org: 2001:db8::b
- pve-c.example.org: 2001:db8::c
In order to implement round-robin DNS it is necessary to configure an additional DNS record for each PVE node under a common label (assume this is in zone example.org):
pve 300 IN AAAA 2001:db8::a
pve 300 IN AAAA 2001:db8::b
pve 300 IN AAAA 2001:db8::c
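Before pointing Prometheus at the shared label, it can be worth checking that the name really resolves to every node. The following is a minimal Python sketch of such a check, not part of Prometheus PVE exporter. The helper names `resolve_aaaa` and `missing_nodes` are hypothetical, and the `resolver` parameter is an assumption added only so the check can be exercised without a live DNS zone.

```python
import ipaddress
import socket


def resolve_aaaa(host, port=9221, resolver=socket.getaddrinfo):
    """Return the distinct IPv6 addresses that `host` resolves to, sorted."""
    infos = resolver(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv6, sockaddr[0] holds the textual address.
    addresses = {ipaddress.ip_address(info[4][0]) for info in infos}
    return [str(addr) for addr in sorted(addresses)]


def missing_nodes(host, expected, resolver=socket.getaddrinfo):
    """Return the expected node addresses that `host` does not resolve to."""
    found = set(resolve_aaaa(host, resolver=resolver))
    # Normalise the expected addresses so that equivalent spellings compare equal.
    wanted = {str(ipaddress.ip_address(addr)) for addr in expected}
    return sorted(wanted - found)
```

Calling `missing_nodes("pve.example.org", ["2001:db8::a", "2001:db8::b", "2001:db8::c"])` on a host that can resolve the zone should return an empty list once all three AAAA records are in place.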
Assuming that Prometheus PVE exporter is running on every node, the targets parameter of the PVE job can now be set to pve.example.org:
scrape_configs:
  - job_name: 'pve'
    static_configs:
      - targets:
        - pve.example.org:9221
    metrics_path: /pve
    params:
      module: [default]
Whenever Prometheus tries to scrape a node which is not available, it will retry with another IP from the pve.example.org record after a short timeout.
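The fallback behaviour described above can be illustrated with a small Python sketch. This is not how Prometheus is implemented; it only shows the general pattern of walking through every address behind one name until a connection succeeds. The function name `connect_with_fallback` and the injectable `resolver` and `connector` parameters are hypothetical and exist only for illustration and testing.

```python
import socket


def connect_with_fallback(host, port, timeout=2.0,
                          resolver=socket.getaddrinfo,
                          connector=socket.create_connection):
    """Try each address behind `host` in turn; return (address, connection)."""
    last_error = None
    for info in resolver(host, port, type=socket.SOCK_STREAM):
        address = info[4][0]  # sockaddr[0] is the textual IP address
        try:
            # A node that is down fails here, after at most `timeout` seconds.
            return address, connector((address, port), timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and move on to the next node
    raise ConnectionError(
        f"no address for {host} accepted a connection"
    ) from last_error
```

With the records from the example zone, a call such as `connect_with_fallback("pve.example.org", 9221)` would skip a node that is down and return a connection to the next address in the list.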