✨ Add: Prometheus Federation 🚧 #439

mrnicegyu11 · 2023-11-15T10:06:51Z

First attempt at adding federated scraping of metrics to Prometheus.

This PR introduces a federated Prometheus setup designed to optimize metrics storage and retention. It features two Prometheus instances: one holding a small number of metrics with a long retention period for long-term data analysis, and the other maintaining a large volume of metrics with a short retention period for real-time monitoring and troubleshooting. This setup aims to balance resource usage and data availability, improving overall system performance.

See #422

Common Prometheus naming & retention scheme:

Only metrics starting with osparc_ or s4l_ will be retained in the federated Prometheus for long-term storage.
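A minimal sketch of the prefix rule, assuming the two retained prefixes are `osparc_` and `s4l_` (as in the commit message "add common prefixes s4l_ and osparc_"). It also builds a `match[]` selector for Prometheus' `/federate` endpoint. The helper names and the base URL are hypothetical, and the PR may express the same filter differently in its scrape configuration.

```python
from urllib.parse import urlencode

# Prefixes kept for long-term storage, as stated in this PR.
RETAINED_PREFIXES = ("osparc_", "s4l_")


def is_retained(metric_name: str) -> bool:
    """Return True if the metric name starts with one of the retained prefixes."""
    return metric_name.startswith(RETAINED_PREFIXES)


def federate_selector(prefixes=RETAINED_PREFIXES) -> str:
    """Build a series selector matching names that start with any prefix.

    Prometheus regex matchers are fully anchored, so a trailing `.*` is needed.
    The prefixes here contain no regex metacharacters, so they are not escaped.
    """
    alternatives = "|".join(prefixes)
    return '{__name__=~"(' + alternatives + ').*"}'


def federate_url(base_url: str) -> str:
    """Build a /federate URL; base_url is a hypothetical source Prometheus address."""
    query = urlencode({"match[]": federate_selector()})
    return base_url.rstrip("/") + "/federate?" + query
```

For example, `is_retained("osparc_cpu_usage_per_node_percentage")` is true, while `is_retained("http_requests_total-rate-5min")` is false, so only the former would reach the long-retention instance.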

Renamed metrics/rules:

node_cpu_seconds_total-nonidle-increase-over-nodes-12weeks-v2 -> osparc_node_cpu_seconds_total-nonidle-increase-over-nodes-12weeks
osparc_metrics:cpu_usage_per_node_percentage -> osparc_cpu_usage_per_node_percentage
simcore_simcore_service_director_services_started_total-sum_by_key_tag -> osparc_director_services_started_total_sum_by_key_tag
simcore_simcore_service_webserver_services_started_total-sum_by_key_tag -> osparc_webserver_services_started_total_sum_by_key_tag
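For consumers that still query the old names, the renames above can be captured in a lookup table. This is only a sketch: the mapping is copied from the list above, while the `current_rule_name` helper is hypothetical and not part of this PR.

```python
# Old rule name -> new rule name, copied from the "Renamed metrics/rules" list.
RENAMED_RULES = {
    "node_cpu_seconds_total-nonidle-increase-over-nodes-12weeks-v2":
        "osparc_node_cpu_seconds_total-nonidle-increase-over-nodes-12weeks",
    "osparc_metrics:cpu_usage_per_node_percentage":
        "osparc_cpu_usage_per_node_percentage",
    "simcore_simcore_service_director_services_started_total-sum_by_key_tag":
        "osparc_director_services_started_total_sum_by_key_tag",
    "simcore_simcore_service_webserver_services_started_total-sum_by_key_tag":
        "osparc_webserver_services_started_total_sum_by_key_tag",
}


def current_rule_name(name: str) -> str:
    """Translate an old rule name to its new name; unknown names pass through."""
    return RENAMED_RULES.get(name, name)
```

Names that were not renamed are returned unchanged, so the helper can be applied to every query without special-casing.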

Added rules:

osparc_autoscaling_machines_active
osparc_autoscaling_machines_buffer
osparc_container_instances_s4lcorelite

Removed rules:

http_requests_total-rate-5min
container_tasks_state-count_by_image
node_cpu_seconds_total:nonidle_increase_over_nodes_12weeks
node_cpu_seconds_total-nonidle-sum_over_nodes

We will have to backfill the new recording rules: https://jessicagreben.medium.com/prometheus-fill-in-data-for-new-recording-rules-30a14ccb8467
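As a rough sketch of what the backfill could look like, the helper below assembles an argument list for `promtool tsdb create-blocks-from rules`, the promtool subcommand for backfilling recording rules. The rule file name, time range, and Prometheus URL are placeholders, not values from this PR. The generated blocks would still need to be moved into the data directory of the target Prometheus afterwards.

```python
from datetime import datetime, timezone


def backfill_command(rule_files, start, end, prometheus_url, output_dir="data/"):
    """Build the argv for promtool's recording-rule backfill.

    start and end must be timezone-aware datetimes; they are passed to
    promtool as Unix timestamps. All argument values are caller-supplied
    placeholders in this sketch.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if start >= end:
        raise ValueError("start must be before end")
    return [
        "promtool", "tsdb", "create-blocks-from", "rules",
        "--start", str(int(start.timestamp())),
        "--end", str(int(end.timestamp())),
        "--url", prometheus_url,
        "--output-dir", output_dir,
        *rule_files,
    ]


# Hypothetical usage: print the command instead of running it.
cmd = backfill_command(
    ["rules.yml"],
    datetime(2023, 11, 1, tzinfo=timezone.utc),
    datetime(2023, 11, 17, tzinfo=timezone.utc),
    "http://prometheus:9090",
)
print(" ".join(cmd))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without shell quoting, should the backfill be scripted.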

sanderegg

looking good. good luck!

services/monitoring/template.env

elisabettai

Looks good, thanks for the notification!

Can you notify me when the rule name change becomes effective, so I update the code in the new metrics repo?

kaiser added 3 commits November 14, 2023 15:18

Fix typo

197b3f9

Add federated prometheus, bump prometheus minor version

1f6ff31

Scale cadvisor resource monitoring prometheus to zero

503100b

mrnicegyu11 added the labels t:enhancement (New feature or request), observability (alerting/monitoring), and t:infra-ops (Adjustments to the way or resources with that microservices are run) Nov 15, 2023

mrnicegyu11 added this to the 7peaks milestone Nov 15, 2023

mrnicegyu11 self-assigned this Nov 15, 2023

mrnicegyu11 and others added 2 commits November 15, 2023 11:06

Merge branch 'main' into add/prometheusFederation

5a7c86d

minor fixes

085efab

mrnicegyu11 marked this pull request as ready for review November 15, 2023 16:06

mrnicegyu11 requested a review from YuryHrytsuk as a code owner November 15, 2023 16:06

mrnicegyu11 requested review from mguidon and sanderegg November 15, 2023 16:06

sanderegg approved these changes Nov 15, 2023

YuryHrytsuk approved these changes Nov 16, 2023

services/monitoring/template.env (resolved review thread)

mrnicegyu11 and others added 4 commits November 16, 2023 15:51

Rename and purge prometheus rules, add common prefixes s4l_ and osparc_

5dfa686

Merge branch 'main' into add/prometheusFederation

b5e218b

fix typo

f5e67eb

fix typo

d461c00

mrnicegyu11 requested a review from elisabettai November 17, 2023 10:47

elisabettai approved these changes Nov 17, 2023

mrnicegyu11 changed the title ~~Add: Prometheus Federation~~ ✨ Add: Prometheus Federation 🚧 Nov 17, 2023

Merge branch 'main' into add/prometheusFederation

e4c25e3

mrnicegyu11 enabled auto-merge (squash) November 17, 2023 13:04

mrnicegyu11 disabled auto-merge November 17, 2023 13:04

mrnicegyu11 merged commit dae66c4 into ITISFoundation:main Nov 17, 2023
2 checks passed

mrnicegyu11 mentioned this pull request Nov 20, 2023

Add s4l prometheus monitoring #422

Closed
