Summary
Currently, tracking the state of Rollout analyses and their outcomes over time requires complex PromQL queries and recording rules, because the existing metrics (rollout_phase and analysis_run_phase) are gauges. This makes it difficult to:
Track the latest analysis state for a Rollout
Count how many times a specific Rollout has been rolled back or progressed after an analysis run
Understand the historical progression of analysis runs
Calculate key metrics like Change Failure Rate (CFR) for services using Rollouts
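As a rough illustration of the CFR calculation mentioned above, here is a minimal sketch that assumes per-rollout outcome events are available, which is what the current gauges do not readily provide. The RolloutOutcome record, its fields, and the sample data are hypothetical and are not part of Argo Rollouts.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutOutcome:
    """One finished rollout attempt (hypothetical record, not an Argo Rollouts type)."""
    rollout: str       # name of the Rollout
    rolled_back: bool  # True if the attempt was rolled back after an analysis run


def change_failure_rate(outcomes):
    """Return {rollout: failed_attempts / total_attempts} for each Rollout seen."""
    total = Counter()
    failed = Counter()
    for outcome in outcomes:
        total[outcome.rollout] += 1
        if outcome.rolled_back:
            failed[outcome.rollout] += 1
    return {name: failed[name] / total[name] for name in total}


# Made-up history: four attempts for "checkout" (one rolled back), one for "search".
history = [
    RolloutOutcome("checkout", rolled_back=False),
    RolloutOutcome("checkout", rolled_back=True),
    RolloutOutcome("checkout", rolled_back=False),
    RolloutOutcome("checkout", rolled_back=False),
    RolloutOutcome("search", rolled_back=False),
]
rates = change_failure_rate(history)
```

With counter-style metrics, the same ratio could be taken directly from two monotonically increasing series instead of being reconstructed from sampled gauge values.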
There are a couple of options that I could think of:
Add "latest" and "rollout" labels to existing analysis_run_* gauge metrics:
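To make the labeling idea concrete, the sketch below renders one sample in the Prometheus text exposition format with the two proposed labels attached. Only "rollout" and "latest" come from the proposal; the other label names (namespace, name, phase) and all label values are assumptions chosen for illustration, and format_sample is a hypothetical helper rather than anything the controller exposes.

```python
def format_sample(metric, labels, value):
    """Render one sample line in the Prometheus text exposition format."""
    def escape(label_value):
        # Label values escape backslash, double quote, and newline.
        return (
            label_value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )

    rendered = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
    return f"{metric}{{{rendered}}} {value}"


# Hypothetical sample: the namespace/name/phase labels and all values are made up.
line = format_sample(
    "analysis_run_phase",
    {
        "namespace": "shop",
        "name": "checkout-abc123",
        "phase": "Successful",
        "rollout": "checkout",  # proposed label: owning Rollout
        "latest": "true",       # proposed label: marks the most recent analysis run
    },
    1,
)
```

Under these assumptions, the latest analysis state for a Rollout could be found by matching on rollout and latest="true", with no recording rules needed.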
Are there other options that folks can think of here?
Use Cases
My organization manages numerous Kubernetes services and is evaluating Argo Rollouts for progressive delivery. These enhanced metrics would:
Enable tracking of deployment success rates per service
Allow calculation of key reliability metrics (e.g., Change Failure Rate)
Provide historical insights into rollout patterns and failure modes
Simplify integration with existing monitoring and alerting systems
The current metrics require complex PromQL manipulations that are both fragile and potentially unreliable for these use cases. These enhancements would make it significantly easier to monitor and analyze rollout behavior at scale.
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.
jahvon changed the title from "Add time series metrics for tracking Rollout analysis states" to "Enhance Rollout controller metrics to better track analysis states and outcomes over time" on Dec 17, 2024.
Note on the label option: adding "latest" and "rollout" labels to the analysis_run_* gauge metrics enables easier identification of the current state without adding a new metric, although it would increase cardinality a bit.