This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

Implement throttling for check results metrics #380

Closed
mohabusama opened this issue Nov 26, 2018 · 4 comments
Labels: feature improvements, enhancement, functionality requested by users; in progress

Comments

@mohabusama
Contributor

mohabusama commented Nov 26, 2018

By enabling throttling (via random sampling), we can roughly control the percentage of metrics that each worker stores in the ZMON timeseries database.

The worker could get the sampling rate from:

  • Config (env var)
  • ZMON entity (per-account sampling rate)

There should be a list of critical checks that can be excluded from sampling. Critical checks can be defined in:

  • Config
  • ZMON entity
  • A flag in the check
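
A minimal sketch of the proposal above, in Python. All names here (`ZMON_WORKER_SAMPLING_RATE`, `resolve_sampling_rate`, `should_store`) are hypothetical and not taken from the ZMON code base. Expressing the rate as a percentage and letting the per-account entity rate override the env var are also assumptions.

```python
import os
import random


def resolve_sampling_rate(entity_rate=None, env=os.environ):
    """Pick the sampling rate (percentage, 0-100) for this worker.

    Assumption: a per-account rate from a ZMON entity, when present,
    takes precedence over the env var; the default is no throttling.
    """
    if entity_rate is not None:
        return float(entity_rate)
    # Hypothetical env var name, used only for illustration.
    return float(env.get("ZMON_WORKER_SAMPLING_RATE", "100"))


def should_store(check_id, sampling_rate, critical_checks=frozenset(), rng=random):
    """Return True if the metrics of this check result should be stored."""
    if check_id in critical_checks:
        # Critical checks are excluded from sampling and always stored.
        return True
    if sampling_rate >= 100:
        return True
    # Keep roughly `sampling_rate` percent of the results at random.
    return rng.random() * 100 < sampling_rate
```

With `sampling_rate=50`, about half of the non-critical results are stored, while every result of a check listed in `critical_checks` is kept regardless of the rate.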
@mohabusama mohabusama added the feature improvements, enhancement, functionality requested by users label Nov 26, 2018
@mohabusama
Contributor Author

mohabusama commented Dec 3, 2018

Grafana chart showing count(30s) for a check with a 50% sampling rate.
The check interval is 15s.

[Image: metrics sampling]
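
As a rough sanity check on the numbers above: a 15s interval gives 2 results per 30s window, so with 50% sampling the count per window should average about 1 and vary between 0 and 2. The simulation below is only a sketch that assumes each result is sampled independently; `window_counts` is a hypothetical helper, not ZMON code, and it says nothing about what the actual chart shows.

```python
import random


def window_counts(interval_s=15, window_s=30, sampling_rate=0.5,
                  windows=10000, seed=0):
    """Simulate how many results survive sampling in each window."""
    rng = random.Random(seed)
    per_window = window_s // interval_s  # 2 results per 30s window
    return [
        sum(rng.random() < sampling_rate for _ in range(per_window))
        for _ in range(windows)
    ]


counts = window_counts()
mean_count = sum(counts) / len(counts)
print(sorted(set(counts)), round(mean_count, 2))
```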

@lmineiro

lmineiro commented Dec 4, 2018

How will this affect metrics that have SUM() aggregates? Sampling out some data points will skew the monitoring.

For example, a metric that reports throughput (req/sec) is often aggregated by summing all the data points. If some of them are suppressed by sampling, the aggregate will show a drop in throughput that did not actually happen.
Was this considered?
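
The concern can be illustrated with a small sketch. It assumes independent random sampling at a known rate; `sampled_sum` is a hypothetical helper. Dividing the sampled sum by the rate is a common statistical correction, not something proposed in this thread or known to be supported by ZMON.

```python
import random


def sampled_sum(values, sampling_rate, rng):
    """Return (raw, scaled) sums over a random sample of `values`.

    `raw` is what a plain SUM() sees after sampling; `scaled` divides
    by the sampling rate to estimate the sum over all data points.
    """
    kept = [v for v in values if rng.random() < sampling_rate]
    raw = sum(kept)
    scaled = raw / sampling_rate if sampling_rate > 0 else 0.0
    return raw, scaled


# 10000 data points of 100 req/sec each, sampled at 50%.
values = [100.0] * 10000
raw, scaled = sampled_sum(values, 0.5, random.Random(0))
print(sum(values), raw, scaled)
```

The raw sum lands near half of the true total, which is the skew described above. The scaled value is only an estimate, and it is only meaningful if the query side knows the sampling rate that was applied.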

@mohabusama
Contributor Author

There is support for critical_checks, which can be used to exempt checks from sampling. The general idea is to throttle spammy workers and avoid collateral damage.
The critical check flag is currently not supported in ZMON, but if it were, we would have the notion of system/out-of-the-box/critical checks that are essential to the health of the systems.

@mohabusama
Contributor Author

@lmineiro I created another issue for more aligned sampling as a feature in ZMON, with more deterministic results.
I am also re-editing this one to clarify its purpose.

@mohabusama mohabusama changed the title from "Implement sampling for check results metrics" to "Implement throttling for check results metrics" Dec 4, 2018