This repository has been archived by the owner on Apr 22, 2020. It is now read-only.

Expose redis queue length metrics from ZMON worker #376

Open
mikkeloscar opened this issue Oct 26, 2018 · 6 comments
Labels
feature improvements, enhancement, functionality requested by users

Comments

@mikkeloscar

We should scale the number of zmon-workers running in Kubernetes based on the Redis queue length. Since we now have custom metrics available in our Kubernetes setup, we can do this by exposing a metric from the pods and scaling based on it.

If each zmon-worker exposed the current Redis queue length on a JSON metrics endpoint, we could use the Horizontal Pod Autoscaler configuration described here: https://github.com/zalando-incubator/kube-metrics-adapter#example to do the scaling. This would allow us to run with a baseline of 1 zmon-worker in each cluster and only scale up when needed.
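A minimal sketch of such an endpoint, using only the Python standard library and assuming the queue is a Redis list. The list name `zmon:queue:default`, the port 9090 and the path `/metrics` are hypothetical; `client` is any object with an `llen(key)` method, such as a redis-py `redis.Redis` instance (Redis `LLEN` returns the length of a list). None of this exists in zmon-worker today.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical Redis list name; the key zmon-worker actually reads may differ.
QUEUE_NAME = "zmon:queue:default"


def queue_length_payload(client, queue_name=QUEUE_NAME):
    """Build the JSON document served by the metrics endpoint."""
    return {"queue": queue_name, "queue_length": int(client.llen(queue_name))}


def make_handler(client, queue_name=QUEUE_NAME):
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = json.dumps(queue_length_payload(client, queue_name)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep periodic scrapes out of the worker's log

    return MetricsHandler


def serve_metrics(client, port=9090, queue_name=QUEUE_NAME):
    """Start the endpoint on a background thread and return the server."""
    server = HTTPServer(("0.0.0.0", port), make_handler(client, queue_name))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In a worker this might be started with `serve_metrics(redis.Redis(host="..."))`. The HPA could then read `$.queue_length` through the adapter's JSON-path pod metrics collector; the exact annotation keys should be taken from the kube-metrics-adapter README linked above.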

The alternative to the JSON metrics endpoint would be to scale on a ZMON check, but it would not make sense to depend on ZMON in order to scale... ZMON. :)

@mohabusama mohabusama added the feature improvements, enhancement, functionality requested by users label Oct 26, 2018
@vetinari
Contributor

vetinari commented Nov 7, 2018

The main problem with this is that the ZMON scheduler feeds a fairly stable stream of checks into the queue. Once the queue length has reached 0, we must not scale down again, or we lose the worker throughput that kept it at 0.
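A back-of-the-envelope sketch of why an empty queue does not mean spare capacity. By Little's law, the average number of checks in progress equals arrival rate times average check duration, so a steady input needs a fixed number of workers even while the queue length is 0. The rates, durations and per-worker concurrency (`slots_per_worker`) below are hypothetical inputs, not measured ZMON figures.

```python
import math


def workers_needed(checks_per_second, avg_check_seconds, slots_per_worker):
    """Minimum number of workers that keeps up with a steady arrival rate.

    Little's law: checks in flight = arrival rate * average duration.
    Each worker is assumed (hypothetically) to run `slots_per_worker`
    checks concurrently. At least one worker is always kept.
    """
    in_flight = checks_per_second * avg_check_seconds
    return max(1, math.ceil(in_flight / slots_per_worker))
```

For example, 200 checks/s at 0.5 s each with 16 slots per worker means 100 checks in flight, so `workers_needed(200, 0.5, 16)` is 7. An autoscaler driven by queue length alone would shrink below that as soon as the queue drains, and the backlog would build up again.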

@mikkeloscar
Author

mikkeloscar commented Nov 8, 2018

What about exposing a value other than queue length? E.g. "scheduled checks per minute", or whatever makes sense for the workers; then you have a number that will not drop to 0.

Just an idea: if zmon-scheduler exposed a count of scheduled events in Prometheus format, we could use a Prometheus query (e.g. events per minute) as the metric source for scaling.
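A sketch of what such a counter could look like, assuming a new, hypothetical metric named `zmon_scheduler_checks_scheduled_total`; zmon-scheduler does not necessarily expose anything like it. It renders the counter in the Prometheus text exposition format, and `per_minute` shows the arithmetic behind a query such as `sum(rate(zmon_scheduler_checks_scheduled_total[5m])) * 60`. Prometheus's `rate()` additionally handles counter resets and extrapolation, which this simplified version omits.

```python
import threading


class ScheduledChecksCounter:
    """Monotonic counter of checks the scheduler pushed onto the queue."""

    NAME = "zmon_scheduler_checks_scheduled_total"  # hypothetical metric name

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def inc(self, n=1):
        with self._lock:
            self._value += n

    def render(self):
        """Return the counter in the Prometheus text exposition format."""
        with self._lock:
            value = self._value
        return (
            f"# HELP {self.NAME} Checks scheduled into the Redis queue.\n"
            f"# TYPE {self.NAME} counter\n"
            f"{self.NAME} {value}\n"
        )


def per_minute(earlier, later, seconds_between):
    """Events per minute between two counter samples (no reset handling)."""
    return (later - earlier) * 60.0 / seconds_between
```

Because the counter only grows while checks are being scheduled, the resulting rate stays non-zero under a steady load, which is the property asked for above.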

@jrake-revelant

Bump: let's start discussing this again.

@szuecs
Contributor

szuecs commented Sep 3, 2019

Maybe the queue length aggregated over a specified time window is good enough.

It could work like this:

  • Prometheus collects the queue size
  • a ZMON check queries Prometheus, like this: sum(rate(queue_size{}[5m]))
  • a custom-metrics HPA uses the ZMON check
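One caveat on the query above: the Prometheus documentation states that `rate()` should only be used with counters, whereas a queue size is a gauge. For averaging a gauge over a time window, `avg_over_time(queue_size[5m])` is the usual function; and if every worker reports the length of the same shared queue, taking `max` rather than `sum` across pods avoids counting that queue once per pod. The sketch below illustrates what a windowed average computes, with hypothetical timestamps in seconds; it mirrors the idea of `avg_over_time`, not Prometheus's exact range-selection rules.

```python
from collections import deque


class WindowedAverage:
    """Average of gauge samples inside a sliding time window."""

    def __init__(self, window_seconds=300.0):
        self.window = window_seconds
        self._samples = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self._samples.append((timestamp, value))
        self._evict(timestamp)

    def _evict(self, now):
        # Drop samples at or before the start of the window ending at `now`.
        while self._samples and self._samples[0][0] <= now - self.window:
            self._samples.popleft()

    def average(self):
        """Mean of the samples still in the window, or None if empty."""
        if not self._samples:
            return None
        return sum(v for _, v in self._samples) / len(self._samples)
```

A short spike raises the average only in proportion to how long it lasts, which is what makes the aggregated value steadier to scale on than the raw queue length.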

This should work without exposing zmon-scheduler stats, because the input rate stays roughly the same, so the aggregated value should not fluctuate much.

@mikkeloscar
Author

@szuecs We wouldn't need the ZMON check; we can simply have an HPA based on a Prometheus query: https://github.com/zalando-incubator/kube-metrics-adapter#example-external-metric
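A simplified sketch of how the HPA turns such a query result into a replica count for an External metric with an `AverageValue` target. If the usage ratio (metric value divided by target per pod times current replicas) is within the tolerance, 0.1 by default in kube-controller-manager, nothing changes; otherwise the desired count is the metric value divided by the per-pod target, rounded up and clamped to the min/max bounds. Stabilization windows, scaling policies and missing-metric handling are left out, and the bounds used here are hypothetical.

```python
import math


def desired_replicas(metric_value, target_average_value, current_replicas,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate HPA replica count for an External AverageValue target.

    `current_replicas` must be positive. Simplified: no stabilization
    window or scaling policies are applied.
    """
    usage_ratio = metric_value / (target_average_value * current_replicas)
    if abs(1.0 - usage_ratio) <= tolerance:
        replicas = current_replicas  # close enough to target: no change
    else:
        replicas = math.ceil(metric_value / target_average_value)
    return max(min_replicas, min(max_replicas, replicas))
```

With a hypothetical target of 100 scheduled checks per minute per worker and a query result of 600, two workers would scale to `desired_replicas(600, 100, 2)`, i.e. 6, while a result of 0 falls back to `min_replicas`.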

@szuecs
Contributor

szuecs commented Sep 3, 2019

True, but if you need more logic, you can put it in a ZMON check.


5 participants