Optimize the control loop #274

Closed
krancour opened this issue Oct 6, 2016 · 6 comments

Comments

@krancour
Contributor

krancour commented Oct 6, 2016

Currently, the router does all of the following every ten seconds:

  1. Find the router's own deployment object.
  2. Find all routable services.
  3. Find secrets that contain certificates.

Together, these are the inputs used to build (compute) a model. This model is then deep-compared against the previously computed model, which is kept in memory. When the two differ, the Nginx configuration is regenerated by using the new model as input to a template, and Nginx is reloaded with the new configuration.

Note that the model is computed before any comparison is made so that inconsequential changes to k8s resources (those that wouldn't affect the router's configuration) won't trigger an unnecessary Nginx reload.
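
For anyone following along, the compare-then-reload step described above can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: `RouterConfig`, `reconcile`, and the `reload` callback are hypothetical names, and the model is reduced to two fields.

```go
package main

import (
	"fmt"
	"reflect"
)

// RouterConfig is a hypothetical, heavily simplified stand-in for the
// router's in-memory model.
type RouterConfig struct {
	UseProxyProtocol bool
	AppDomains       map[string][]string // app name -> domains
}

// reconcile deep-compares the freshly computed model with the known one and
// calls reload only when they differ. It returns the model to keep in memory.
func reconcile(known, fresh *RouterConfig, reload func(*RouterConfig) error) (*RouterConfig, error) {
	if reflect.DeepEqual(known, fresh) {
		return known, nil // inconsequential change: skip the reload
	}
	if err := reload(fresh); err != nil {
		return known, err // keep the old model so the next tick retries
	}
	return fresh, nil
}

func main() {
	reloads := 0
	// Stand-in for "render the template and reload Nginx".
	reload := func(*RouterConfig) error { reloads++; return nil }

	var known *RouterConfig
	// Three simulated ticks of the loop; the second model equals the first.
	ticks := []*RouterConfig{
		{AppDomains: map[string][]string{"app": {"example.com"}}},
		{AppDomains: map[string][]string{"app": {"example.com"}}},
		{AppDomains: map[string][]string{"app": {"example.org"}}},
	}
	for _, fresh := range ticks {
		known, _ = reconcile(known, fresh, reload)
	}
	fmt.Println("reloads:", reloads)
}
```

In this sketch the second tick produces an identical model, so only the first and third ticks trigger a reload. The cost being discussed in this issue is everything that happens before `reconcile`: the model still has to be rebuilt on every tick.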

Here's where there's room to optimize: I feel like continuously computing the model is wasteful, since changes that affect the router's configuration are relatively rare. It not only wastes CPU cycles; depending on how many routable services live in your cluster, the process can also be very chatty with the apiserver, which puts unnecessary load on the network.

A more mature approach may be to watch the k8s event stream instead. Only when a k8s resource we're interested in (as determined by a label or well-known name) has been added or modified should we retrieve those resources and re-compute the model. What's more, because we'd know what changed before re-computing the model, we could re-compute and replace only the affected portion of the existing model.
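
To make the partial re-compute idea concrete, here is a rough sketch of applying a single event to an existing model. Everything in it is an assumption for illustration: `Event`, `Service`, `Model`, and `Apply` are hypothetical types, not the Kubernetes client API, and handling of deletions is added here even though the paragraph above only mentions additions and modifications.

```go
package main

import "fmt"

// EventType mirrors the kinds of change a watch could report (hypothetical).
type EventType string

const (
	Added    EventType = "ADDED"
	Modified EventType = "MODIFIED"
	Deleted  EventType = "DELETED"
)

// Service is a simplified stand-in for a k8s service the router may care about.
type Service struct {
	Name     string
	Routable bool // e.g. derived from a label
	Domains  []string
}

// Event pairs a change type with the affected service.
type Event struct {
	Type    EventType
	Service Service
}

// Model is a reduced router model: app name -> domains.
type Model struct {
	Apps map[string][]string
}

func sameDomains(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// Apply updates only the part of the model touched by ev and reports
// whether the model changed, i.e. whether a reload would be needed.
func (m *Model) Apply(ev Event) bool {
	name := ev.Service.Name
	old, present := m.Apps[name]
	if ev.Type == Deleted || !ev.Service.Routable {
		if !present {
			return false // we never routed to it, nothing to do
		}
		delete(m.Apps, name)
		return true
	}
	if present && sameDomains(old, ev.Service.Domains) {
		return false // a change that does not affect the router
	}
	m.Apps[name] = append([]string(nil), ev.Service.Domains...)
	return true
}

func main() {
	m := &Model{Apps: map[string][]string{}}
	events := []Event{
		{Added, Service{Name: "app", Routable: true, Domains: []string{"example.com"}}},
		{Modified, Service{Name: "app", Routable: true, Domains: []string{"example.com"}}},
		{Added, Service{Name: "internal", Routable: false}},
		{Deleted, Service{Name: "app"}},
	}
	for _, ev := range events {
		fmt.Printf("%s %s -> reload needed: %v\n", ev.Type, ev.Service.Name, m.Apply(ev))
	}
}
```

The return value plays the role the deep comparison plays today: it filters out changes that would not alter the generated configuration.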

cc @arschles -- I feel like if there's any obvious problem with this approach, you'd be the guy to spot it. 😉

@bacongobbler
Member

related issue that this may resolve: #212

@krancour
Contributor Author

krancour commented Oct 7, 2016

@bacongobbler I'm not sure this would solve #212. When the router first starts, it needs to compute the entire config model; only after that can we watch the event stream and compute just the deltas.

@bacongobbler
Member

bacongobbler commented Oct 7, 2016

Ah yeah I guess the base case wouldn't be resolved, but rather every subsequent case. Good call.

FWIW I think this is a great idea regardless of whether it fixes #212, but I don't have enough experience with the k8s event API to know if this'll work the way we intend. I imagine @arschles or you know better than I do. Would be more than happy to take a crack at this though!

@krancour
Contributor Author

krancour commented Oct 8, 2016

@arschles and I both felt this would work. He can speak to it better than I can, but he did mention possible problems with the k8s client disconnecting. There's reconnect logic in Steward that covers that base; we would have to do the same here.
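
As a rough idea of what such reconnect handling might look like (this is not Steward's actual code, which isn't shown in this thread), here is a stdlib-only sketch. `openFunc`, `watchWithReconnect`, `drain`, and `demo` are hypothetical names, and events are plain strings. It assumes that a full re-sync is wanted after every (re)connect, since events that occurred while disconnected may have been missed.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// openFunc opens a new event stream (hypothetical abstraction over a watch).
// The returned channel is closed when the connection drops.
type openFunc func(ctx context.Context) (<-chan string, error)

// watchWithReconnect keeps a stream open until ctx is cancelled. After every
// successful (re)connect it calls resync, because events that happened while
// disconnected may have been missed.
func watchWithReconnect(ctx context.Context, open openFunc, resync func(), handle func(string), retryDelay time.Duration) error {
	for {
		if events, err := open(ctx); err == nil {
			resync()
			drain(ctx, events, handle)
		}
		// Either open failed or the stream closed: wait, then reconnect.
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(retryDelay):
		}
	}
}

// drain handles events until the stream closes or ctx is cancelled.
func drain(ctx context.Context, events <-chan string, handle func(string)) {
	for {
		select {
		case <-ctx.Done():
			return
		case ev, ok := <-events:
			if !ok {
				return // stream closed: caller will reconnect
			}
			handle(ev)
		}
	}
}

// demo simulates two connections that each deliver one event and then drop;
// the third connection attempt cancels the context to end the run.
func demo() (seen []string, resyncs int, err error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	connects := 0
	open := func(context.Context) (<-chan string, error) {
		connects++
		if connects > 2 {
			cancel()
			return nil, context.Canceled
		}
		ch := make(chan string, 1)
		ch <- fmt.Sprintf("event-%d", connects)
		close(ch)
		return ch, nil
	}
	err = watchWithReconnect(ctx, open,
		func() { resyncs++ },
		func(ev string) { seen = append(seen, ev) },
		time.Millisecond)
	return seen, resyncs, err
}

func main() {
	seen, resyncs, err := demo()
	fmt.Println(seen, resyncs, err)
}
```

A fixed retry delay keeps the sketch short; a real implementation would more likely use exponential backoff with a cap.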

@krancour
Contributor Author

krancour commented May 9, 2017

Experience working on the Kubernetes Service Catalog's controller has taught me that what we need to implement here is the so-called "informer pattern," which is the common pattern implemented by Kubernetes controllers. (The router, essentially, is a Kubernetes controller.)
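
For readers unfamiliar with the pattern, its core shape is: keep a local cache of the watched objects, have change notifications enqueue object keys on a de-duplicating work queue, and let a worker process keys by reading from the cache rather than from the apiserver. The sketch below illustrates only that shape with the standard library. It is not the client-go API; `miniInformer`, `keyQueue`, and their methods are hypothetical names.

```go
package main

import "fmt"

// keyQueue is a de-duplicating FIFO of object keys (hypothetical; real
// controllers typically use a rate-limited work queue).
type keyQueue struct {
	pending map[string]bool
	order   []string
}

func (q *keyQueue) add(key string) {
	if q.pending[key] {
		return // already queued: multiple changes collapse into one item
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

func (q *keyQueue) pop() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// miniInformer keeps a local cache in step with change notifications and
// enqueues the keys of changed objects.
type miniInformer struct {
	cache map[string]string // key -> simplified object spec
	queue *keyQueue
}

func newMiniInformer() *miniInformer {
	return &miniInformer{
		cache: map[string]string{},
		queue: &keyQueue{pending: map[string]bool{}},
	}
}

// onUpsert is the add/update handler.
func (i *miniInformer) onUpsert(key, spec string) {
	i.cache[key] = spec
	i.queue.add(key)
}

// onDelete is the delete handler.
func (i *miniInformer) onDelete(key string) {
	delete(i.cache, key)
	i.queue.add(key)
}

// sync drains the queue. The handler always sees the latest cached state
// for a key, never a stale intermediate one. It returns the number of
// keys processed.
func (i *miniInformer) sync(handle func(key, spec string, exists bool)) int {
	n := 0
	for {
		key, ok := i.queue.pop()
		if !ok {
			return n
		}
		spec, exists := i.cache[key]
		handle(key, spec, exists)
		n++
	}
}

func main() {
	inf := newMiniInformer()
	inf.onUpsert("default/app", "domains=example.com")
	inf.onUpsert("default/app", "domains=example.org") // collapses with the first
	inf.onDelete("default/old")
	n := inf.sync(func(key, spec string, exists bool) {
		fmt.Printf("sync %s exists=%v spec=%q\n", key, exists, spec)
	})
	fmt.Println("processed:", n)
}
```

Two rapid updates to the same object result in a single unit of work, which is part of why this pattern tends to fit a router that only needs the latest state before regenerating its configuration.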

@Cryptophobia

This issue was moved to teamhephy/router#17
