Bandwidth aware routing #670

Open
phillebaba opened this issue Dec 17, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


phillebaba commented Dec 17, 2024

Describe the problem to be solved

When running large clusters, situations can arise where many nodes pull the same image from a single node. This happens especially during rollouts of new deployments, when initially only a few nodes have pulled the image. In small clusters this is generally not a problem, as the pressure on individual nodes is fairly limited. In large clusters, however, we can have hundreds of nodes pulling from the same node. Because the underlying VM has limited network bandwidth, the image pulls become slower and slower, which could cause all of them to fail. It would be a lot more preferable to let a few nodes pull the image quickly so that they can start distributing it as well.

Proposed solution to the problem

The easy solution would be to limit the number of in-flight requests to a node. This would however not account for the fact that layers vary in size. Another option would be to limit the total number of bytes that can be served and deny any further requests. The third option would be to cap the bandwidth used when serving layers so that new requests do not slow down in-flight requests.
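
For illustration, a rough sketch of the first option in Go. The middleware name and the way the limit is passed in are made up for this example, not something Spegel does today; requests over the limit simply queue until a slot frees up.

```go
package sketch

import "net/http"

// limitInFlight bounds the number of requests one node serves concurrently,
// using a buffered channel as a semaphore.
func limitInFlight(next http.Handler, max int) http.Handler {
	sem := make(chan struct{}, max)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		sem <- struct{}{}        // acquire a slot (blocks while the node is at capacity)
		defer func() { <-sem }() // release it once the response is done
		next.ServeHTTP(w, r)
	})
}
```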

Relates to #551 and #530

@phillebaba phillebaba added the enhancement New feature or request label Dec 17, 2024
@phillebaba phillebaba moved this to Todo in Roadmap Dec 17, 2024

craig-seeman commented Jan 23, 2025

> Another option would be to limit the total number of bytes that can be served and deny any further requests.

I like the thought of this. Perhaps we could use something like a simple 429 reply when utilization is above a certain threshold, and have the client either wait and retry after 5 seconds when it sees a 429, or move on to another host?
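
Something along these lines could work as a sketch of the server side (the byteThreshold type and the blob-size lookup function are invented for illustration, and the cap is whatever gets configured): track how many bytes are currently being streamed and answer 429 with a Retry-After hint once the cap is exceeded.

```go
package sketch

import (
	"net/http"
	"sync/atomic"
)

// byteThreshold rejects new requests once the bytes currently being served
// exceed a configured cap.
type byteThreshold struct {
	inFlight atomic.Int64 // bytes of blob data currently being streamed
	maxBytes int64        // configured cap, e.g. total concurrent blob bytes
}

// wrap applies the threshold; size is a hypothetical lookup that returns the
// size of the blob the request is about to serve.
func (b *byteThreshold) wrap(next http.Handler, size func(*http.Request) int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := size(r)
		if b.inFlight.Add(n) > b.maxBytes {
			b.inFlight.Add(-n)
			w.Header().Set("Retry-After", "5")
			http.Error(w, "serving capacity exceeded", http.StatusTooManyRequests)
			return
		}
		defer b.inFlight.Add(-n)
		next.ServeHTTP(w, r)
	})
}
```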

@phillebaba
Member Author

I agree that the best approach is probably to allow the server to return a 429. The challenge is mostly figuring out what constitutes high load. While limiting the number of in-flight requests would be simple, it may not be a good reflection of load. I need to understand the background a bit more, especially the impact that file size has. For example, would serving small manifests be negatively impacted by limiting the number of in-flight requests?

We can always add backoff in the 429 scenario, where the same request is retried after a delay.
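
Purely for illustration, a retry-with-backoff on the requesting side could look roughly like this (the helper name and attempt count are made up, and in practice the retry behavior depends on who the client actually is):

```go
package sketch

import (
	"net/http"
	"strconv"
	"time"
)

// getWithBackoff retries a request that was answered with 429, waiting for the
// Retry-After hint (or a default delay) between attempts.
func getWithBackoff(client *http.Client, url string, attempts int) (*http.Response, error) {
	delay := 5 * time.Second
	var resp *http.Response
	var err error
	for i := 0; i < attempts; i++ {
		resp, err = client.Get(url)
		if err != nil || resp.StatusCode != http.StatusTooManyRequests {
			return resp, err
		}
		if s := resp.Header.Get("Retry-After"); s != "" {
			if secs, perr := strconv.Atoi(s); perr == nil {
				delay = time.Duration(secs) * time.Second
			}
		}
		resp.Body.Close()
		time.Sleep(delay)
	}
	return resp, err
}
```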

@craig-seeman

That's a fair point. I'm wondering if you'd perhaps only want to monitor/limit blob URL requests and just always serve manifests.
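
As a sketch of that idea (not Spegel's actual code, and whatever limiter ends up being chosen is passed in here as a plain middleware): only requests on OCI distribution blob paths (/v2/<name>/blobs/<digest>) go through the limiter, manifests and everything else are served directly.

```go
package sketch

import (
	"net/http"
	"strings"
)

// limitBlobsOnly applies a limiting middleware to blob requests and lets all
// other requests, in particular manifest requests, pass through untouched.
func limitBlobsOnly(next http.Handler, limit func(http.Handler) http.Handler) http.Handler {
	limited := limit(next)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.URL.Path, "/blobs/") {
			limited.ServeHTTP(w, r)
			return
		}
		next.ServeHTTP(w, r) // manifests are small, always serve them
	})
}
```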

The more I sit and think about it, you may also want to give the user two choices for how the limit is checked:

  1. A configurable cap for Spegel itself to check against (akin to the disk throttling Spegel used to have), checking its own utilization against that, e.g. 100 Mbps (guessing you'd want to express the limit in Mbps, like a line speed).

  2. A configurable cap for the host/node itself to check against. I could see some nodes being loaded with workloads that are much more network intensive than others, and you might want to limit Spegel on those so that bandwidth stays available for the workloads. There are a lot of 'devil in the details' questions here about how to measure this and which eth interface to look at, but it could be a percentage of the overall line speed detected via interface inspection (e.g. 80%: don't let the network I/O on the host go above 80%) or a MB/s cap. Spegel could check the current or average utilization of the interface(s) on the node and ensure it does not go above a certain limit (a rough sketch of sampling the interface counters follows after this list).
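
A rough, Linux-only sketch of the second option, assuming the interface name is configured or detected elsewhere: read the transmit byte counter from /proc/net/dev twice and derive an average rate over the sampling window.

```go
package sketch

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// txBytes reads the transmit byte counter for one interface from /proc/net/dev.
func txBytes(iface string) (uint64, error) {
	data, err := os.ReadFile("/proc/net/dev")
	if err != nil {
		return 0, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		name, rest, found := strings.Cut(strings.TrimSpace(line), ":")
		if !found || name != iface {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) < 9 {
			return 0, fmt.Errorf("unexpected format for %s", iface)
		}
		return strconv.ParseUint(fields[8], 10, 64) // 9th column is transmitted bytes
	}
	return 0, fmt.Errorf("interface %s not found", iface)
}

// txRate samples the counter twice and returns the average transmit rate in
// bytes per second over the given window. Counter wraparound is ignored here.
func txRate(iface string, window time.Duration) (float64, error) {
	before, err := txBytes(iface)
	if err != nil {
		return 0, err
	}
	time.Sleep(window)
	after, err := txBytes(iface)
	if err != nil {
		return 0, err
	}
	return float64(after-before) / window.Seconds(), nil
}
```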

The other tricky part, as you pointed out, is how to predict the impact of a blob being sent. I'm almost wondering about an approach where Spegel tracks the average transfer rate of a blob (perhaps a rudimentary number like file size divided by total seconds to transfer) for each request it handles. You could then make a fuzzy prediction of how much bandwidth a new transfer will take, since we know the size of the blob we're going to send (and even the time). If Spegel took the average utilization seen in 1. or 2. above and added the rate it predicts from the average blob transfer rate, I think it could make a good guess about whether that would go above the configured limit.
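
A minimal sketch of that prediction, with all names and numbers invented for illustration: keep a running average of the per-transfer rate observed for completed blobs, and admit a new transfer only if current utilization plus that average stays under the limit.

```go
package sketch

import "sync"

// rateEstimator keeps a simple running average of the bytes-per-second rate
// observed for completed blob transfers.
type rateEstimator struct {
	mu      sync.Mutex
	avgRate float64 // average bytes/s of a single transfer
	samples int
}

// observe records a finished transfer of size bytes that took seconds to send.
func (e *rateEstimator) observe(size int64, seconds float64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	rate := float64(size) / seconds
	e.samples++
	e.avgRate += (rate - e.avgRate) / float64(e.samples) // incremental mean
}

// admit reports whether starting one more transfer is expected to stay below
// the limit, given the currently measured utilization (both in bytes/s).
func (e *rateEstimator) admit(currentUtilization, limit float64) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return currentUtilization+e.avgRate <= limit
}
```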

I know it's not perfect by any means, but I think we're a bit lucky in that we won't have to deal with what a normal CDN or internet-facing device sees, with wildly different transfer speeds. In a Kubernetes cluster the communication is likely going over the LAN (except in some unique setups), and in most setups the sender will be the bottleneck in the average transfer rate.

With all that being said, there is a whole other option too: limiting or capping bandwidth on the Spegel HTTP process itself for the transfers, which is a different thing entirely. The approach I outlined above basically forces you to run up against the caps and encounter slowdowns before it starts limiting anything, so there are definitely drawbacks to it.
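
For completeness, a hedged sketch of what capping the transfers themselves could look like, using a token-bucket limiter from golang.org/x/time/rate; the type name is made up, and whether the limiter is per-response or shared process-wide is a design choice left open here.

```go
package sketch

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

// throttledWriter caps the bandwidth of a response (or of the whole process,
// if the limiter is shared) by waiting on a token bucket before each chunk.
type throttledWriter struct {
	ctx     context.Context
	w       io.Writer
	limiter *rate.Limiter // e.g. rate.NewLimiter(rate.Limit(100<<20), 1<<20) for ~100 MiB/s, 1 MiB burst
}

func (t *throttledWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		chunk := p
		if burst := t.limiter.Burst(); len(chunk) > burst {
			chunk = chunk[:burst] // WaitN cannot wait for more tokens than the burst size
		}
		if err := t.limiter.WaitN(t.ctx, len(chunk)); err != nil {
			return written, err
		}
		n, err := t.w.Write(chunk)
		written += n
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}
```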
