Bandwidth aware routing #670

Open
phillebaba opened this issue Dec 17, 2024 · 3 comments
Labels
enhancement New feature or request

Comments


phillebaba commented Dec 17, 2024

Describe the problem to be solved

When running large clusters, situations can arise where many nodes pull the same image from a single node. This happens especially during rollouts of new deployments, when initially only a few nodes have pulled the image. In small clusters this is generally not a problem, as the pressure on individual nodes is fairly limited. In large clusters, however, we can have hundreds of nodes pulling from the same node. Because the underlying VM has limited network bandwidth, the image pulls become slower and slower, which could cause all of them to fail. It would be a lot more preferable to let a few nodes pull the image quickly so that they can start distributing it as well.

Proposed solution to the problem

The easy solution would be to limit the number of in-flight requests to a node. This would however not account for the fact that layers vary in size. Another option would be to limit the total number of bytes that can be served and deny any further requests. The third option would be to cap the bandwidth used when serving layers so that new requests do not slow down in-flight requests.
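
For illustration, a rough sketch of the first option in Go. The middleware name and the way the limit is passed in are made up for this example, not something Spegel does today; requests over the limit simply queue until a slot frees up.

```go
package sketch

import "net/http"

// limitInFlight bounds the number of requests one node serves concurrently,
// using a buffered channel as a semaphore.
func limitInFlight(next http.Handler, max int) http.Handler {
	sem := make(chan struct{}, max)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		sem <- struct{}{}        // acquire a slot (blocks while the node is at capacity)
		defer func() { <-sem }() // release it once the response is done
		next.ServeHTTP(w, r)
	})
}
```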

Relates to #551 and #530

@phillebaba phillebaba added the enhancement New feature or request label Dec 17, 2024
@phillebaba phillebaba moved this to Todo in Roadmap Dec 17, 2024

craig-seeman commented Jan 23, 2025

> Another option would be to limit the total number of bytes that can be served and deny any further requests.

I like the thought of this. Perhaps we could use something like a simple 429 reply when utilization is above a certain threshold, and have the client either wait and retry after 5 seconds when it sees a 429, or move on to another host?
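
Something along these lines could work as a sketch of the server side (the byteThreshold type and the blob-size lookup function are invented for illustration, and the cap is whatever gets configured): track how many bytes are currently being streamed and answer 429 with a Retry-After hint once the cap is exceeded.

```go
package sketch

import (
	"net/http"
	"sync/atomic"
)

// byteThreshold rejects new requests once the bytes currently being served
// exceed a configured cap.
type byteThreshold struct {
	inFlight atomic.Int64 // bytes of blob data currently being streamed
	maxBytes int64        // configured cap, e.g. total concurrent blob bytes
}

// wrap applies the threshold; size is a hypothetical lookup that returns the
// size of the blob the request is about to serve.
func (b *byteThreshold) wrap(next http.Handler, size func(*http.Request) int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		n := size(r)
		if b.inFlight.Add(n) > b.maxBytes {
			b.inFlight.Add(-n)
			w.Header().Set("Retry-After", "5")
			http.Error(w, "serving capacity exceeded", http.StatusTooManyRequests)
			return
		}
		defer b.inFlight.Add(-n)
		next.ServeHTTP(w, r)
	})
}
```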

@phillebaba
Member Author

I agree that the best approach is probably to allow the server to return a 429. The challenge is mostly figuring out what constitutes high load. While limiting the number of in-flight requests would be simple, it may not be a good reflection of load. I need to understand the background a bit more, especially the impact that file size has. For example, would serving small manifests be negatively impacted by limiting the number of in-flight requests?

We can always add backoff in the 429 scenario, where the same request is retried after a delay.
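
Purely for illustration, a retry-with-backoff on the requesting side could look roughly like this (the helper name and attempt count are made up, and in practice the retry behavior depends on who the client actually is):

```go
package sketch

import (
	"net/http"
	"strconv"
	"time"
)

// getWithBackoff retries a request that was answered with 429, waiting for the
// Retry-After hint (or a default delay) between attempts.
func getWithBackoff(client *http.Client, url string, attempts int) (*http.Response, error) {
	delay := 5 * time.Second
	var resp *http.Response
	var err error
	for i := 0; i < attempts; i++ {
		resp, err = client.Get(url)
		if err != nil || resp.StatusCode != http.StatusTooManyRequests {
			return resp, err
		}
		if s := resp.Header.Get("Retry-After"); s != "" {
			if secs, perr := strconv.Atoi(s); perr == nil {
				delay = time.Duration(secs) * time.Second
			}
		}
		resp.Body.Close()
		time.Sleep(delay)
	}
	return resp, err
}
```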

@craig-seeman

That's a fair point. I'm wondering if you'd perhaps only want to monitor/limit blob URL requests and just always serve manifests.
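
As a sketch of that idea (not Spegel's actual code, and whatever limiter ends up being chosen is passed in here as a plain middleware): only requests on OCI distribution blob paths (/v2/<name>/blobs/<digest>) go through the limiter, manifests and everything else are served directly.

```go
package sketch

import (
	"net/http"
	"strings"
)

// limitBlobsOnly applies a limiting middleware to blob requests and lets all
// other requests, in particular manifest requests, pass through untouched.
func limitBlobsOnly(next http.Handler, limit func(http.Handler) http.Handler) http.Handler {
	limited := limit(next)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if strings.Contains(r.URL.Path, "/blobs/") {
			limited.ServeHTTP(w, r)
			return
		}
		next.ServeHTTP(w, r) // manifests are small, always serve them
	})
}
```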

The more I sit and think about it, you may also want to give the user two choices for how the limit is checked:

  1. A configurable cap for Spegel itself to check against (akin to the disk throttling Spegel used to have), checking its own utilization against that, e.g. 100 Mbps (guessing you'd want to express the limit in Mbps, like a line speed).

  2. A configurable cap for the host/node itself to check against. I could see some nodes being loaded with workloads that are much more network intensive than others, and you might want to limit Spegel on those so that bandwidth stays available for the workloads. There are a lot of 'devil in the details' questions here about how to measure this and which eth interface to look at, but it could be a percentage of the overall line speed detected via interface inspection (e.g. 80%: don't let the network I/O on the host go above 80%) or a MB/s cap. Spegel could check the current or average utilization of the interface(s) on the node and ensure it does not go above a certain limit (a rough sketch of sampling the interface counters follows after this list).
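
A rough, Linux-only sketch of the second option, assuming the interface name is configured or detected elsewhere: read the transmit byte counter from /proc/net/dev twice and derive an average rate over the sampling window.

```go
package sketch

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// txBytes reads the transmit byte counter for one interface from /proc/net/dev.
func txBytes(iface string) (uint64, error) {
	data, err := os.ReadFile("/proc/net/dev")
	if err != nil {
		return 0, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		name, rest, found := strings.Cut(strings.TrimSpace(line), ":")
		if !found || name != iface {
			continue
		}
		fields := strings.Fields(rest)
		if len(fields) < 9 {
			return 0, fmt.Errorf("unexpected format for %s", iface)
		}
		return strconv.ParseUint(fields[8], 10, 64) // 9th column is transmitted bytes
	}
	return 0, fmt.Errorf("interface %s not found", iface)
}

// txRate samples the counter twice and returns the average transmit rate in
// bytes per second over the given window. Counter wraparound is ignored here.
func txRate(iface string, window time.Duration) (float64, error) {
	before, err := txBytes(iface)
	if err != nil {
		return 0, err
	}
	time.Sleep(window)
	after, err := txBytes(iface)
	if err != nil {
		return 0, err
	}
	return float64(after-before) / window.Seconds(), nil
}
```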

The other tricky part, as you pointed out, is how to predict the impact of a blob being sent. I'm almost wondering about an approach where Spegel tracks the average transfer rate of a blob (perhaps a rudimentary number like file size divided by total seconds to transfer) for each request it handles. You could then make a fuzzy prediction of how much bandwidth a new transfer will take, since we know the size of the blob we're going to send (and even the time). If Spegel took the average utilization seen in 1. or 2. above and added the rate it predicts from the average blob transfer rate, I think it could make a good guess about whether that would go above the configured limit.
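
A minimal sketch of that prediction, with all names and numbers invented for illustration: keep a running average of the per-transfer rate observed for completed blobs, and admit a new transfer only if current utilization plus that average stays under the limit.

```go
package sketch

import "sync"

// rateEstimator keeps a simple running average of the bytes-per-second rate
// observed for completed blob transfers.
type rateEstimator struct {
	mu      sync.Mutex
	avgRate float64 // average bytes/s of a single transfer
	samples int
}

// observe records a finished transfer of size bytes that took seconds to send.
func (e *rateEstimator) observe(size int64, seconds float64) {
	e.mu.Lock()
	defer e.mu.Unlock()
	rate := float64(size) / seconds
	e.samples++
	e.avgRate += (rate - e.avgRate) / float64(e.samples) // incremental mean
}

// admit reports whether starting one more transfer is expected to stay below
// the limit, given the currently measured utilization (both in bytes/s).
func (e *rateEstimator) admit(currentUtilization, limit float64) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	return currentUtilization+e.avgRate <= limit
}
```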

I know it's not perfect by any means, but I think we're a bit lucky in that we won't have to deal with what a normal CDN or internet-facing device sees, with wildly different transfer speeds. In a Kubernetes cluster the communication is likely going over the LAN (except in some unique setups), and in most setups the sender will be the bottleneck in the average transfer rate.

With all that being said, there is a whole other option too: limiting or capping bandwidth on the Spegel HTTP process itself for the transfers, which is a different thing entirely. The approach I outlined above basically forces you to run up against the caps and encounter slowdowns before it starts limiting anything, so there are definitely drawbacks to it.
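
For completeness, a hedged sketch of what capping the transfers themselves could look like, using a token-bucket limiter from golang.org/x/time/rate; the type name is made up, and whether the limiter is per-response or shared process-wide is a design choice left open here.

```go
package sketch

import (
	"context"
	"io"

	"golang.org/x/time/rate"
)

// throttledWriter caps the bandwidth of a response (or of the whole process,
// if the limiter is shared) by waiting on a token bucket before each chunk.
type throttledWriter struct {
	ctx     context.Context
	w       io.Writer
	limiter *rate.Limiter // e.g. rate.NewLimiter(rate.Limit(100<<20), 1<<20) for ~100 MiB/s, 1 MiB burst
}

func (t *throttledWriter) Write(p []byte) (int, error) {
	written := 0
	for len(p) > 0 {
		chunk := p
		if burst := t.limiter.Burst(); len(chunk) > burst {
			chunk = chunk[:burst] // WaitN cannot wait for more tokens than the burst size
		}
		if err := t.limiter.WaitN(t.ctx, len(chunk)); err != nil {
			return written, err
		}
		n, err := t.w.Write(chunk)
		written += n
		if err != nil {
			return written, err
		}
		p = p[n:]
	}
	return written, nil
}
```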
