Add hotstart_threshold to flame.pool #32

Open
wants to merge 1 commit into
base: main

Conversation

DeemoONeill

When concurrency exceeds the threshold, a new node is added to the pool.

This allows for pre-empting load and is intended for use cases where a cold start is costly, such as machine learning models.

I'm not 100% sure about the implementation. Would we want to spawn the new runner, but then handle the cond case as usual? That way the waiting job goes to the existing runner while the new runner spins up.

Also, the new runner will most likely become the new min_runner, which means the previous min runner may not be utilised to its full potential. Would this be an issue?
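
To make the proposed check concrete, here is a minimal sketch of the decision, not the code in this PR. It assumes the threshold is a fraction of the pool's current capacity (runner count times max_concurrency); the module name, function name, and argument list are all hypothetical and do not come from FLAME.

```elixir
defmodule HotstartSketch do
  @moduledoc """
  Hypothetical illustration only; none of these names come from FLAME itself.
  """

  @doc """
  Returns true when the pool should start booting another runner.

  `active` is the number of calls currently in flight, and `threshold`
  is assumed to be a fraction of current capacity in the range (0, 1].
  """
  def boot_new_runner?(active, runner_count, max_concurrency, max_runners, threshold)
      when threshold > 0 and threshold <= 1 do
    # Current capacity is every running runner filled to max_concurrency.
    capacity = runner_count * max_concurrency

    # Never grow past the configured maximum; otherwise grow once the
    # in-flight count exceeds the threshold share of capacity.
    runner_count < max_runners and active > capacity * threshold
  end
end
```

With one runner, `max_concurrency: 100`, and a threshold of 0.5, the 51st concurrent call would trigger a boot while still being served by the existing runner, which is the behaviour the question above is asking about.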

Related to #30

upon exceeding the threshold a new node is added to the pool. This
allows for pre-empting load and is intended for usecases where a cold-start
is too costly, such as for machine learning models
@samharnack

samharnack commented May 12, 2024

@DeemoONeill this is great! This is something I've wanted to contribute to for a while, just wasn't sure where to start. You've also raised questions that hadn't occurred to me. I was thinking of using behaviours or a protocol and passing in the module or MFA instead. This would allow for custom growth strategies and flexible configuration.

Good callout on min_runner. It feels like the growth strategy and work distribution need to be separate concerns. At the cost of adding an extra dependency, is this something that GenStage could be used for? FLAME would become a producer, and each runner a consumer.

Thank you for putting some time into this! I'd love to pair with you on this if you ever want another pair of eyes.

@DeemoONeill
Author

@samharnack apologies, I missed this.

Do you mean having something like a "startup_strategy" behaviour? That might actually be a good approach.

There would be a default behaviour that works as it does now, spinning up at capacity, and a hotstart behaviour that spins up at a percentage of max capacity. That way the downsides are opt-in, and users can define their own behaviours that use other heuristics for when to spin up a new machine.

I don't have much capacity until after the weekend, but I'd be happy to go through some ideas with you.
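
As a rough sketch of what such a behaviour could look like (an assumption for discussion, not FLAME's API): the `StrategySketch` module, the `grow?/3` callback, and its arguments are all hypothetical names.

```elixir
defmodule StrategySketch do
  # Hypothetical behaviour; the callback name and arguments are assumptions.
  @callback grow?(
              active :: non_neg_integer(),
              capacity :: non_neg_integer(),
              opts :: keyword()
            ) :: boolean()
end

defmodule StrategySketch.Default do
  @behaviour StrategySketch

  # Mirrors "spinning up at capacity": grow only once every slot is taken.
  @impl true
  def grow?(active, capacity, _opts), do: active >= capacity
end

defmodule StrategySketch.Hotstart do
  @behaviour StrategySketch

  # Opt-in variant: grow once usage reaches a fraction of capacity.
  # Falls back to 1.0, which makes it behave like the default strategy.
  @impl true
  def grow?(active, capacity, opts) do
    threshold = Keyword.get(opts, :hotstart_threshold, 1.0)
    active >= capacity * threshold
  end
end
```

A user-defined strategy would implement the same callback with whatever heuristic it likes, which keeps the pool itself unaware of how the decision is made.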

@samharnack

@DeemoONeill you are in good company. I guess I don't have GitHub notifications enabled :/

That's exactly what I was thinking. The config would turn into something like this:

children = [
  ...,
  {FLAME.Pool, name: MyRunner, min: 1, max: 10, max_concurrency: 100, strategy: {CustomStrategy, [hotstart_threshold: 0.5]}}
]
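
One way the pool could read that option is sketched below. This is a guess at the shape, not an existing FLAME function: `StrategyOptionSketch`, `fetch/2`, and `normalize/1` are hypothetical, and `DefaultStrategy` in the usage note is a placeholder name. It accepts either a bare module or a `{module, opts}` tuple, the same convention supervisors use for child specs.

```elixir
defmodule StrategyOptionSketch do
  # Hypothetical helper: normalises the `:strategy` pool option
  # into a {module, opts} tuple.
  def normalize({mod, opts}) when is_atom(mod) and is_list(opts), do: {mod, opts}
  def normalize(mod) when is_atom(mod), do: {mod, []}

  # Reads `:strategy` from the pool options, falling back to `default`
  # when the option is not set.
  def fetch(pool_opts, default) do
    pool_opts
    |> Keyword.get(:strategy, default)
    |> normalize()
  end
end
```

For example, `StrategyOptionSketch.fetch([name: MyRunner], DefaultStrategy)` would yield `{DefaultStrategy, []}`, so pools that never set `:strategy` keep the current behaviour.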

I think a first step would be extracting the current scaling logic into a default strategy and getting it merged into main.

I'll try to spike an idea this weekend.

@nickdichev-firework

Hi @samharnack @DeemoONeill, I've implemented the idea being discussed here in #51, if you want to take a look.

3 participants