[discussion] grow support #2791

Open
dongahn opened this issue Mar 1, 2020 · 5 comments
Labels
design don't expect this to ever be closed...

Comments

@dongahn
Member

dongahn commented Mar 1, 2020

Spun off from flux-framework/rfc#230. There is an idea of adding an additional R to the JOBID schema to support "grow" in the same way as "shrink".

A key question:

Then the complementary "grow" directory could contain chunks that are added.
While we are here, maybe we can also hash this out a bit, as this is what @milroy will soon need.

I don't think adding an additional R is difficult. What is currently difficult is doing this under the original JOBID. In particular, flux job submit always generates a new JOBID. Do you think there is an easy path to generate a new R under the same JOBID using the flux job submit|flux mini interface?

@garlick
Member

garlick commented Mar 1, 2020

When we discussed this before, I think a new "grow" interface was proposed but maybe that is out of scope for RFC 20?

@dongahn
Member Author

dongahn commented Mar 2, 2020

Yeah, something like flux grow or similar could be a solution. I will move this to a flux-core issue.

dongahn transferred this issue from flux-framework/rfc on Mar 2, 2020
dongahn changed the title from "rfc20: grow support" to "[discussion] grow support" on Mar 2, 2020
@milroy
Member

milroy commented Mar 5, 2020

Here's a summary of the coffee hour discussion yesterday with @dongahn, @garlick, @grondo, and @SteVwonder. Please add details I left out here.

While having the ability to unify an existing R and a new R' under the same JOBID is desirable, this can't be accomplished in the near term. @garlick noted that it will be particularly tricky to wire new subtrees into an existing TBON. Since the immediate application for the grow functionality is to test Kubernetes jobs, bypassing the broker network issue with upcoming broker-less functionality was discussed. That capability won't be ready in the short term. @dongahn proposed the idea of a grow operation as a checkpoint restart (which is needed anyway in flux-sched), where the current job is checkpointed and then restarted with additional resources (along the lines of flux-framework/flux-sched#470). This may be useful for future grow investigation.

Two related interim solutions were proposed, both of which satisfy grow through a new job submission (and a new JOBID). The first is to use the URI of the new job to exec Kubernetes commands on the new job's resources; the other is simply to put those commands into a job script. With appropriate labels and scale ranges, Kubernetes should handle starting containers on the new resources. One item to keep in mind is that the new resources should have an end time equal to that of the original job. That will prevent users from creating interminable Kubernetes jobs by packing the head of the flux-sched queue with grow requests.
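
To make the end-time constraint concrete, here is a minimal sketch of the arithmetic only. It assumes the original job's start time and requested duration are already known, as seconds since the epoch and seconds respectively. The function name grow_duration and its parameters are hypothetical, and the sketch does not call any Flux API; how the resulting duration is passed to the new job submission is left open.

```python
import time
from typing import Optional


def grow_duration(orig_start: float, orig_duration: float,
                  now: Optional[float] = None) -> float:
    """Return the duration (seconds) a "grow" job may request so that it
    ends at the same time as the original job.

    orig_start    -- hypothetical: start time of the original job (epoch seconds)
    orig_duration -- hypothetical: duration requested by the original job (seconds)
    now           -- submission time of the grow job; defaults to the current time
    """
    if now is None:
        now = time.time()
    orig_end = orig_start + orig_duration
    remaining = orig_end - now
    if remaining <= 0:
        # Nothing left to grow into: the original job is already past its end time.
        raise ValueError("original job has already reached its end time")
    return remaining


if __name__ == "__main__":
    # Original job started at t=1000 with a one-hour limit; grow request at t=2800.
    print(grow_duration(orig_start=1000.0, orig_duration=3600.0, now=2800.0))
```

Capping each grow request this way means that a chain of grow submissions cannot extend the lifetime of the work beyond the original job's end time, which is the concern raised above.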

@stale

stale bot commented Mar 5, 2021

This issue has been automatically marked as stale because it has not had activity for 365 days. It will be closed if no further activity occurs within 14 days. Thank you for your contributions.

stale bot added the "wontfix" label on Mar 5, 2021
SteVwonder added the "design don't expect this to ever be closed..." label and removed the "wontfix" label on Mar 8, 2021
@vsoch
Member

vsoch commented Dec 18, 2023

Is this still under discussion for our elasticity work? I remember more recent discussion about grow, but in the context of flux-sched. A linked issue (still open) for flux-core is #2802. For flux-sched, for issues/PRs I'm finding:

And then there are a bunch with an elasticity label but I think a few years old: https://github.com/flux-framework/flux-sched/issues?q=label%3Aelasticity+. I'm not up to date on anything really (and apologies for that, but I know this is important) so let's make sure we sync all of these into some cohesive next step(s) if it's still important. I'd like to understand, for example, how the recent flux-sched 989 might help with #2802.


5 participants