[Spot] Support both managed on-demand and spot instances #2545

Closed
wants to merge 8 commits into from

MaoZiming
Collaborator

@MaoZiming MaoZiming commented Sep 12, 2023

From discussion with @Michaelvll

  • Allow serverless-style job submission for on-demand jobs.
    • Conceptually, the API has two kinds of jobs: serverful (launch, exec, start, stop) and serverless (spot launch).
    • However, the serverless API is spot-only. It would be better to support on-demand as well and rename spot launch to something like serverless launch.
    • This will benefit TPU users, since on-demand TPUs can still restart quite frequently due to stability issues.
    • Users of smaller clouds may also benefit from the improved reliability.
  • Design a way to select from both on-demand and spot instances.
    • Smaller clouds offer fairly cheap on-demand instances at price points similar to spot.
    • This lets the optimizer choose among both on-demand and spot instances.
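To make the second point concrete, here is a minimal sketch of choosing among on-demand and spot candidates purely by hourly price. `Candidate` and `pick_cheapest` are hypothetical names for illustration, not SkyPilot's optimizer API, and a real optimizer would likely also weigh preemption risk and recovery cost.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical launchable option; not a SkyPilot class."""
    cloud: str
    instance_type: str
    use_spot: bool
    hourly_price: float  # USD per hour, illustrative only


def pick_cheapest(candidates: Iterable[Candidate],
                  allow_spot: bool = True,
                  allow_on_demand: bool = True) -> Optional[Candidate]:
    """Return the cheapest candidate among the allowed pricing models.

    Returns None when no candidate matches the allowed pricing models.
    """
    allowed = [
        c for c in candidates
        if (c.use_spot and allow_spot) or (not c.use_spot and allow_on_demand)
    ]
    return min(allowed, key=lambda c: c.hourly_price, default=None)
```

With both pricing models allowed, a cheap on-demand instance on a smaller cloud can win over a spot instance elsewhere; restricting to one model reproduces today's behavior.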

Possibly replace use_spot: True with

managed: SPOT
# managed: ON_DEMAND
# managed: CONSIDER_BOTH
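One possible way to read the proposed `managed:` field while staying backward compatible with `use_spot` is sketched below. `ManagedMode` and `parse_managed` are hypothetical names, and the fallback rule (treat `use_spot: True` as `SPOT`, otherwise `ON_DEMAND`) is an assumption, not something this PR specifies.

```python
import enum
from typing import Any, Dict


class ManagedMode(enum.Enum):
    """Hypothetical enum mirroring the three values proposed above."""
    SPOT = 'SPOT'
    ON_DEMAND = 'ON_DEMAND'
    CONSIDER_BOTH = 'CONSIDER_BOTH'


def parse_managed(resources: Dict[str, Any]) -> ManagedMode:
    """Resolve the managed mode from a resources config dict.

    Prefers the new `managed` key; falls back to the legacy `use_spot`
    flag (assumed mapping). Raises ValueError on an unknown `managed` value.
    """
    if 'managed' in resources:
        return ManagedMode(str(resources['managed']).upper())
    if resources.get('use_spot', False):
        return ManagedMode.SPOT
    return ManagedMode.ON_DEMAND
```

Keeping the legacy flag as a fallback would let existing task YAMLs keep working while new ones opt into `CONSIDER_BOTH`.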

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@MaoZiming MaoZiming changed the title Spot/on-demand scheduling [Spot] Spot/on-demand scheduling Sep 12, 2023
@MaoZiming MaoZiming changed the title [Spot] Spot/on-demand scheduling [Spot] Support both managed on-demand and spot instances Sep 13, 2023
@MaoZiming MaoZiming marked this pull request as ready for review September 13, 2023 16:25
@Michaelvll Michaelvll self-requested a review September 28, 2023 16:26
@MaoZiming MaoZiming closed this Dec 28, 2023