[Core/UX] Add Job API and support managed job (both on-demand and spot) (#3419)

* Refactor spot core APIs to `sky.spot.core`

* Add comment

* fix

* format

* change to spot_lib instead

* change spot to job

* rename modules

* rename to managed job

* fix

* Allow on-demand for managed job

* fix launch

* Fixes names

* rename to job controller

* rename to job controller

* Fix job recovery

* format

* Add CLI alias

* format

* rename

* improve resources

* fix doc

* fix test

* fix _cpus

* fallback to old controller

* fix unit test

* backward compat job

* change to --managed-job

* fix job_recovery

* refactor schemas

* remove resources not having price

* format

* fix

* fix managed job

* format

* format

* fix

* fix type for resource str

* Fix test smoke

* add request output

* Merge and format

* merge error fix

* fix merge issue

* fix output

* fix test

* rename to jobs

* Replace spot_recovery with job_recovery

* rename spot_ to jobs_

* format

* add legacy job signal

* address comments

* renames

* Fix controller type

* incorporate #2080

* fix managed jobs

* fix dashboard

* Fix test

* remove old code

* format

* update doc

* fix managed jobs

* Address comments

* address comments

* Add raw file

* only use aws and gcp for pipeline

* Fix doc

* use managed-jobs

* Fix back

* fix

* address comments

* Fix

* format

* fix

* Fix

* minor

* Update sky/cli.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/job/utils.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/cli.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/job/state.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/job/core.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/job/utils.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/job/core.py

Co-authored-by: Zongheng Yang <[email protected]>

* format

* address comments

* Fix optimizer table

* Fix best plan

* revert version 3

* fix

* add backward compat

* fix back compat

* fix docs

* fix PR template

* Fix docs

* format

* fix job logs

* fix

* fix

* fix docs

* fix backward

* fix backward

* fix localstorage

* fix job name in optimizer table

* Update managed-jobs.rst

* Update docs/source/reference/cli.rst

Co-authored-by: Zongheng Yang <[email protected]>

* address comments

* Fix job controller name

* Update sky/job/controller.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/job/utils.py

Co-authored-by: Zongheng Yang <[email protected]>

* address comment

* format

* Add comment

* Add comments

* check status again

* check again

* avoid lock

* format

* Update sky/backends/cloud_vm_ray_backend.py

Co-authored-by: Tian Xia <[email protected]>

* address comments

* rename to jobs and add CLI alias to job

* Add dependencies for all on-demand clouds

* fix

* Fix

* fix test smoke

* Update sky/utils/schemas.py

Co-authored-by: Tian Xia <[email protected]>

* fix names

* format

* fix

* 0.8.0 instead

* format

* fix doc

* fix cloudflare

* Fix job dashboard

* fix smoke

* update

* add comment for deprecation

* rename to jobs

* Update sky/clouds/cloud.py

Co-authored-by: Zongheng Yang <[email protected]>

* Update sky/jobs/utils.py

Co-authored-by: Zongheng Yang <[email protected]>

* Fix jobs

* format

* Update sky/jobs/utils.py

Co-authored-by: Zongheng Yang <[email protected]>

* minor

* separate boto3 and awscli

* fix

* Update docs/source/reference/config.rst

Co-authored-by: Tian Xia <[email protected]>

* rename to jobs controller

* format

* Rename to JobsController

* fix

* renames

* format

* format

* fix name in test

* address comments

* Fix docs

* Add managed job yaml

* fix

* Update sky/cli.py

Co-authored-by: Tian Xia <[email protected]>

* fix comment

---------

Co-authored-by: Zongheng Yang <[email protected]>
Co-authored-by: Tian Xia <[email protected]>
3 people authored May 5, 2024
1 parent e0d04d3 commit 015061e
Showing 73 changed files with 2,521 additions and 2,015 deletions.
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -11,4 +11,4 @@ Tested (run the relevant ones):
 - [ ] Any manual or new tests for this PR (please specify below)
 - [ ] All smoke tests: `pytest tests/test_smoke.py`
 - [ ] Relevant individual smoke tests: `pytest tests/test_smoke.py::test_fill_in_the_name`
-- [ ] Backward compatibility tests: `bash tests/backward_comaptibility_tests.sh`
+- [ ] Backward compatibility tests: `conda deactivate; bash -i tests/backward_compatibility_tests.sh`
2 changes: 1 addition & 1 deletion .github/workflows/pytest.yml
@@ -27,7 +27,7 @@ jobs:
 - tests/test_optimizer_random_dag.py
 - tests/test_storage.py
 - tests/test_wheels.py
-- tests/test_spot_serve.py
+- tests/test_jobs_and_serve.py
 - tests/test_yaml_parser.py
 runs-on: ubuntu-latest
 steps:
1 change: 1 addition & 0 deletions docs/source/_static/custom.js
@@ -26,6 +26,7 @@ document.addEventListener('DOMContentLoaded', () => {
 // New items:
 const newItems = [
 { selector: '.caption-text', text: 'SkyServe: Model Serving' },
+{ selector: '.toctree-l1 > a', text: 'Managed Jobs' },
 { selector: '.toctree-l1 > a', text: 'Running on Kubernetes' },
 { selector: '.toctree-l1 > a', text: 'DBRX (Databricks)' },
 { selector: '.toctree-l1 > a', text: 'Ollama' },
4 changes: 2 additions & 2 deletions docs/source/docs/index.rst
@@ -121,7 +121,7 @@ Contents
 :maxdepth: 1
 :caption: Running Jobs

-../examples/spot-jobs
+../examples/managed-jobs
 ../reference/job-queue
 ../examples/auto-failover
 ../reference/kubernetes/index
@@ -139,7 +139,7 @@ Contents
 :maxdepth: 1
 :caption: Cutting Cloud Costs

-../examples/spot-jobs
+Managed Spot Jobs <../examples/spot-jobs>
 ../reference/auto-stop
 ../reference/benchmark/index
