[Core] Install SkyPilot runtime in separate env #3575
Thanks for adding this @Michaelvll! It looks great to me :)) Left some nits.
llm/axolotl/axolotl.yaml
We should update Axolotl's GitHub README with this, maybe after 0.6.
Yes, will update it in a separate PR.
Blocked by #3572. Moved #2801 here because of merge issues and because we now support explicit Python / Ray paths.
This PR enables:
Closes #2722
Mitigates #2673
Background
Installing the entire SkyPilot runtime in the base environment can cause issues when users also try to install their own dependencies in the base environment on the remote VM, and it is very easy to leave the VM in an unexpected state. We now move the whole SkyPilot runtime into a new conda environment, isolating it from the users' own Python environment.
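The isolation idea can be illustrated with a minimal sketch. This is not SkyPilot's code: the PR uses a separate conda environment, whereas this sketch swaps in Python's standard-library `venv` module so it runs without conda. The helper names `create_runtime_env` and `run_in_runtime_env` and the `runtime-env` directory are hypothetical. The point it shows is that the runtime is always invoked through its own interpreter by explicit path, so packages a user installs into the base environment do not leak into the runtime's site-packages.

```python
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def create_runtime_env(env_dir: Path) -> Path:
    """Create an isolated env (hypothetical stand-in for the runtime conda env).

    Returns the explicit path to the env's own interpreter.
    """
    # with_pip=False keeps the sketch fast and offline; a real runtime env
    # would also install its own dependencies (e.g. Ray and SkyPilot) here.
    # system_site_packages defaults to False, so the base env's packages
    # are not visible inside this env.
    builder = venv.EnvBuilder(
        with_pip=False,
        clear=True,
        symlinks=(os.name != "nt"),  # mirror the `python -m venv` default
    )
    builder.create(env_dir)
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return env_dir / bin_dir / exe


def run_in_runtime_env(python_path: Path, code: str) -> str:
    """Run `code` with the runtime env's interpreter, called by explicit path
    rather than whatever `python` happens to be first on PATH."""
    result = subprocess.run(
        [str(python_path), "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        runtime_python = create_runtime_env(Path(tmp) / "runtime-env")
        prefix = run_in_runtime_env(runtime_python, "import sys; print(sys.prefix)")
        print("runtime env prefix:", prefix)
        print("user env prefix:   ", sys.prefix)
```

Because every runtime call goes through the env's own interpreter, a `pip install` in the user's base environment leaves the runtime env untouched; the trade-off is one extra environment to provision on each new cluster.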
Performance
New cluster
TL;DR: the separate env has a negligible negative effect on launch time for all the clouds tested.
Results on AWS, GCP, and Kubernetes
AWS
multitime -n 5 sky launch -y --cloud aws --cpus 2
master:
This PR:
GCP
multitime -n 5 sky launch -y --cloud gcp --cpus 2
master:
This PR:
(21s slower, due to installing the dependencies for Ray and SkyPilot)
After adding a6f6996:
Kubernetes
master:
This PR:
Launch on existing cluster
sky launch -c existing-cluster --cpus 2 --cloud kubernetes -y
multitime -n 5 sky launch -c existing-cluster echo hi
master:
This PR:
Tested (run the relevant ones):
bash format.sh
sky launch -c test-ax --cloud kubernetes axolotl.yaml --env HF_TOKEN
sky launch -c test-ax --cloud gcp -i 0 --down --use-spot axolotl.yaml --env HF_TOKEN
sky launch -c test-ax --cloud gcp -i 0 --down --use-spot axolotl.yaml --env HF_TOKEN --image-id docker:winglian/axolotl:main-latest (with Python 3.12 in the Docker image)
sky launch --image-id ami-0df2a11dd1fe1f8e3 --cpus 2 --cloud aws --region us-east-1 -y -i20 --down conda env list (AMI used in Custom images: challenges & problems to solve #2673)
pytest tests/test_smoke.py
pytest tests/test_smoke.py --aws
pytest tests/test_smoke.py --kubernetes
pytest tests/test_smoke.py::test_fill_in_the_name
conda deactivate; bash -i tests/backward_compatibility_tests.sh