Docs revamp: polishing landing docs. #3797
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -20,9 +20,8 @@ | |||||
|
||||||
</p> | ||||||
|
||||||
|
||||||
<h3 align="center"> | ||||||
Run LLMs and AI on Any Cloud | ||||||
Run AI on Any Infra — Unified, Faster, Cheaper | ||||||
</h3> | ||||||
|
||||||
---- | ||||||
|
@@ -38,11 +37,11 @@ | |||||
- [Nov, 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/) | ||||||
- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/) | ||||||
- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/) | ||||||
- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/) | ||||||
|
||||||
<details> | ||||||
<summary>Archived</summary> | ||||||
|
||||||
- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/) | ||||||
- [Mar, 2024] Serve and deploy [**Databricks DBRX**](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) on your infra: [**example**](./llm/dbrx/) | ||||||
- [Feb, 2024] Speed up your LLM deployments with [**SGLang**](https://github.com/sgl-project/sglang) for 5x throughput on SkyServe: [**example**](./llm/sglang/) | ||||||
- [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/) | ||||||
|
@@ -54,33 +53,35 @@ | |||||
|
||||||
---- | ||||||
|
||||||
SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution. | ||||||
SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability. | ||||||
|
||||||
SkyPilot **abstracts away cloud infra burdens**: | ||||||
- Launch jobs & clusters on any cloud | ||||||
- Easy scale-out: queue and run many jobs, automatically managed | ||||||
- Easy access to object stores (S3, GCS, Azure, R2, IBM) | ||||||
SkyPilot **abstracts away infra burdens**: | ||||||
- Launch [dev clusters](https://skypilot.readthedocs.io/en/latest/examples/interactive-development.html), [jobs](https://skypilot.readthedocs.io/en/latest/examples/managed-jobs.html), and [serving](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html) on any infra | ||||||
- Easy job management: queue, run, and auto-recover many jobs | ||||||
|
||||||
SkyPilot **maximizes GPU availability for your jobs**: | ||||||
* Provision in all zones/regions/clouds you have access to ([the _Sky_](https://arxiv.org/abs/2205.07147)), with automatic failover | ||||||
SkyPilot **supports multiple clusters, clouds, and hardware** ([the Sky](https://arxiv.org/abs/2205.07147)): | ||||||
- Bring your reserved GPUs, Kubernetes clusters, or 12+ clouds | ||||||
- [Flexible provisioning](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html) of GPUs, TPUs, CPUs, with auto-retry | ||||||
|
||||||
SkyPilot **cuts your cloud costs**: | ||||||
* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html): 3-6x cost savings using spot VMs, with auto-recovery from preemptions | ||||||
* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest VM/zone/region/cloud | ||||||
* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): hands-free cleanup of idle clusters | ||||||
SkyPilot **cuts your cloud costs & maximizes GPU availability**: | ||||||
* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): automatic cleanup of idle resources | ||||||
* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/managed-jobs.html): 3-6x cost savings using spot instances, with preemption auto-recovery | ||||||
* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest & most available infra | ||||||
|
||||||
SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes. | ||||||
|
||||||
Install with pip: | ||||||
```bash | ||||||
pip install -U "skypilot[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]" # choose your clouds | ||||||
# Choose your clouds: | ||||||
pip install -U "skypilot[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp]" | ||||||
``` | ||||||
To get the latest features and fixes, use the nightly build or [install from source](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html): | ||||||
```bash | ||||||
pip install "skypilot-nightly[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]" # choose your clouds | ||||||
# Choose your clouds: | ||||||
pip install "skypilot-nightly[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp]" | ||||||
``` | ||||||
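As an illustration of the "choose your clouds" comment above, here is a minimal shell sketch that assembles the install command from a subset of extras. `skypilot_install_cmd` is a hypothetical helper written for this example and is not part of SkyPilot; it only prints the command, so nothing is installed until you run the printed line yourself.

```bash
# Hypothetical helper (not part of SkyPilot): print the pip command for the
# chosen extras. Setting IFS to a comma makes "$*" join the arguments with commas.
skypilot_install_cmd() {
  local IFS=,
  printf 'pip install -U "skypilot[%s]"\n' "$*"
}

# Example: only Kubernetes, AWS, and GCP support.
skypilot_install_cmd kubernetes aws gcp
```

The extras names passed in should be taken from the bracketed list in the install command above.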
|
||||||
Current supported providers (AWS, Azure, GCP, OCI, Lambda Cloud, RunPod, Fluidstack, Paperspace, Cudo, IBM, Samsung, Cloudflare, any Kubernetes cluster): | ||||||
[Current supported infra](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html) (Kubernetes; AWS, GCP, Azure, OCI, Lambda Cloud, Fluidstack, RunPod, Cudo, Paperspace, Cloudflare, Samsung, IBM, VMware vSphere): | ||||||
<p align="center"> | ||||||
<picture> | ||||||
<source media="(prefers-color-scheme: dark)" srcset="https://raw.githubusercontent.com/skypilot-org/skypilot/master/docs/source/images/cloud-logos-dark.png"> | ||||||
|
@@ -155,6 +156,7 @@ To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest | |||||
<!-- Keep this section in sync with index.rst in SkyPilot Docs --> | ||||||
Runnable examples: | ||||||
- LLMs on SkyPilot | ||||||
- [Llama 3.1 finetuning](./llm/llama-3_1-finetuning/) and [serving](./llm/llama-3_1/) | ||||||
- [GPT-2 via `llm.c`](./llm/gpt-2/) | ||||||
- [Llama 3](./llm/llama-3/) | ||||||
- [Qwen](./llm/qwen/) | ||||||
|
@@ -177,6 +179,8 @@ Runnable examples: | |||||
- Add yours here & see more in [`llm/`](./llm)! | ||||||
Review comment (with a suggested change): nit: '/!' doesn't look clean.
Reply: Maybe fine?
||||||
- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), [Cog](https://github.com/skypilot-org/skypilot/blob/master/examples/cog/), [Unsloth](https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml), [Ollama](https://github.com/skypilot-org/skypilot/blob/master/llm/ollama), [llm.c](https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2) and [many more (`examples/`)](./examples). | ||||||
|
||||||
Case Studies and Integrations: [Community Spotlights](https://blog.skypilot.co/community/) | ||||||
Review comment: nit: We can perhaps sublist the
Reply: The intention was to keep that page shorter.
||||||
|
||||||
Follow updates: | ||||||
- [Twitter](https://twitter.com/skypilot_org) | ||||||
- [Slack](http://slack.skypilot.co) | ||||||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -2,52 +2,53 @@ Welcome to SkyPilot! | |||||
==================== | ||||||
|
||||||
.. image:: /_static/SkyPilot_wide_dark.svg | ||||||
:width: 65% | ||||||
:width: 50% | ||||||
:align: center | ||||||
:alt: SkyPilot | ||||||
:class: no-scaled-link, only-dark | ||||||
.. image:: /_static/SkyPilot_wide_light.svg | ||||||
:width: 60% | ||||||
:width: 50% | ||||||
:align: center | ||||||
:alt: SkyPilot | ||||||
:class: no-scaled-link, only-light | ||||||
|
||||||
|
||||||
.. raw:: html | ||||||
|
||||||
<p></p> | ||||||
<p style="text-align:center"> | ||||||
<strong>Run AI on Any Infra</strong> — Unified, Faster, Cheaper | ||||||
</p> | ||||||
<p style="text-align:center"> | ||||||
<a class="github-button" href="https://github.com/skypilot-org/skypilot" data-show-count="true" data-size="large" aria-label="Star skypilot-org/skypilot on GitHub">Star</a> | ||||||
<a class="github-button" href="https://github.com/skypilot-org/skypilot/subscription" data-icon="octicon-eye" data-size="large" aria-label="Watch skypilot-org/skypilot on GitHub">Watch</a> | ||||||
<a class="github-button" href="https://github.com/skypilot-org/skypilot/fork" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork skypilot-org/skypilot on GitHub">Fork</a> | ||||||
<br> | ||||||
<a class="reference external image-reference" style="vertical-align:9.5px" href="http://slack.skypilot.co"><img src="https://img.shields.io/badge/SkyPilot-Join%20Slack-blue?logo=slack" style="height:27px"></a> | ||||||
<script async defer src="https://buttons.github.io/buttons.js"></script> | ||||||
</p> | ||||||
|
||||||
<p style="text-align:center"> | ||||||
<strong>Run LLMs and AI on Any Cloud</strong> | ||||||
</p> | ||||||
|
||||||
SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution. | ||||||
SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability. | ||||||
|
||||||
SkyPilot **abstracts away cloud infra burdens**: | ||||||
SkyPilot **abstracts away infra burdens**: | ||||||
|
||||||
- Launch jobs & clusters on any cloud | ||||||
- Easy scale-out: queue and run many jobs, automatically managed | ||||||
- Easy access to object stores (S3, GCS, Azure, R2, IBM) | ||||||
- Launch :ref:`dev clusters <dev-cluster>`, :ref:`jobs <managed-jobs>`, and :ref:`serving <sky-serve>` on any infra | ||||||
- Easy job management: queue, run, and auto-recover many jobs | ||||||
|
||||||
SkyPilot **maximizes GPU availability for your jobs**: | ||||||
SkyPilot **supports multiple clusters, clouds, and hardware** (`the Sky <https://arxiv.org/abs/2205.07147>`_): | ||||||
|
||||||
* Provision in all zones/regions/clouds you have access to (`the Sky <https://arxiv.org/abs/2205.07147>`_), with automatic failover | ||||||
- Bring your reserved GPUs, Kubernetes clusters, or 12+ clouds | ||||||
- :ref:`Flexible provisioning <auto-failover>` of GPUs, TPUs, CPUs, with auto-retry | ||||||
|
||||||
SkyPilot **cuts your cloud costs**: | ||||||
SkyPilot **cuts your cloud costs & maximizes GPU availability**: | ||||||
|
||||||
* `Managed Spot <https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html>`_: 3-6x cost savings using spot VMs, with auto-recovery from preemptions | ||||||
* Optimizer: 2x cost savings by auto-picking the cheapest VM/zone/region/cloud | ||||||
* `Autostop <https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html>`_: hands-free cleanup of idle clusters | ||||||
* :ref:`Autostop <auto-stop>`: automatic cleanup of idle resources | ||||||
* :ref:`Managed Spot <managed-jobs>`: 3-6x cost savings using spot instances, with preemption auto-recovery | ||||||
* :ref:`Optimizer <auto-failover>`: 2x cost savings by auto-picking the cheapest & most available infra | ||||||
|
||||||
SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes. | ||||||
|
||||||
Current supported providers (AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidstack, Cudo, IBM, Samsung, Cloudflare, VMware vSphere, any Kubernetes cluster): | ||||||
:ref:`Current supported infra <installation>` (Kubernetes; AWS, GCP, Azure, OCI, Lambda Cloud, Fluidstack, RunPod, Cudo, Paperspace, Cloudflare, Samsung, IBM, VMware vSphere): | ||||||
|
||||||
.. raw:: html | ||||||
|
||||||
|
@@ -58,10 +59,20 @@ Current supported providers (AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidst | |||||
</picture> | ||||||
</p> | ||||||
|
||||||
More Information | ||||||
-------------------------- | ||||||
Ready to get started? | ||||||
Review comment (with a suggested change): Reducing words. Also, it might be OK to remove this section altogether. Not feeling strongly.
Reply: Kind of feeling the question form is more attractive.
Reply: Slightly leaning towards
Reply: Want to experiment with this question tone.
||||||
---------------------- | ||||||
|
||||||
Tutorials: `SkyPilot Tutorials <https://github.com/skypilot-org/skypilot-tutorial>`_ | ||||||
:ref:`Install SkyPilot <installation>` in ~1 minute. Then, launch your first dev cluster in ~5 minutes in :ref:`Quickstart <quickstart>`. | ||||||
|
||||||
Everything is launched within your cloud accounts, VPCs, and cluster(s). | ||||||
|
||||||
Contact the SkyPilot team | ||||||
--------------------------------- | ||||||
|
||||||
You can chat with the SkyPilot team and community on the `SkyPilot Slack <http://slack.skypilot.co>`_. | ||||||
|
||||||
Learn more | ||||||
-------------------------- | ||||||
|
||||||
Runnable examples: | ||||||
|
||||||
|
@@ -93,6 +104,10 @@ Runnable examples: | |||||
|
||||||
* Framework examples: `PyTorch DDP <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml>`_, `DeepSpeed <https://github.com/skypilot-org/skypilot/blob/master/examples/deepspeed-multinode/sky.yaml>`_, `JAX/Flax on TPU <https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml>`_, `Stable Diffusion <https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion>`_, `Detectron2 <https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml>`_, `Distributed <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py>`_ `TensorFlow <https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml>`_, `NeMo <https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo_gpt_train.yaml>`_, `programmatic grid search <https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py>`_, `Docker <https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml>`_, `Cog <https://github.com/skypilot-org/skypilot/blob/master/examples/cog/>`_, `Unsloth <https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml>`_, `Ollama <https://github.com/skypilot-org/skypilot/blob/master/llm/ollama>`_, `llm.c <https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2>`__ and `many more <https://github.com/skypilot-org/skypilot/tree/master/examples>`_. | ||||||
|
||||||
Case Studies and Integrations: `Community Spotlights <https://blog.skypilot.co/community/>`_ | ||||||
|
||||||
Tutorials: `SkyPilot Tutorials <https://github.com/skypilot-org/skypilot-tutorial>`_ | ||||||
|
||||||
Follow updates: | ||||||
|
||||||
* `Twitter <https://twitter.com/skypilot_org>`_ | ||||||
|
@@ -106,10 +121,9 @@ Read the research: | |||||
* `Sky Computing vision paper <https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s02-stoica.pdf>`_ (HotOS 2021) | ||||||
|
||||||
|
||||||
Contents | ||||||
-------- | ||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: Getting Started | ||||||
|
||||||
|
@@ -120,6 +134,7 @@ Contents | |||||
|
||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: Running Jobs | ||||||
|
||||||
|
@@ -130,6 +145,7 @@ Contents | |||||
../running-jobs/distributed-jobs | ||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: SkyServe: Model Serving | ||||||
|
||||||
|
@@ -138,6 +154,7 @@ Contents | |||||
../serving/service-yaml-spec | ||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: Cutting Cloud Costs | ||||||
|
||||||
|
@@ -146,13 +163,15 @@ Contents | |||||
../reference/benchmark/index | ||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: Using Data | ||||||
|
||||||
../examples/syncing-code-artifacts | ||||||
../reference/storage | ||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: User Guides | ||||||
|
||||||
|
@@ -165,13 +184,15 @@ Contents | |||||
|
||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: Developer Guides | ||||||
|
||||||
../developers/CONTRIBUTING | ||||||
Guide: Adding a New Cloud <https://docs.google.com/document/d/1oWox3qb3Kz3wXXSGg9ZJWwijoa99a3PIQUHBR8UgEGs/edit?usp=sharing> | ||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: Cloud Admin and Usage | ||||||
|
||||||
|
@@ -180,6 +201,7 @@ Contents | |||||
../cloud-setup/quota | ||||||
|
||||||
.. toctree:: | ||||||
:hidden: | ||||||
:maxdepth: 1 | ||||||
:caption: References | ||||||
|
||||||
|
Review comment: Do we want to keep this chronological order?
Reply: Good catch, done.