From e40b22139cff9275ab5f93decba6e41b52dab525 Mon Sep 17 00:00:00 2001
From: Zongheng Yang
Date: Wed, 31 Jul 2024 13:28:17 -0700
Subject: [PATCH] Docs revamp: polishing landing docs. (#3797)

* Docs revamp: polishing landing docs.

* Add a note

* Fix order
---
 README.md                                    | 38 ++++++-----
 docs/source/docs/index.rst                   | 68 ++++++++++++-------
 .../examples/interactive-development.rst     |  6 +-
 3 files changed, 70 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index a6f1df49c91..42902017d75 100644
--- a/README.md
+++ b/README.md
@@ -20,9 +20,8 @@

-

-  Run LLMs and AI on Any Cloud
+  Run AI on Any Infra — Unified, Faster, Cheaper

 ----

@@ -38,7 +37,6 @@
 - [Nov, 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
 - [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
 - [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
-- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
 Archived

 - [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
 - [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
 - [July, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
+- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
 - [April, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!
 ----

-SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.
+SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability.

-SkyPilot **abstracts away cloud infra burdens**:
-- Launch jobs & clusters on any cloud
-- Easy scale-out: queue and run many jobs, automatically managed
-- Easy access to object stores (S3, GCS, Azure, R2, IBM)
+SkyPilot **abstracts away infra burdens**:
+- Launch [dev clusters](https://skypilot.readthedocs.io/en/latest/examples/interactive-development.html), [jobs](https://skypilot.readthedocs.io/en/latest/examples/managed-jobs.html), and [serving](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html) on any infra
+- Easy job management: queue, run, and auto-recover many jobs

-SkyPilot **maximizes GPU availability for your jobs**:
-* Provision in all zones/regions/clouds you have access to ([the _Sky_](https://arxiv.org/abs/2205.07147)), with automatic failover
+SkyPilot **supports multiple clusters, clouds, and hardware** ([the Sky](https://arxiv.org/abs/2205.07147)):
+- Bring your reserved GPUs, Kubernetes clusters, or 12+ clouds
+- [Flexible provisioning](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html) of GPUs, TPUs, CPUs, with auto-retry

-SkyPilot **cuts your cloud costs**:
-* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html): 3-6x cost savings using spot VMs, with auto-recovery from preemptions
-* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
-* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): hands-free cleanup of idle clusters
+SkyPilot **cuts your cloud costs & maximizes GPU availability**:
+* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): automatic cleanup of idle resources
+* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/managed-jobs.html): 3-6x cost savings using spot instances, with preemption auto-recovery
+* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest & most available infra

 SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.
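In CLI terms, the bullets above boil down to a few `sky` commands. A minimal sketch (the cluster name `dev`, the GPU type, and `task.yaml` are placeholders; exact flags and the `sky jobs` subcommand name may differ across SkyPilot versions):

```bash
# Provision a GPU cluster on whichever supported infra is cheapest/available.
sky launch -c dev --gpus A100:1

# Autostop the cluster after 10 idle minutes, so idle resources are cleaned up.
sky autostop -i 10 dev

# Run a managed job (spot-capable, with auto-recovery) defined in a task YAML.
sky jobs launch task.yaml
```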
 Install with pip:

 ```bash
-pip install -U "skypilot[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]" # choose your clouds
+# Choose your clouds:
+pip install -U "skypilot[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp]"
 ```
 To get the latest features and fixes, use the nightly build or [install from source](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html):

 ```bash
-pip install "skypilot-nightly[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]" # choose your clouds
+# Choose your clouds:
+pip install "skypilot-nightly[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp]"
 ```

-Current supported providers (AWS, Azure, GCP, OCI, Lambda Cloud, RunPod, Fluidstack, Paperspace, Cudo, IBM, Samsung, Cloudflare, any Kubernetes cluster):
+[Current supported infra](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html) (Kubernetes; AWS, GCP, Azure, OCI, Lambda Cloud, Fluidstack, RunPod, Cudo, Paperspace, Cloudflare, Samsung, IBM, VMware vSphere):
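After installing, `sky check` reports which of the infra listed above SkyPilot can actually use with the credentials on your machine, and `sky show-gpus` lists the accelerators it can provision (output format varies by version):

```bash
# Verify which clouds / Kubernetes contexts are enabled and have valid credentials.
sky check

# Browse GPUs/TPUs that can be requested, e.g. via --gpus.
sky show-gpus
```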

@@ -155,6 +156,7 @@ To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest
 Runnable examples:

 - LLMs on SkyPilot
+  - [Llama 3.1 finetuning](./llm/llama-3_1-finetuning/) and [serving](./llm/llama-3_1/)
   - [GPT-2 via `llm.c`](./llm/gpt-2/)
   - [Llama 3](./llm/llama-3/)
   - [Qwen](./llm/qwen/)
@@ -177,6 +179,8 @@ Runnable examples:
   - Add yours here & see more in [`llm/`](./llm)!
 - Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), [Cog](https://github.com/skypilot-org/skypilot/blob/master/examples/cog/), [Unsloth](https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml), [Ollama](https://github.com/skypilot-org/skypilot/blob/master/llm/ollama), [llm.c](https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2) and [many more (`examples/`)](./examples).

+Case Studies and Integrations: [Community Spotlights](https://blog.skypilot.co/community/)
+
 Follow updates:
 - [Twitter](https://twitter.com/skypilot_org)
 - [Slack](http://slack.skypilot.co)
diff --git a/docs/source/docs/index.rst b/docs/source/docs/index.rst
index 5b8d144af70..1ecdcfe1e98 100644
--- a/docs/source/docs/index.rst
+++ b/docs/source/docs/index.rst
@@ -2,52 +2,53 @@ Welcome to SkyPilot!
 ====================

 .. image:: /_static/SkyPilot_wide_dark.svg
-   :width: 65%
+   :width: 50%
    :align: center
    :alt: SkyPilot
    :class: no-scaled-link, only-dark

 .. image:: /_static/SkyPilot_wide_light.svg
-   :width: 60%
+   :width: 50%
    :align: center
    :alt: SkyPilot
    :class: no-scaled-link, only-light

+
 .. raw:: html

+

+

+ Run AI on Any Infra — Unified, Faster, Cheaper +

    Star Watch Fork
-

-

-   Run LLMs and AI on Any Cloud
-

-SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.
+SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability.

-SkyPilot **abstracts away cloud infra burdens**:
-- Launch jobs & clusters on any cloud
-- Easy scale-out: queue and run many jobs, automatically managed
-- Easy access to object stores (S3, GCS, Azure, R2, IBM)
+SkyPilot **abstracts away infra burdens**:
+- Launch :ref:`dev clusters `, :ref:`jobs `, and :ref:`serving ` on any infra
+- Easy job management: queue, run, and auto-recover many jobs

-SkyPilot **maximizes GPU availability for your jobs**:
+SkyPilot **supports multiple clusters, clouds, and hardware** (`the Sky `_):

-* Provision in all zones/regions/clouds you have access to (`the Sky `_), with automatic failover
+- Bring your reserved GPUs, Kubernetes clusters, or 12+ clouds
+- :ref:`Flexible provisioning ` of GPUs, TPUs, CPUs, with auto-retry

-SkyPilot **cuts your cloud costs**:
+SkyPilot **cuts your cloud costs & maximizes GPU availability**:

-* `Managed Spot `_: 3-6x cost savings using spot VMs, with auto-recovery from preemptions
-* Optimizer: 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
-* `Autostop `_: hands-free cleanup of idle clusters
+* :ref:`Autostop `: automatic cleanup of idle resources
+* :ref:`Managed Spot `: 3-6x cost savings using spot instances, with preemption auto-recovery
+* :ref:`Optimizer `: 2x cost savings by auto-picking the cheapest & most available infra

 SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.

-Current supported providers (AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidstack, Cudo, IBM, Samsung, Cloudflare, VMware vSphere, any Kubernetes cluster):
+:ref:`Current supported infra ` (Kubernetes; AWS, GCP, Azure, OCI, Lambda Cloud, Fluidstack, RunPod, Cudo, Paperspace, Cloudflare, Samsung, IBM, VMware vSphere):

 .. raw:: html

@@ -58,10 +59,20 @@ Current supported providers (AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidst

-More Information
--------------------------
+Ready to get started?
+----------------------

-Tutorials: `SkyPilot Tutorials `_
+:ref:`Install SkyPilot ` in ~1 minute. Then, launch your first dev cluster in ~5 minutes in :ref:`Quickstart `.
+
+Everything is launched within your cloud accounts, VPCs, and cluster(s).
+
+Contact the SkyPilot team
+---------------------------------
+
+You can chat with the SkyPilot team and community on the `SkyPilot Slack `_.
+
+Learn more
+--------------------------

 Runnable examples:

@@ -93,6 +104,10 @@ Runnable examples:

 * Framework examples: `PyTorch DDP `_, `DeepSpeed `_, `JAX/Flax on TPU `_, `Stable Diffusion `_, `Detectron2 `_, `Distributed `_ `TensorFlow `_, `NeMo `_, `programmatic grid search `_, `Docker `_, `Cog `_, `Unsloth `_, `Ollama `_, `llm.c `__ and `many more `_.

+Case Studies and Integrations: `Community Spotlights `_
+
+Tutorials: `SkyPilot Tutorials `_
+
 Follow updates:

 * `Twitter `_
@@ -106,10 +121,9 @@ Read the research:

 * `Sky Computing vision paper `_ (HotOS 2021)

-Contents
--------

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Getting Started

@@ -120,6 +134,7 @@ Contents

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Running Jobs

    ../running-jobs/distributed-jobs

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: SkyServe: Model Serving

    ../serving/service-yaml-spec

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Cutting Cloud Costs

    ../reference/benchmark/index

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Using Data

    ../reference/storage

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: User Guides

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Developer Guides

    Guide: Adding a New Cloud

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Cloud Admin and Usage

    ../cloud-setup/quota

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: References
diff --git a/docs/source/examples/interactive-development.rst b/docs/source/examples/interactive-development.rst
index f9c4b5c4803..cc50f8e6ea8 100644
--- a/docs/source/examples/interactive-development.rst
+++ b/docs/source/examples/interactive-development.rst
@@ -1,3 +1,5 @@
+.. _dev-cluster:
+
 Start a Development Cluster
 ===========================

@@ -90,7 +92,7 @@ SSH
 SkyPilot will automatically configure the SSH setting for a cluster, so that users can connect to the cluster with the cluster name:

 .. code-block:: bash
-
+
     ssh dev

@@ -99,7 +101,7 @@ SkyPilot will automatically configure the SSH setting for a cluster, so that use
 VSCode
 ~~~~~~

-A common use case for interactive development is to connect a local IDE to a remote cluster and directly edit code that lives on the cluster.
+A common use case for interactive development is to connect a local IDE to a remote cluster and directly edit code that lives on the cluster.
 This is supported by simply connecting VSCode to the cluster with the cluster name:

 #. Click on the top bar, type: :code:`> remote-ssh`, and select :code:`Remote-SSH: Connect Current Window to Host...`
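Taken together, the development-cluster flow described in these docs looks roughly like the following sketch; the cluster name ``dev`` and the GPU type are placeholders:

.. code-block:: bash

   # Create (or reuse) a development cluster named "dev".
   sky launch -c dev --gpus A100:1

   # SkyPilot adds an SSH config entry for the cluster, so plain SSH works:
   ssh dev

   # VSCode can reuse the same entry: run "Remote-SSH: Connect Current Window
   # to Host..." and pick the host "dev".

   # Stop the cluster when done; restart it later with `sky start dev`.
   sky stop dev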