From e40b22139cff9275ab5f93decba6e41b52dab525 Mon Sep 17 00:00:00 2001
From: Zongheng Yang
Date: Wed, 31 Jul 2024 13:28:17 -0700
Subject: [PATCH] Docs revamp: polishing landing docs. (#3797)

* Docs revamp: polishing landing docs.

* Add a note

* Fix order
---
 README.md                                    | 38 ++++++-----
 docs/source/docs/index.rst                   | 68 ++++++++++++-------
 .../examples/interactive-development.rst     |  6 +-
 3 files changed, 70 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index a6f1df49c91..42902017d75 100644
--- a/README.md
+++ b/README.md
@@ -20,9 +20,8 @@

-

-  Run LLMs and AI on Any Cloud
+  Run AI on Any Infra — Unified, Faster, Cheaper

 ----

@@ -38,7 +37,6 @@
 - [Nov, 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
 - [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
 - [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
-- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
 Archived

 - [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
 - [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
 - [July, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
+- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)
 - [April, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!
 ----

-SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.
+SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability.

-SkyPilot **abstracts away cloud infra burdens**:
-- Launch jobs & clusters on any cloud
-- Easy scale-out: queue and run many jobs, automatically managed
-- Easy access to object stores (S3, GCS, Azure, R2, IBM)
+SkyPilot **abstracts away infra burdens**:
+- Launch [dev clusters](https://skypilot.readthedocs.io/en/latest/examples/interactive-development.html), [jobs](https://skypilot.readthedocs.io/en/latest/examples/managed-jobs.html), and [serving](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html) on any infra
+- Easy job management: queue, run, and auto-recover many jobs

-SkyPilot **maximizes GPU availability for your jobs**:
-* Provision in all zones/regions/clouds you have access to ([the _Sky_](https://arxiv.org/abs/2205.07147)), with automatic failover
+SkyPilot **supports multiple clusters, clouds, and hardware** ([the Sky](https://arxiv.org/abs/2205.07147)):
+- Bring your reserved GPUs, Kubernetes clusters, or 12+ clouds
+- [Flexible provisioning](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html) of GPUs, TPUs, CPUs, with auto-retry

-SkyPilot **cuts your cloud costs**:
-* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html): 3-6x cost savings using spot VMs, with auto-recovery from preemptions
-* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
-* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): hands-free cleanup of idle clusters
+SkyPilot **cuts your cloud costs & maximizes GPU availability**:
+* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): automatic cleanup of idle resources
+* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/managed-jobs.html): 3-6x cost savings using spot instances, with preemption auto-recovery
+* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest & most available infra

 SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.
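In CLI terms, the bullets above boil down to a few `sky` commands. A minimal sketch (the cluster name `dev`, the GPU type, and `task.yaml` are placeholders; exact flags and the `sky jobs` subcommand name may differ across SkyPilot versions):

```bash
# Provision a GPU cluster on whichever supported infra is cheapest/available.
sky launch -c dev --gpus A100:1

# Autostop the cluster after 10 idle minutes, so idle resources are cleaned up.
sky autostop -i 10 dev

# Run a managed job (spot-capable, with auto-recovery) defined in a task YAML.
sky jobs launch task.yaml
```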
 Install with pip:

 ```bash
-pip install -U "skypilot[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]" # choose your clouds
+# Choose your clouds:
+pip install -U "skypilot[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp]"
 ```
 To get the latest features and fixes, use the nightly build or [install from source](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html):

 ```bash
-pip install "skypilot-nightly[aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp,kubernetes]" # choose your clouds
+# Choose your clouds:
+pip install "skypilot-nightly[kubernetes,aws,gcp,azure,oci,lambda,runpod,fluidstack,paperspace,cudo,ibm,scp]"
 ```

-Current supported providers (AWS, Azure, GCP, OCI, Lambda Cloud, RunPod, Fluidstack, Paperspace, Cudo, IBM, Samsung, Cloudflare, any Kubernetes cluster):
+[Current supported infra](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html) (Kubernetes; AWS, GCP, Azure, OCI, Lambda Cloud, Fluidstack, RunPod, Cudo, Paperspace, Cloudflare, Samsung, IBM, VMware vSphere):
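After installing, `sky check` reports which of the infra listed above SkyPilot can actually use with the credentials on your machine, and `sky show-gpus` lists the accelerators it can provision (output format varies by version):

```bash
# Verify which clouds / Kubernetes contexts are enabled and have valid credentials.
sky check

# Browse GPUs/TPUs that can be requested, e.g. via --gpus.
sky show-gpus
```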

@@ -155,6 +156,7 @@ To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest
 Runnable examples:

 - LLMs on SkyPilot
+  - [Llama 3.1 finetuning](./llm/llama-3_1-finetuning/) and [serving](./llm/llama-3_1/)
   - [GPT-2 via `llm.c`](./llm/gpt-2/)
   - [Llama 3](./llm/llama-3/)
   - [Qwen](./llm/qwen/)
@@ -177,6 +179,8 @@ Runnable examples:
   - Add yours here & see more in [`llm/`](./llm)!
 - Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), [Cog](https://github.com/skypilot-org/skypilot/blob/master/examples/cog/), [Unsloth](https://github.com/skypilot-org/skypilot/blob/master/examples/unsloth/unsloth.yaml), [Ollama](https://github.com/skypilot-org/skypilot/blob/master/llm/ollama), [llm.c](https://github.com/skypilot-org/skypilot/tree/master/llm/gpt-2) and [many more (`examples/`)](./examples).

+Case Studies and Integrations: [Community Spotlights](https://blog.skypilot.co/community/)
+
 Follow updates:
 - [Twitter](https://twitter.com/skypilot_org)
 - [Slack](http://slack.skypilot.co)
diff --git a/docs/source/docs/index.rst b/docs/source/docs/index.rst
index 5b8d144af70..1ecdcfe1e98 100644
--- a/docs/source/docs/index.rst
+++ b/docs/source/docs/index.rst
@@ -2,52 +2,53 @@ Welcome to SkyPilot!
 ====================

 .. image:: /_static/SkyPilot_wide_dark.svg
-   :width: 65%
+   :width: 50%
    :align: center
    :alt: SkyPilot
    :class: no-scaled-link, only-dark

 .. image:: /_static/SkyPilot_wide_light.svg
-   :width: 60%
+   :width: 50%
    :align: center
    :alt: SkyPilot
    :class: no-scaled-link, only-light

+
 .. raw:: html

+

+

+ Run AI on Any Infra — Unified, Faster, Cheaper +

    Star Watch Fork
-

-

-   Run LLMs and AI on Any Cloud
-

-SkyPilot is a framework for running LLMs, AI, and batch jobs on any cloud, offering maximum cost savings, highest GPU availability, and managed execution.
+SkyPilot is a framework for running AI and batch workloads on any infra, offering unified execution, high cost savings, and high GPU availability.

-SkyPilot **abstracts away cloud infra burdens**:
-- Launch jobs & clusters on any cloud
-- Easy scale-out: queue and run many jobs, automatically managed
-- Easy access to object stores (S3, GCS, Azure, R2, IBM)
+SkyPilot **abstracts away infra burdens**:
+- Launch :ref:`dev clusters `, :ref:`jobs `, and :ref:`serving ` on any infra
+- Easy job management: queue, run, and auto-recover many jobs

-SkyPilot **maximizes GPU availability for your jobs**:
+SkyPilot **supports multiple clusters, clouds, and hardware** (`the Sky `_):

-* Provision in all zones/regions/clouds you have access to (`the Sky `_), with automatic failover
+- Bring your reserved GPUs, Kubernetes clusters, or 12+ clouds
+- :ref:`Flexible provisioning ` of GPUs, TPUs, CPUs, with auto-retry

-SkyPilot **cuts your cloud costs**:
+SkyPilot **cuts your cloud costs & maximizes GPU availability**:

-* `Managed Spot `_: 3-6x cost savings using spot VMs, with auto-recovery from preemptions
-* Optimizer: 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
-* `Autostop `_: hands-free cleanup of idle clusters
+* :ref:`Autostop `: automatic cleanup of idle resources
+* :ref:`Managed Spot `: 3-6x cost savings using spot instances, with preemption auto-recovery
+* :ref:`Optimizer `: 2x cost savings by auto-picking the cheapest & most available infra

 SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.

-Current supported providers (AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidstack, Cudo, IBM, Samsung, Cloudflare, VMware vSphere, any Kubernetes cluster):
+:ref:`Current supported infra ` (Kubernetes; AWS, GCP, Azure, OCI, Lambda Cloud, Fluidstack, RunPod, Cudo, Paperspace, Cloudflare, Samsung, IBM, VMware vSphere):

 .. raw:: html

@@ -58,10 +59,20 @@ Current supported providers (AWS, GCP, Azure, OCI, Lambda Cloud, RunPod, Fluidst

-More Information
--------------------------
+Ready to get started?
+----------------------

-Tutorials: `SkyPilot Tutorials `_
+:ref:`Install SkyPilot ` in ~1 minute. Then, launch your first dev cluster in ~5 minutes in :ref:`Quickstart `.
+
+Everything is launched within your cloud accounts, VPCs, and cluster(s).
+
+Contact the SkyPilot team
+---------------------------------
+
+You can chat with the SkyPilot team and community on the `SkyPilot Slack `_.
+
+Learn more
+--------------------------

 Runnable examples:

@@ -93,6 +104,10 @@ Runnable examples:

 * Framework examples: `PyTorch DDP `_, `DeepSpeed `_, `JAX/Flax on TPU `_, `Stable Diffusion `_, `Detectron2 `_, `Distributed `_ `TensorFlow `_, `NeMo `_, `programmatic grid search `_, `Docker `_, `Cog `_, `Unsloth `_, `Ollama `_, `llm.c `__ and `many more `_.

+Case Studies and Integrations: `Community Spotlights `_
+
+Tutorials: `SkyPilot Tutorials `_
+
 Follow updates:

 * `Twitter `_
@@ -106,10 +121,9 @@ Read the research:

 * `Sky Computing vision paper `_ (HotOS 2021)

-Contents
--------

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Getting Started

@@ -120,6 +134,7 @@ Contents

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Running Jobs

    ../running-jobs/distributed-jobs

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: SkyServe: Model Serving

    ../serving/service-yaml-spec

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Cutting Cloud Costs

    ../reference/benchmark/index

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Using Data

    ../reference/storage

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: User Guides

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Developer Guides

    Guide: Adding a New Cloud

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: Cloud Admin and Usage

    ../cloud-setup/quota

 .. toctree::
+   :hidden:
    :maxdepth: 1
    :caption: References
diff --git a/docs/source/examples/interactive-development.rst b/docs/source/examples/interactive-development.rst
index f9c4b5c4803..cc50f8e6ea8 100644
--- a/docs/source/examples/interactive-development.rst
+++ b/docs/source/examples/interactive-development.rst
@@ -1,3 +1,5 @@
+.. _dev-cluster:
+
 Start a Development Cluster
 ===========================

@@ -90,7 +92,7 @@ SSH
 SkyPilot will automatically configure the SSH setting for a cluster, so that users can connect to the cluster with the cluster name:

 .. code-block:: bash
-
+
     ssh dev

@@ -99,7 +101,7 @@ SkyPilot will automatically configure the SSH setting for a cluster, so that use
 VSCode
 ~~~~~~

-A common use case for interactive development is to connect a local IDE to a remote cluster and directly edit code that lives on the cluster.
+A common use case for interactive development is to connect a local IDE to a remote cluster and directly edit code that lives on the cluster.
 This is supported by simply connecting VSCode to the cluster with the cluster name:

 #. Click on the top bar, type: :code:`> remote-ssh`, and select :code:`Remote-SSH: Connect Current Window to Host...`
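Taken together, the development-cluster flow described in these docs looks roughly like the following sketch; the cluster name ``dev`` and the GPU type are placeholders:

.. code-block:: bash

   # Create (or reuse) a development cluster named "dev".
   sky launch -c dev --gpus A100:1

   # SkyPilot adds an SSH config entry for the cluster, so plain SSH works:
   ssh dev

   # VSCode can reuse the same entry: run "Remote-SSH: Connect Current Window
   # to Host..." and pick the host "dev".

   # Stop the cluster when done; restart it later with `sky start dev`.
   sky stop dev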