[Docs] Add Llama 3 news. (#3486)
concretevitamin authored Apr 26, 2024
1 parent d0a1a86 commit 889adce
Showing 3 changed files with 44 additions and 41 deletions.
8 changes: 5 additions & 3 deletions README.md
@@ -27,22 +27,23 @@

----
:fire: *News* :fire:
- [Apr, 2024] Serve and finetune [**Llama 3**](https://skypilot.readthedocs.io/en/latest/gallery/llms/llama-3.html) on any cloud or Kubernetes: [**example**](./llm/llama-3/)
- [Apr, 2024] Using [**Ollama**](https://github.com/ollama/ollama) to deploy quantized LLMs on CPUs and GPUs: [**example**](./llm/ollama/)
- [Mar, 2024] Serve and deploy [**Databricks DBRX**](https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm) on your infra: [**example**](./llm/dbrx/)
- [Feb, 2024] Deploying and scaling [**Gemma**](https://blog.google/technology/developers/gemma-open-models/) with SkyServe: [**example**](./llm/gemma/)
- [Feb, 2024] Speed up your LLM deployments with [**SGLang**](https://github.com/sgl-project/sglang) for 5x throughput on SkyServe: [**example**](./llm/sglang/)
- [Feb, 2024] Serving [**Code Llama 70B**](https://ai.meta.com/blog/code-llama-large-language-model-coding/) with vLLM and SkyServe: [**example**](./llm/codellama/)
- [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
- [Dec, 2023] [**Mixtral 8x7B**](https://mistral.ai/news/mixtral-of-experts/), a high quality sparse mixture-of-experts model, was released by Mistral AI! Deploy via SkyPilot on any cloud: [**example**](./llm/mixtral/)
- [Nov, 2023] Using [**Axolotl**](https://github.com/OpenAccess-AI-Collective/axolotl) to finetune Mistral 7B on the cloud (on-demand and spot): [**example**](./llm/axolotl/)
- [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
- [Sep, 2023] Case study: [**Covariant**](https://covariant.ai/) transformed AI development on the cloud using SkyPilot, delivering models 4x faster cost-effectively: [**read the case study**](https://blog.skypilot.co/covariant/)
- [Aug, 2023] Cookbook: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
- [Aug, 2023] **Finetuning Cookbook**: Finetuning Llama 2 in your own cloud environment, privately: [**example**](./llm/vicuna-llama-2/), [**blog post**](https://blog.skypilot.co/finetuning-llama2-operational-guide/)
- [June, 2023] Serving LLM 24x Faster On the Cloud [**with vLLM**](https://vllm.ai/) and SkyPilot: [**example**](./llm/vllm/), [**blog post**](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/)

<details>
<summary>Archived</summary>

- [Dec, 2023] Using [**LoRAX**](https://github.com/predibase/lorax) to serve 1000s of finetuned LLMs on a single instance in the cloud: [**example**](./llm/lorax/)
- [Sep, 2023] [**Mistral 7B**](https://mistral.ai/news/announcing-mistral-7b/), a high-quality open LLM, was released! Deploy via SkyPilot on any cloud: [**Mistral docs**](https://docs.mistral.ai/self-deployment/skypilot)
- [July, 2023] Self-Hosted **Llama-2 Chatbot** on Any Cloud: [**example**](./llm/llama-2/)
- [April, 2023] [SkyPilot YAMLs](./llm/vicuna/) for finetuning & serving the [Vicuna LLM](https://lmsys.org/blog/2023-03-30-vicuna/) with a single command!

@@ -151,6 +152,7 @@ To learn more, see our [Documentation](https://skypilot.readthedocs.io/en/latest
<!-- Keep this section in sync with index.rst in SkyPilot Docs -->
Runnable examples:
- LLMs on SkyPilot
- [Llama 3](./llm/llama-3/)
- [Databricks DBRX](./llm/dbrx/)
- [Gemma](./llm/gemma/)
- [Mixtral 8x7B](./llm/mixtral/); [Mistral 7B](https://docs.mistral.ai/self-deployment/skypilot/) (from official Mistral team)
1 change: 1 addition & 0 deletions docs/source/docs/index.rst
@@ -69,6 +69,7 @@ Runnable examples:
* **LLMs on SkyPilot**

* `Llama 3 <https://github.com/skypilot-org/skypilot/tree/master/llm/llama-3>`_
* `Databricks DBRX <https://github.com/skypilot-org/skypilot/tree/master/llm/dbrx>`_
* `Gemma <https://github.com/skypilot-org/skypilot/tree/master/llm/gemma>`_
* `Mixtral 8x7B <https://github.com/skypilot-org/skypilot/tree/master/llm/mixtral>`_; `Mistral 7B <https://docs.mistral.ai/self-deployment/skypilot>`_ (from official Mistral team)
76 changes: 38 additions & 38 deletions llm/llama-3/README.md
@@ -113,7 +113,7 @@ run: |
</details>
You can also get the full YAML file [here](https://github.com/skypilot-org/skypilot/tree/master/llm/llama3/llama3.yaml).
You can also get the full YAML file [here](https://github.com/skypilot-org/skypilot/blob/master/llm/llama-3/llama3.yaml).
## Serving Llama-3: single instance
@@ -130,25 +130,25 @@ HF_TOKEN=xxx sky launch llama3.yaml -c llama3 --env HF_TOKEN
I 04-18 16:31:30 optimizer.py:693] == Optimizer ==
I 04-18 16:31:30 optimizer.py:704] Target: minimizing cost
I 04-18 16:31:30 optimizer.py:716] Estimated cost: $1.2 / hour
I 04-18 16:31:30 optimizer.py:716]
I 04-18 16:31:30 optimizer.py:839] Considered resources (1 node):
I 04-18 16:31:30 optimizer.py:909] -----------------------------------------------------------------------------------------------------------------
I 04-18 16:31:30 optimizer.py:909] CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
I 04-18 16:31:30 optimizer.py:909] -----------------------------------------------------------------------------------------------------------------
I 04-18 16:31:30 optimizer.py:909] Azure Standard_NC48ads_A100_v4[Spot] 48 440 A100-80GB:2 eastus 1.22 ✔
I 04-18 16:31:30 optimizer.py:909] AWS g6.48xlarge[Spot] 192 768 L4:8 us-east-1b 1.43
I 04-18 16:31:30 optimizer.py:909] Azure Standard_NC96ads_A100_v4[Spot] 96 880 A100-80GB:4 eastus 2.44
I 04-18 16:31:30 optimizer.py:909] AWS g5.48xlarge[Spot] 192 768 A10G:8 us-east-2b 2.45
I 04-18 16:31:30 optimizer.py:909] GCP g2-standard-96[Spot] 96 384 L4:8 asia-east1-a 2.49
I 04-18 16:31:30 optimizer.py:909] Azure Standard_ND96asr_v4[Spot] 96 900 A100:8 eastus 4.82
I 04-18 16:31:30 optimizer.py:909] GCP a2-highgpu-4g[Spot] 48 340 A100:4 europe-west4-a 4.82
I 04-18 16:31:30 optimizer.py:909] AWS p4d.24xlarge[Spot] 96 1152 A100:8 us-east-2b 4.90
I 04-18 16:31:30 optimizer.py:909] Azure Standard_ND96amsr_A100_v4[Spot] 96 1924 A100-80GB:8 southcentralus 5.17
I 04-18 16:31:30 optimizer.py:909] GCP a2-ultragpu-4g[Spot] 48 680 A100-80GB:4 us-east4-c 7.39
I 04-18 16:31:30 optimizer.py:909] GCP a2-highgpu-8g[Spot] 96 680 A100:8 europe-west4-a 9.65
I 04-18 16:31:30 optimizer.py:909] GCP a2-ultragpu-8g[Spot] 96 1360 A100-80GB:8 us-east4-c 14.79
I 04-18 16:31:30 optimizer.py:909] -----------------------------------------------------------------------------------------------------------------
I 04-18 16:31:30 optimizer.py:909]
...
```
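
The log above reports `Target: minimizing cost` and marks the lowest-priced row as chosen. As a rough illustration of that selection rule only (this is not SkyPilot's optimizer code; the `Candidate` class and `pick_cheapest` function are hypothetical names), the following sketch picks the cheapest entry from a few rows copied from the table:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One row of the optimizer table (hypothetical helper type)."""
    cloud: str
    instance: str
    accelerators: str
    hourly_cost: float  # USD per hour, from the COST ($) column


# A subset of the spot rows printed in the log above.
CANDIDATES = [
    Candidate("AWS", "g6.48xlarge[Spot]", "L4:8", 1.43),
    Candidate("Azure", "Standard_NC48ads_A100_v4[Spot]", "A100-80GB:2", 1.22),
    Candidate("GCP", "g2-standard-96[Spot]", "L4:8", 2.49),
    Candidate("AWS", "p4d.24xlarge[Spot]", "A100:8", 4.90),
]


def pick_cheapest(candidates):
    """Return the candidate with the lowest hourly cost."""
    if not candidates:
        raise ValueError("no candidate resources to choose from")
    return min(candidates, key=lambda c: c.hourly_cost)


chosen = pick_cheapest(CANDIDATES)
print(f"{chosen.cloud} {chosen.instance} at ${chosen.hourly_cost:.2f}/hour")
```

The result agrees with the row marked ✔ in the log and with the reported estimate of about $1.2 per hour. The real optimizer presumably weighs more than the listed price, so treat this as a reading aid for the table only.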

@@ -165,31 +165,31 @@ $ HF_TOKEN=xxx sky launch llama3.yaml -c llama3 --env HF_TOKEN --no-use-spot
I 04-18 16:34:13 optimizer.py:693] == Optimizer ==
I 04-18 16:34:13 optimizer.py:704] Target: minimizing cost
I 04-18 16:34:13 optimizer.py:716] Estimated cost: $5.0 / hour
I 04-18 16:34:13 optimizer.py:716]
I 04-18 16:34:13 optimizer.py:839] Considered resources (1 node):
I 04-18 16:34:13 optimizer.py:909] ------------------------------------------------------------------------------------------------------------------
I 04-18 16:34:13 optimizer.py:909] CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
I 04-18 16:34:13 optimizer.py:909] ------------------------------------------------------------------------------------------------------------------
I 04-18 16:34:13 optimizer.py:909] Kubernetes 32CPU--512GB--8A100 32 512 A100:8 kubernetes 0.00 ✔
I 04-18 16:34:13 optimizer.py:909] Fluidstack recE2ZDQmqR9HBKYs5xSnjtPw 64 240 A100-80GB:2 generic_1_canada 4.96
I 04-18 16:34:13 optimizer.py:909] Fluidstack recUiB2e6s3XDxwE9 60 440 A100:4 calgary_1_canada 5.88
I 04-18 16:34:13 optimizer.py:909] Azure Standard_NC48ads_A100_v4 48 440 A100-80GB:2 eastus 7.35
I 04-18 16:34:13 optimizer.py:909] GCP g2-standard-96 96 384 L4:8 us-east4-a 7.98
I 04-18 16:34:13 optimizer.py:909] Fluidstack recWGm4oJ9AB3XVPxzRaujgbx 126 480 A100-80GB:4 generic_1_canada 9.89
I 04-18 16:34:13 optimizer.py:909] Paperspace A100-80Gx4 46 320 A100-80GB:4 East Coast (NY2) 12.72
I 04-18 16:34:13 optimizer.py:909] AWS g6.48xlarge 192 768 L4:8 us-east-1 13.35
I 04-18 16:34:13 optimizer.py:909] GCP a2-highgpu-4g 48 340 A100:4 us-central1-a 14.69
I 04-18 16:34:13 optimizer.py:909] Azure Standard_NC96ads_A100_v4 96 880 A100-80GB:4 eastus 14.69
I 04-18 16:34:13 optimizer.py:909] AWS g5.48xlarge 192 768 A10G:8 us-east-1 16.29
I 04-18 16:34:13 optimizer.py:909] Fluidstack recUYj6oGJCvAvCXC7KQo5Fc7 252 960 A100-80GB:8 generic_1_canada 19.79
I 04-18 16:34:13 optimizer.py:909] GCP a2-ultragpu-4g 48 680 A100-80GB:4 us-central1-a 20.11
I 04-18 16:34:13 optimizer.py:909] Paperspace A100-80Gx8 96 640 A100-80GB:8 East Coast (NY2) 25.44
I 04-18 16:34:13 optimizer.py:909] Azure Standard_ND96asr_v4 96 900 A100:8 eastus 27.20
I 04-18 16:34:13 optimizer.py:909] GCP a2-highgpu-8g 96 680 A100:8 us-central1-a 29.39
I 04-18 16:34:13 optimizer.py:909] Azure Standard_ND96amsr_A100_v4 96 1924 A100-80GB:8 eastus 32.77
I 04-18 16:34:13 optimizer.py:909] AWS p4d.24xlarge 96 1152 A100:8 us-east-1 32.77
I 04-18 16:34:13 optimizer.py:909] GCP a2-ultragpu-8g 96 1360 A100-80GB:8 us-central1-a 40.22
I 04-18 16:34:13 optimizer.py:909] AWS p4de.24xlarge 96 1152 A100-80GB:8 us-east-1 40.97
I 04-18 16:34:13 optimizer.py:909] ------------------------------------------------------------------------------------------------------------------
...
```
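
The two logs list the same instance types with and without `[Spot]`, so the price gap can be estimated directly. The sketch below is a back-of-the-envelope calculation using prices copied from those tables; the `spot_savings` function is a hypothetical helper, and some rows come from different regions or zones, so the percentages are only indicative:

```python
# Hourly prices (USD) copied from the two optimizer tables above.
SPOT = {
    "Standard_NC48ads_A100_v4": 1.22,
    "g6.48xlarge": 1.43,
    "g5.48xlarge": 2.45,
}
ON_DEMAND = {
    "Standard_NC48ads_A100_v4": 7.35,
    "g6.48xlarge": 13.35,
    "g5.48xlarge": 16.29,
}


def spot_savings(spot_price, on_demand_price):
    """Fraction of the on-demand price saved by using spot."""
    if on_demand_price <= 0:
        raise ValueError("on-demand price must be positive")
    return 1.0 - spot_price / on_demand_price


for name, spot_price in SPOT.items():
    saving = spot_savings(spot_price, ON_DEMAND[name])
    print(f"{name}: {saving:.0%} cheaper on spot")
```

Spot instances can be preempted, so the lower price trades off against availability. The second command passes `--no-use-spot` to consider on-demand resources instead.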