Merge branch 'master' of github.com:skypilot-org/skypilot into support-capacity-block
Michaelvll committed Aug 22, 2024
2 parents f693555 + 1cd2444 commit a261c53
Showing 31 changed files with 538 additions and 334 deletions.
2 changes: 1 addition & 1 deletion docs/source/cloud-setup/cloud-permissions/aws.rst
@@ -148,7 +148,7 @@ AWS accounts can be attached with a policy that limits the permissions of the ac
:align: center
:alt: AWS Add Policy

8. **Optional**: If you would like to have your users access S3 buckets: You can additionally attach S3 access, such as the "AmazonS3FullAccess" policy.
8. **Optional**: If you would like to have your users access S3 buckets: You can additionally attach S3 access, such as the "AmazonS3FullAccess" policy. Note that enabling S3 access is required to use :ref:`managed-jobs` with `workdir` or `file_mounts` for now.

.. image:: ../../images/screenshots/aws/aws-s3-policy.png
:width: 80%
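
If you prefer to grant the S3 access programmatically rather than through the console, a minimal boto3 sketch (the IAM user name ``skypilot-v1`` below is an assumption; substitute the user you created) could look like:

.. code-block:: python

   # Hypothetical sketch: attach the AWS-managed AmazonS3FullAccess policy to
   # an existing IAM user with boto3 instead of clicking through the console.
   import boto3

   iam = boto3.client('iam')
   iam.attach_user_policy(
       UserName='skypilot-v1',  # assumed IAM user name; replace with yours
       PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
   )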
2 changes: 1 addition & 1 deletion docs/source/examples/interactive-development.rst
@@ -110,7 +110,7 @@ This is supported by simply connecting VSCode to the cluster with the cluster na

For more details, please refer to the `VSCode documentation <https://code.visualstudio.com/docs/remote/ssh-tutorial>`__.

.. image:: https://imgur.com/8mKfsET.gif
.. image:: https://i.imgur.com/8mKfsET.gif
:align: center
:alt: Connect to the cluster with VSCode

5 changes: 2 additions & 3 deletions docs/source/getting-started/installation.rst
@@ -301,13 +301,12 @@ RunPod
Fluidstack
~~~~~~~~~~~~~~~~~~

`Fluidstack <https://fluidstack.io/>`__ is a cloud provider offering low-cost GPUs. To configure Fluidstack access, go to the `Home <https://console.fluidstack.io/>`__ page on your Fluidstack console to generate an API key and then add the :code:`API key` to :code:`~/.fluidstack/api_key` and the :code:`API token` to :code:`~/.fluidstack/api_token`:

`Fluidstack <https://fluidstack.io/>`__ is a cloud provider offering low-cost GPUs. To configure Fluidstack access, go to the `Home <https://dashboard.fluidstack.io/>`__ page on your Fluidstack console to generate an API key and then add the :code:`API key` to :code:`~/.fluidstack/api_key`:
.. code-block:: shell
mkdir -p ~/.fluidstack
echo "your_api_key_here" > ~/.fluidstack/api_key
echo "your_api_token_here" > ~/.fluidstack/api_token
Cudo Compute
4 changes: 2 additions & 2 deletions llm/codellama/README.md
@@ -10,14 +10,14 @@ The followings are the demos of Code Llama 70B hosted by SkyPilot Serve (aka Sky
## Demos
<figure>
<center>
<img src="https://imgur.com/fguAmP0.gif" width="60%" title="Coding Assistant: Connect to hosted Code Llama with Tabby in VScode" />
<img src="https://i.imgur.com/fguAmP0.gif" width="60%" title="Coding Assistant: Connect to hosted Code Llama with Tabby in VScode" />

<figcaption>Coding Assistant: Connect to hosted Code Llama with Tabby in VScode</figcaption>
</figure>

<figure>
<center>
<img src="https://imgur.com/Dor1MoE.gif" width="60%" title="Chat: Connect to hosted Code Llama with FastChat" />
<img src="https://i.imgur.com/Dor1MoE.gif" width="60%" title="Chat: Connect to hosted Code Llama with FastChat" />

<figcaption>Chat: Connect to hosted Code Llama with FastChat</figcaption>
</figure>
2 changes: 1 addition & 1 deletion llm/falcon/README.md
@@ -50,7 +50,7 @@ sky launch -c falcon -s falcon.yaml --no-use-spot

For reference, below is a loss graph you may expect to see, and the amount of time and the approximate cost of fine-tuning each of the models over 500 epochs (assuming a spot A100 GPU rate of $1.1 / hour and an A100-80GB rate of $1.61 / hour):

<img width="524" alt="image" src="https://imgur.com/BDlHink.png">
<img width="524" alt="image" src="https://i.imgur.com/BDlHink.png">

1. `ybelkada/falcon-7b-sharded-bf16`: 2.5 to 3 hours using 1 A100 spot GPU; total cost ≈ $3.3.

8 changes: 4 additions & 4 deletions llm/gpt-2/README.md
@@ -28,22 +28,22 @@ Run the following command to start GPT-2 (124M) training on a GPU VM with 8 A100
sky launch -c gpt2 gpt2.yaml
```

![GPT-2 training with 8 A100 GPUs](https://imgur.com/v8SGpsF.png)
![GPT-2 training with 8 A100 GPUs](https://i.imgur.com/v8SGpsF.png)

Or, you can train the model with a single A100, by adding `--gpus A100`:
```bash
sky launch -c gpt2 gpt2.yaml --gpus A100
```

![GPT-2 training with a single A100](https://imgur.com/hN65g4r.png)
![GPT-2 training with a single A100](https://i.imgur.com/hN65g4r.png)


It is also possible to speed up the training of the model on 8 H100 (2.3x more tok/s than 8x A100s):
```bash
sky launch -c gpt2 gpt2.yaml --gpus H100:8
```

![GPT-2 training with 8 H100](https://imgur.com/STbi80b.png)
![GPT-2 training with 8 H100](https://i.imgur.com/STbi80b.png)

### Download logs and visualizations

@@ -54,7 +54,7 @@ scp -r gpt2:~/llm.c/log124M .
We can visualize the training progress with the notebook provided in [llm.c](https://github.com/karpathy/llm.c/blob/master/dev/vislog.ipynb). (Note: we cut off the training after 10K steps, which already achieves a validation loss similar to the OpenAI GPT-2 checkpoint.)

<div align="center">
<img src="https://imgur.com/lskPEAQ.png" width="60%">
<img src="https://i.imgur.com/lskPEAQ.png" width="60%">
</div>

> Yes! We are able to reproduce the training of GPT-2 (124M) on any cloud with SkyPilot.
2 changes: 1 addition & 1 deletion llm/llama-2/README.md
@@ -94,6 +94,6 @@ You can also host the official FAIR model without using huggingface and gradio.
```

3. Open http://localhost:7681 in your browser and start chatting!
<img src="https://imgur.com/Ay8sDhG.png" alt="LLaMA chatbot running on the cloud via SkyPilot"/>
<img src="https://i.imgur.com/Ay8sDhG.png" alt="LLaMA chatbot running on the cloud via SkyPilot"/>


4 changes: 2 additions & 2 deletions llm/llama-3/README.md
@@ -5,7 +5,7 @@


<p align="center">
<img src="https://imgur.com/1NEZs9f.png" alt="Llama-3 x SkyPilot" style="width: 50%;">
<img src="https://i.imgur.com/1NEZs9f.png" alt="Llama-3 x SkyPilot" style="width: 50%;">
</p>

[Llama-3](https://github.com/meta-llama/llama3) is the latest top open-source LLM from Meta. It has been released with a license that authorizes commercial use. You can deploy a private Llama-3 chatbot with SkyPilot in your own cloud with just one simple command.
@@ -248,7 +248,7 @@ To use the Gradio UI, open the URL shown in the logs:


<p align="center">
<img src="https://imgur.com/zPpY2Bg.gif" alt="Gradio UI serving Llama-3" style="width: 80%;">
<img src="https://i.imgur.com/zPpY2Bg.gif" alt="Gradio UI serving Llama-3" style="width: 80%;">
</p>

To stop the instance:
6 changes: 3 additions & 3 deletions llm/llama-3_1-finetuning/readme.md
@@ -135,7 +135,7 @@ sky launch -c llama31 lora.yaml \

<figure>
<center>
<img src="https://imgur.com/B7Ib4Ii.png" width="60%" />
<img src="https://i.imgur.com/B7Ib4Ii.png" width="60%" />

<figcaption>Training Loss of LoRA finetuning Llama 3.1</figcaption>
@@ -218,10 +218,10 @@ run: |
## Appendix: Preparation
1. Request access to [Llama 3.1 weights on huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (click on the blue box and follow the steps):
![](https://imgur.com/snIQhr9.png)
![](https://i.imgur.com/snIQhr9.png)
2. Get your [huggingface access token](https://huggingface.co/settings/tokens):
![](https://imgur.com/3idBgHn.png)
![](https://i.imgur.com/3idBgHn.png)
3. Add your huggingface token as an environment variable:
2 changes: 1 addition & 1 deletion llm/lorax/README.md
@@ -4,7 +4,7 @@
<!-- $UNCOMMENT# LoRAX: Multi-LoRA Inference Server -->

<p align="center">
<img src="https://imgur.com/OUapRYC.png" alt="LoRAX" style="width:200px;" />
<img src="https://i.imgur.com/OUapRYC.png" alt="LoRAX" style="width:200px;" />
</p>

[LoRAX](https://github.com/predibase/lorax) (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned LLMs on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency. It works by dynamically loading multiple fine-tuned "adapters" (LoRAs, etc.) on top of a single base model at runtime. Concurrent requests for different adapters can be processed together in a single batch, allowing LoRAX to maintain near linear throughput scaling as the number of adapters increases.
6 changes: 3 additions & 3 deletions llm/vicuna-llama-2/README.md
@@ -1,6 +1,6 @@
# Train Your Own Vicuna on Llama-2

![Vicuna-Llama-2](https://imgur.com/McZWg6z.gif "Result model in action, trained using this guide. From the SkyPilot and Vicuna teams.")
![Vicuna-Llama-2](https://i.imgur.com/McZWg6z.gif "Result model in action, trained using this guide. From the SkyPilot and Vicuna teams.")

Meta released [Llama 2](https://ai.meta.com/llama/) two weeks ago and has made a big wave in the AI community. In our opinion, its biggest impact is that the model is now released under a [permissive license](https://github.com/facebookresearch/llama/blob/main/LICENSE) that **allows the model weights to be used commercially**[^1]. This differs from Llama 1 which cannot be used commercially.

@@ -106,7 +106,7 @@ sky launch --no-use-spot ...


<p align="center">
<img src="https://imgur.com/yVIXfQo.gif" width="100%" alt="Optimizer"/>
<img src="https://i.imgur.com/yVIXfQo.gif" width="100%" alt="Optimizer"/>
</p>

**Optional**: Try out the training for the 13B model:
@@ -139,7 +139,7 @@ sky launch -c serve serve.yaml --env MODEL_CKPT=<your-model-checkpoint>/chatbot/
```
In [serve.yaml](https://github.com/skypilot-org/skypilot/tree/master/llm/vicuna-llama-2/serve.yaml), we specified launching a Gradio server that serves the model checkpoint at `<your-model-checkpoint>/chatbot/7b`.

![Vicuna-Llama-2](https://imgur.com/McZWg6z.gif "Serving the resulting model with Gradio.")
![Vicuna-Llama-2](https://i.imgur.com/McZWg6z.gif "Serving the resulting model with Gradio.")


> **Tip**: You can also switch to a cheaper accelerator, such as L4, to save costs, by adding `--gpus L4` to the above command.
2 changes: 1 addition & 1 deletion llm/vllm/README.md
@@ -4,7 +4,7 @@
<!-- $UNCOMMENT# vLLM: Easy, Fast, and Cheap LLM Inference -->

<p align="center">
<img src="https://imgur.com/yxtzPEu.png" alt="vLLM"/>
<img src="https://i.imgur.com/yxtzPEu.png" alt="vLLM"/>
</p>

This README contains instructions to run a demo for vLLM, an open-source library for fast LLM inference and serving, which improves the throughput compared to HuggingFace by **up to 24x**.
59 changes: 22 additions & 37 deletions sky/clouds/fluidstack.py
@@ -15,8 +15,7 @@

_CREDENTIAL_FILES = [
# credential files for FluidStack,
fluidstack_utils.FLUIDSTACK_API_KEY_PATH,
fluidstack_utils.FLUIDSTACK_API_TOKEN_PATH,
fluidstack_utils.FLUIDSTACK_API_KEY_PATH
]
if typing.TYPE_CHECKING:
# Renaming to avoid shadowing variables.
@@ -189,20 +188,12 @@ def make_deploy_resources_variables(
custom_resources = json.dumps(acc_dict, separators=(',', ':'))
else:
custom_resources = None
cuda_installation_commands = """
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb -O /usr/local/cuda-keyring_1.1-1_all.deb;
sudo dpkg -i /usr/local/cuda-keyring_1.1-1_all.deb;
sudo apt-get update;
sudo apt-get -y install cuda-toolkit-12-3;
sudo apt-get install -y cuda-drivers;
sudo apt-get install -y python3-pip;
nvidia-smi || sudo reboot;"""

return {
'instance_type': resources.instance_type,
'custom_resources': custom_resources,
'region': region.name,
'fluidstack_username': self.default_username(region.name),
'cuda_installation_commands': cuda_installation_commands,
'fluidstack_username': 'ubuntu',
}

def _get_feasible_launchable_resources(
@@ -270,17 +261,26 @@ def check_credentials(cls) -> Tuple[bool, Optional[str]]:
try:
assert os.path.exists(
os.path.expanduser(fluidstack_utils.FLUIDSTACK_API_KEY_PATH))
assert os.path.exists(
os.path.expanduser(fluidstack_utils.FLUIDSTACK_API_TOKEN_PATH))

with open(os.path.expanduser(
fluidstack_utils.FLUIDSTACK_API_KEY_PATH),
encoding='UTF-8') as f:
api_key = f.read().strip()
if not api_key.startswith('api_key'):
return False, ('Invalid FluidStack API key format. '
'To configure credentials, go to:\n '
' https://dashboard.fluidstack.io \n '
'to obtain an API key, '
'then save the contents '
'to ~/.fluidstack/api_key \n')
except AssertionError:
return False, (
'Failed to access FluidStack Cloud'
' with credentials. '
'To configure credentials, go to:\n '
' https://console.fluidstack.io \n '
'to obtain an API key and API Token, '
'then add save the contents '
'to ~/.fluidstack/api_key and ~/.fluidstack/api_token \n')
return False, ('Failed to access FluidStack Cloud'
' with credentials. '
'To configure credentials, go to:\n '
' https://dashboard.fluidstack.io \n '
'to obtain an API key, '
'then save the contents '
'to ~/.fluidstack/api_key \n')
except requests.exceptions.ConnectionError:
return False, ('Failed to verify FluidStack Cloud credentials. '
'Check your network connection '
@@ -303,21 +303,6 @@ def validate_region_zone(self, region: Optional[str], zone: Optional[str]):
zone,
clouds='fluidstack')

@classmethod
def default_username(cls, region: str) -> str:
return {
'norway_2_eu': 'ubuntu',
'calgary_1_canada': 'ubuntu',
'norway_3_eu': 'ubuntu',
'norway_4_eu': 'ubuntu',
'india_2': 'root',
'nevada_1_usa': 'fsuser',
'generic_1_canada': 'ubuntu',
'iceland_1_eu': 'ubuntu',
'new_york_1_usa': 'fsuser',
'illinois_1_usa': 'fsuser'
}.get(region, 'ubuntu')

@classmethod
def query_status(
cls,
8 changes: 6 additions & 2 deletions sky/clouds/service_catalog/data_fetchers/fetch_azure.py
@@ -140,8 +140,12 @@ def get_pricing_df(region: Optional[str] = None) -> 'pd.DataFrame':
print(f'Done fetching pricing {region}')
df = pd.DataFrame(all_items)
assert 'productName' in df.columns, (region, df.columns)
return df[(~df['productName'].str.contains(' Windows')) &
(df['unitPrice'] > 0)]
# Filter out the cloud services and windows products.
# Some H100 series use ' Win' instead of ' Windows', e.g.
# Virtual Machines NCCadsv5 Srs Win
return df[
(~df['productName'].str.contains(' Win| Cloud Services| CloudServices'))
& (df['unitPrice'] > 0)]
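
As a quick illustration of what the updated filter keeps and drops, here is a small standalone sketch (the product names are hypothetical, not real catalog rows):

```python
# Illustrates the updated filter: drop Windows ("... Win"/"... Windows") and
# Cloud Services SKUs, and keep only rows with a positive unit price.
import pandas as pd

df = pd.DataFrame({
    'productName': [
        'Virtual Machines NCCadsv5 Srs',         # kept
        'Virtual Machines NCCadsv5 Srs Win',     # dropped: contains ' Win'
        'Virtual Machines NC Series Windows',    # dropped: ' Windows' also matches ' Win'
        'Hypothetical A Series Cloud Services',  # dropped: contains ' Cloud Services'
        'Virtual Machines NC Series',            # dropped: unitPrice is 0
    ],
    'unitPrice': [27.2, 30.1, 1.2, 0.9, 0.0],
})

filtered = df[
    (~df['productName'].str.contains(' Win| Cloud Services| CloudServices'))
    & (df['unitPrice'] > 0)]
print(filtered['productName'].tolist())
# -> ['Virtual Machines NCCadsv5 Srs']
```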


def get_sku_df(region_set: Set[str]) -> 'pd.DataFrame':