Merge branch 'master' of github.com:skypilot-org/skypilot into support-capacity-block
Michaelvll committed Aug 22, 2024
2 parents f693555 + 1cd2444 commit a261c53
Showing 31 changed files with 538 additions and 334 deletions.
2 changes: 1 addition & 1 deletion docs/source/cloud-setup/cloud-permissions/aws.rst
@@ -148,7 +148,7 @@ AWS accounts can be attached with a policy that limits the permissions of the ac
:align: center
:alt: AWS Add Policy

8. **Optional**: If you would like to have your users access S3 buckets: You can additionally attach S3 access, such as the "AmazonS3FullAccess" policy.
8. **Optional**: If you would like to have your users access S3 buckets: You can additionally attach S3 access, such as the "AmazonS3FullAccess" policy. Note that enabling S3 access is required to use :ref:`managed-jobs` with `workdir` or `file_mounts` for now.

.. image:: ../../images/screenshots/aws/aws-s3-policy.png
:width: 80%
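
If you prefer to grant the S3 access programmatically rather than through the console, a minimal boto3 sketch (the IAM user name ``skypilot-v1`` below is an assumption; substitute the user you created) could look like:

.. code-block:: python

   # Hypothetical sketch: attach the AWS-managed AmazonS3FullAccess policy to
   # an existing IAM user with boto3 instead of clicking through the console.
   import boto3

   iam = boto3.client('iam')
   iam.attach_user_policy(
       UserName='skypilot-v1',  # assumed IAM user name; replace with yours
       PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
   )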
2 changes: 1 addition & 1 deletion docs/source/examples/interactive-development.rst
@@ -110,7 +110,7 @@ This is supported by simply connecting VSCode to the cluster with the cluster na

For more details, please refer to the `VSCode documentation <https://code.visualstudio.com/docs/remote/ssh-tutorial>`__.

.. image:: https://imgur.com/8mKfsET.gif
.. image:: https://i.imgur.com/8mKfsET.gif
:align: center
:alt: Connect to the cluster with VSCode

5 changes: 2 additions & 3 deletions docs/source/getting-started/installation.rst
@@ -301,13 +301,12 @@ RunPod
Fluidstack
~~~~~~~~~~~~~~~~~~

`Fluidstack <https://fluidstack.io/>`__ is a cloud provider offering low-cost GPUs. To configure Fluidstack access, go to the `Home <https://console.fluidstack.io/>`__ page on your Fluidstack console to generate an API key and then add the :code:`API key` to :code:`~/.fluidstack/api_key` and the :code:`API token` to :code:`~/.fluidstack/api_token`:

`Fluidstack <https://fluidstack.io/>`__ is a cloud provider offering low-cost GPUs. To configure Fluidstack access, go to the `Home <https://dashboard.fluidstack.io/>`__ page on your Fluidstack console to generate an API key and then add the :code:`API key` to :code:`~/.fluidstack/api_key`:
.. code-block:: shell
mkdir -p ~/.fluidstack
echo "your_api_key_here" > ~/.fluidstack/api_key
echo "your_api_token_here" > ~/.fluidstack/api_token
Cudo Compute
4 changes: 2 additions & 2 deletions llm/codellama/README.md
@@ -10,14 +10,14 @@ The followings are the demos of Code Llama 70B hosted by SkyPilot Serve (aka Sky
## Demos
<figure>
<center>
<img src="https://imgur.com/fguAmP0.gif" width="60%" title="Coding Assistant: Connect to hosted Code Llama with Tabby in VScode" />
<img src="https://i.imgur.com/fguAmP0.gif" width="60%" title="Coding Assistant: Connect to hosted Code Llama with Tabby in VScode" />

<figcaption>Coding Assistant: Connect to hosted Code Llama with Tabby in VScode</figcaption>
</figure>

<figure>
<center>
<img src="https://imgur.com/Dor1MoE.gif" width="60%" title="Chat: Connect to hosted Code Llama with FastChat" />
<img src="https://i.imgur.com/Dor1MoE.gif" width="60%" title="Chat: Connect to hosted Code Llama with FastChat" />

<figcaption>Chat: Connect to hosted Code Llama with FastChat</figcaption>
</figure>
2 changes: 1 addition & 1 deletion llm/falcon/README.md
@@ -50,7 +50,7 @@ sky launch -c falcon -s falcon.yaml --no-use-spot

For reference, below is a loss graph you may expect to see, and the amount of time and the approximate cost of fine-tuning each of the models over 500 epochs (assuming a spot A100 GPU rate of $1.1 / hour and an A100-80GB rate of $1.61 / hour):

<img width="524" alt="image" src="https://imgur.com/BDlHink.png">
<img width="524" alt="image" src="https://i.imgur.com/BDlHink.png">

1. `ybelkada/falcon-7b-sharded-bf16`: 2.5 to 3 hours using 1 A100 spot GPU; total cost ≈ $3.3.

8 changes: 4 additions & 4 deletions llm/gpt-2/README.md
@@ -28,22 +28,22 @@ Run the following command to start GPT-2 (124M) training on a GPU VM with 8 A100
sky launch -c gpt2 gpt2.yaml
```

![GPT-2 training with 8 A100 GPUs](https://imgur.com/v8SGpsF.png)
![GPT-2 training with 8 A100 GPUs](https://i.imgur.com/v8SGpsF.png)

Or, you can train the model with a single A100, by adding `--gpus A100`:
```bash
sky launch -c gpt2 gpt2.yaml --gpus A100
```

![GPT-2 training with a single A100](https://imgur.com/hN65g4r.png)
![GPT-2 training with a single A100](https://i.imgur.com/hN65g4r.png)


It is also possible to speed up the training of the model on 8 H100 (2.3x more tok/s than 8x A100s):
```bash
sky launch -c gpt2 gpt2.yaml --gpus H100:8
```

![GPT-2 training with 8 H100](https://imgur.com/STbi80b.png)
![GPT-2 training with 8 H100](https://i.imgur.com/STbi80b.png)

### Download logs and visualizations

@@ -54,7 +54,7 @@ scp -r gpt2:~/llm.c/log124M .
We can visualize the training progress with the notebook provided in [llm.c](https://github.com/karpathy/llm.c/blob/master/dev/vislog.ipynb). (Note: we cut off the training after 10K steps, which already achieves a validation loss similar to the OpenAI GPT-2 checkpoint.)

<div align="center">
<img src="https://imgur.com/lskPEAQ.png" width="60%">
<img src="https://i.imgur.com/lskPEAQ.png" width="60%">
</div>

> Yes! We are able to reproduce the training of GPT-2 (124M) on any cloud with SkyPilot.
2 changes: 1 addition & 1 deletion llm/llama-2/README.md
@@ -94,6 +94,6 @@ You can also host the official FAIR model without using huggingface and gradio.
```

3. Open http://localhost:7681 in your browser and start chatting!
<img src="https://imgur.com/Ay8sDhG.png" alt="LLaMA chatbot running on the cloud via SkyPilot"/>
<img src="https://i.imgur.com/Ay8sDhG.png" alt="LLaMA chatbot running on the cloud via SkyPilot"/>


4 changes: 2 additions & 2 deletions llm/llama-3/README.md
@@ -5,7 +5,7 @@


<p align="center">
<img src="https://imgur.com/1NEZs9f.png" alt="Llama-3 x SkyPilot" style="width: 50%;">
<img src="https://i.imgur.com/1NEZs9f.png" alt="Llama-3 x SkyPilot" style="width: 50%;">
</p>

[Llama-3](https://github.com/meta-llama/llama3) is the latest top open-source LLM from Meta. It has been released with a license that authorizes commercial use. You can deploy a private Llama-3 chatbot with SkyPilot in your own cloud with just one simple command.
@@ -248,7 +248,7 @@ To use the Gradio UI, open the URL shown in the logs:


<p align="center">
<img src="https://imgur.com/zPpY2Bg.gif" alt="Gradio UI serving Llama-3" style="width: 80%;">
<img src="https://i.imgur.com/zPpY2Bg.gif" alt="Gradio UI serving Llama-3" style="width: 80%;">
</p>

To stop the instance:
6 changes: 3 additions & 3 deletions llm/llama-3_1-finetuning/readme.md
@@ -135,7 +135,7 @@ sky launch -c llama31 lora.yaml \

<figure>
<center>
<img src="https://imgur.com/B7Ib4Ii.png" width="60%" />
<img src="https://i.imgur.com/B7Ib4Ii.png" width="60%" />

<figcaption>Training Loss of LoRA finetuning Llama 3.1</figcaption>
@@ -218,10 +218,10 @@ run: |
## Appendix: Preparation
1. Request access to [Llama 3.1 weights on huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (click on the blue box and follow the steps):
![](https://imgur.com/snIQhr9.png)
![](https://i.imgur.com/snIQhr9.png)
2. Get your [huggingface access token](https://huggingface.co/settings/tokens):
![](https://imgur.com/3idBgHn.png)
![](https://i.imgur.com/3idBgHn.png)
3. Add your huggingface token as an environment variable:
2 changes: 1 addition & 1 deletion llm/lorax/README.md
@@ -4,7 +4,7 @@
<!-- $UNCOMMENT# LoRAX: Multi-LoRA Inference Server -->

<p align="center">
<img src="https://imgur.com/OUapRYC.png" alt="LoRAX" style="width:200px;" />
<img src="https://i.imgur.com/OUapRYC.png" alt="LoRAX" style="width:200px;" />
</p>

[LoRAX](https://github.com/predibase/lorax) (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned LLMs on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency. It works by dynamically loading multiple fine-tuned "adapters" (LoRAs, etc.) on top of a single base model at runtime. Concurrent requests for different adapters can be processed together in a single batch, allowing LoRAX to maintain near linear throughput scaling as the number of adapters increases.
6 changes: 3 additions & 3 deletions llm/vicuna-llama-2/README.md
@@ -1,6 +1,6 @@
# Train Your Own Vicuna on Llama-2

![Vicuna-Llama-2](https://imgur.com/McZWg6z.gif "Result model in action, trained using this guide. From the SkyPilot and Vicuna teams.")
![Vicuna-Llama-2](https://i.imgur.com/McZWg6z.gif "Result model in action, trained using this guide. From the SkyPilot and Vicuna teams.")

Meta released [Llama 2](https://ai.meta.com/llama/) two weeks ago and has made a big wave in the AI community. In our opinion, its biggest impact is that the model is now released under a [permissive license](https://github.com/facebookresearch/llama/blob/main/LICENSE) that **allows the model weights to be used commercially**[^1]. This differs from Llama 1 which cannot be used commercially.

@@ -106,7 +106,7 @@ sky launch --no-use-spot ...


<p align="center">
<img src="https://imgur.com/yVIXfQo.gif" width="100%" alt="Optimizer"/>
<img src="https://i.imgur.com/yVIXfQo.gif" width="100%" alt="Optimizer"/>
</p>

**Optional**: Try out the training for the 13B model:
@@ -139,7 +139,7 @@ sky launch -c serve serve.yaml --env MODEL_CKPT=<your-model-checkpoint>/chatbot/
```
In [serve.yaml](https://github.com/skypilot-org/skypilot/tree/master/llm/vicuna-llama-2/serve.yaml), we specified launching a Gradio server that serves the model checkpoint at `<your-model-checkpoint>/chatbot/7b`.

![Vicuna-Llama-2](https://imgur.com/McZWg6z.gif "Serving the resulting model with Gradio.")
![Vicuna-Llama-2](https://i.imgur.com/McZWg6z.gif "Serving the resulting model with Gradio.")


> **Tip**: You can also switch to a cheaper accelerator, such as L4, to save costs, by adding `--gpus L4` to the above command.
2 changes: 1 addition & 1 deletion llm/vllm/README.md
@@ -4,7 +4,7 @@
<!-- $UNCOMMENT# vLLM: Easy, Fast, and Cheap LLM Inference -->

<p align="center">
<img src="https://imgur.com/yxtzPEu.png" alt="vLLM"/>
<img src="https://i.imgur.com/yxtzPEu.png" alt="vLLM"/>
</p>

This README contains instructions to run a demo for vLLM, an open-source library for fast LLM inference and serving, which improves the throughput compared to HuggingFace by **up to 24x**.
59 changes: 22 additions & 37 deletions sky/clouds/fluidstack.py
@@ -15,8 +15,7 @@

_CREDENTIAL_FILES = [
# credential files for FluidStack,
fluidstack_utils.FLUIDSTACK_API_KEY_PATH,
fluidstack_utils.FLUIDSTACK_API_TOKEN_PATH,
fluidstack_utils.FLUIDSTACK_API_KEY_PATH
]
if typing.TYPE_CHECKING:
# Renaming to avoid shadowing variables.
@@ -189,20 +188,12 @@ def make_deploy_resources_variables(
custom_resources = json.dumps(acc_dict, separators=(',', ':'))
else:
custom_resources = None
cuda_installation_commands = """
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb -O /usr/local/cuda-keyring_1.1-1_all.deb;
sudo dpkg -i /usr/local/cuda-keyring_1.1-1_all.deb;
sudo apt-get update;
sudo apt-get -y install cuda-toolkit-12-3;
sudo apt-get install -y cuda-drivers;
sudo apt-get install -y python3-pip;
nvidia-smi || sudo reboot;"""

return {
'instance_type': resources.instance_type,
'custom_resources': custom_resources,
'region': region.name,
'fluidstack_username': self.default_username(region.name),
'cuda_installation_commands': cuda_installation_commands,
'fluidstack_username': 'ubuntu',
}

def _get_feasible_launchable_resources(
@@ -270,17 +261,26 @@ def check_credentials(cls) -> Tuple[bool, Optional[str]]:
try:
assert os.path.exists(
os.path.expanduser(fluidstack_utils.FLUIDSTACK_API_KEY_PATH))
assert os.path.exists(
os.path.expanduser(fluidstack_utils.FLUIDSTACK_API_TOKEN_PATH))

with open(os.path.expanduser(
fluidstack_utils.FLUIDSTACK_API_KEY_PATH),
encoding='UTF-8') as f:
api_key = f.read().strip()
if not api_key.startswith('api_key'):
return False, ('Invalid FluidStack API key format. '
'To configure credentials, go to:\n '
' https://dashboard.fluidstack.io \n '
'to obtain an API key, '
'then save the contents '
'to ~/.fluidstack/api_key \n')
except AssertionError:
return False, (
'Failed to access FluidStack Cloud'
' with credentials. '
'To configure credentials, go to:\n '
' https://console.fluidstack.io \n '
'to obtain an API key and API Token, '
'then add save the contents '
'to ~/.fluidstack/api_key and ~/.fluidstack/api_token \n')
return False, ('Failed to access FluidStack Cloud'
' with credentials. '
'To configure credentials, go to:\n '
' https://dashboard.fluidstack.io \n '
'to obtain an API key, '
'then save the contents '
'to ~/.fluidstack/api_key \n')
except requests.exceptions.ConnectionError:
return False, ('Failed to verify FluidStack Cloud credentials. '
'Check your network connection '
@@ -303,21 +303,6 @@ def validate_region_zone(self, region: Optional[str], zone: Optional[str]):
zone,
clouds='fluidstack')

@classmethod
def default_username(cls, region: str) -> str:
return {
'norway_2_eu': 'ubuntu',
'calgary_1_canada': 'ubuntu',
'norway_3_eu': 'ubuntu',
'norway_4_eu': 'ubuntu',
'india_2': 'root',
'nevada_1_usa': 'fsuser',
'generic_1_canada': 'ubuntu',
'iceland_1_eu': 'ubuntu',
'new_york_1_usa': 'fsuser',
'illinois_1_usa': 'fsuser'
}.get(region, 'ubuntu')

@classmethod
def query_status(
cls,
8 changes: 6 additions & 2 deletions sky/clouds/service_catalog/data_fetchers/fetch_azure.py
@@ -140,8 +140,12 @@ def get_pricing_df(region: Optional[str] = None) -> 'pd.DataFrame':
print(f'Done fetching pricing {region}')
df = pd.DataFrame(all_items)
assert 'productName' in df.columns, (region, df.columns)
return df[(~df['productName'].str.contains(' Windows')) &
(df['unitPrice'] > 0)]
# Filter out the cloud services and windows products.
# Some H100 series use ' Win' instead of ' Windows', e.g.
# Virtual Machines NCCadsv5 Srs Win
return df[
(~df['productName'].str.contains(' Win| Cloud Services| CloudServices'))
& (df['unitPrice'] > 0)]
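
As a quick illustration of what the updated filter keeps and drops, here is a small standalone sketch (the product names are hypothetical, not real catalog rows):

```python
# Illustrates the updated filter: drop Windows ("... Win"/"... Windows") and
# Cloud Services SKUs, and keep only rows with a positive unit price.
import pandas as pd

df = pd.DataFrame({
    'productName': [
        'Virtual Machines NCCadsv5 Srs',         # kept
        'Virtual Machines NCCadsv5 Srs Win',     # dropped: contains ' Win'
        'Virtual Machines NC Series Windows',    # dropped: ' Windows' also matches ' Win'
        'Hypothetical A Series Cloud Services',  # dropped: contains ' Cloud Services'
        'Virtual Machines NC Series',            # dropped: unitPrice is 0
    ],
    'unitPrice': [27.2, 30.1, 1.2, 0.9, 0.0],
})

filtered = df[
    (~df['productName'].str.contains(' Win| Cloud Services| CloudServices'))
    & (df['unitPrice'] > 0)]
print(filtered['productName'].tolist())
# -> ['Virtual Machines NCCadsv5 Srs']
```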


def get_sku_df(region_set: Set[str]) -> 'pd.DataFrame':