Docs: various improvements. (#2827)
concretevitamin authored Nov 30, 2023
1 parent a7f185b commit 6612301
Showing 5 changed files with 77 additions and 16 deletions.
19 changes: 10 additions & 9 deletions docs/source/examples/auto-failover.rst
@@ -101,11 +101,11 @@ A10, L4, and A10g GPUs, using :code:`sky launch task.yaml`.
$ sky launch task.yaml
...
I 11-19 08:07:45 optimizer.py:910] -----------------------------------------------------------------------------------------------------
I 11-19 08:07:45 optimizer.py:910] CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
I 11-19 08:07:45 optimizer.py:910] -----------------------------------------------------------------------------------------------------
I 11-19 08:07:45 optimizer.py:910] Azure Standard_NV6ads_A10_v5 6 55 A10:1 eastus 0.45 ✔
I 11-19 08:07:45 optimizer.py:910] GCP g2-standard-4 4 16 L4:1 us-east4-a 0.70
I 11-19 08:07:45 optimizer.py:910] AWS g5.xlarge 4 16 A10G:1 us-east-1 1.01
I 11-19 08:07:45 optimizer.py:910] -----------------------------------------------------------------------------------------------------
@@ -119,6 +119,7 @@ To specify a preference order, use a list of candidate GPUs in the task yaml:
In the above example, SkyPilot will first try to provision an A10 GPU, then an A10g GPU, and finally an L4 GPU.
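
As a rough sketch of what such a preference list can look like (the actual YAML is collapsed in this diff; the ordered-list form of ``accelerators`` below is an assumption about the exact syntax, not a quote from the doc):

.. code-block:: yaml

   resources:
     # Candidates in order of preference: try A10 first, then A10g, then L4.
     accelerators: [A10:1, A10g:1, L4:1]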

.. _multiple-resources:

(**Advanced**) Multiple Candidate Resources
--------------------------------------------
@@ -149,7 +150,7 @@ If a task would like to specify multiple candidate resources (not only GPUs), th
- cloud: azure
region: eastus
accelerator: A100
.. tip::

The list items are specified with a leading prefix :code:`-`, and each item is a dictionary that
@@ -175,10 +176,10 @@ This will generate the following output:
$ sky launch -c mycluster task.yaml
...
I 11-20 14:06:24 optimizer.py:910] ----------------------------------------------------------------------------------------------
I 11-20 14:06:24 optimizer.py:910] CLOUD INSTANCE vCPUs Mem(GB) ACCELERATORS REGION/ZONE COST ($) CHOSEN
I 11-20 14:06:24 optimizer.py:910] ----------------------------------------------------------------------------------------------
I 11-20 14:06:24 optimizer.py:910] GCP a2-highgpu-8g 96 680 A100:8 us-central1-a 29.39 ✔
I 11-20 14:06:24 optimizer.py:910] AWS p4d.24xlarge 96 1152 A100:8 us-east-2 32.77
I 11-20 14:06:24 optimizer.py:910] ----------------------------------------------------------------------------------------------
I 11-20 14:06:24 optimizer.py:910]
Launching a new cluster 'mycluster'. Proceed? [Y/n]:
23 changes: 22 additions & 1 deletion docs/source/examples/docker-containers.rst
@@ -6,7 +6,7 @@ Using Docker Containers
SkyPilot can run a container either as a task, or as the runtime environment of a cluster.

* If the container image is invocable / has an entrypoint: run it :ref:`as a task <docker-containers-as-tasks>`.
-* Otherwise, the container image is likely to be used as a runtime environment (e.g., ``ubuntu``) and you likely have extra commands to run inside the container: run it :ref:`as a runtime environment <docker-containers-as-runtime-environments>`.
+* If the container image is to be used as a runtime environment (e.g., ``ubuntu``, ``nvcr.io/nvidia/pytorch:23.10-py3``, etc.) and you have extra commands to run inside the container: run it :ref:`as a runtime environment <docker-containers-as-runtime-environments>`.

.. _docker-containers-as-tasks:

@@ -99,6 +99,7 @@ When a container is used as the runtime environment, everything happens inside t
- Any files created by the task will be stored inside the container.

To use a Docker image as your runtime environment, set the :code:`image_id` field in the :code:`resources` section of your task YAML file to :code:`docker:<image_id>`.

For example, to use the :code:`ubuntu:20.04` image from Docker Hub:

.. code-block:: yaml
@@ -112,6 +113,26 @@ For example, to use the :code:`ubuntu:20.04` image from Docker Hub:
   run: |
     # Commands to run inside the container
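
Most of that example is collapsed in this diff view. A minimal self-contained sketch of the ``ubuntu:20.04`` case (the ``apt-get``/``echo`` commands are placeholders, not quotes from the original doc):

.. code-block:: yaml

   resources:
     image_id: docker:ubuntu:20.04

   setup: |
     # Commands to run inside the container, e.g., install dependencies.
     apt-get update

   run: |
     # Commands to run inside the container.
     echo "Hello from inside ubuntu:20.04"
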
As another example, here's how to use `NVIDIA's PyTorch NGC Container <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`_:

.. code-block:: yaml

   resources:
     image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
     accelerators: T4

   setup: |
     # Commands to run inside the container

   run: |
     # Commands to run inside the container

     # Since SkyPilot tasks are run inside a fresh conda "(base)" environment,
     # deactivate first to access what the Docker image has already installed.
     source deactivate
     nvidia-smi
     python -c 'import torch; print(torch.__version__)'

Any GPUs assigned to the task will be automatically mapped to your Docker container, and all subsequent tasks within the cluster will also run inside the container. In a multi-node scenario, the container will be launched on all nodes, and the corresponding node's container will be assigned for task execution.
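
For instance, a two-node variant of the NGC example above might look like this (a sketch; ``num_nodes`` is SkyPilot's standard field for multi-node tasks):

.. code-block:: yaml

   resources:
     image_id: docker:nvcr.io/nvidia/pytorch:23.10-py3
     accelerators: T4

   num_nodes: 2

   run: |
     # Runs inside the container on every node.
     nvidia-smi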

.. tip::
39 changes: 37 additions & 2 deletions docs/source/reference/faq.rst
@@ -81,7 +81,7 @@ How can I launch a VS Code tunnel using a SkyPilot task definition?
To launch a VS Code tunnel using a SkyPilot task definition, you can use the following task definition:

.. code-block:: yaml

   setup: |
     sudo snap install --classic code
     # if `snap` is not available, you can try the following commands instead:
@@ -93,6 +93,41 @@ To launch a VS Code tunnel using a SkyPilot task definition, you can use the fol
Note that you'll be prompted to authenticate with your GitHub account to launch a VS Code tunnel.
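
The remainder of the task definition is collapsed above. A hypothetical end-to-end sketch (the ``code tunnel`` run command is an assumption about what the full example contains, not a quote from it):

.. code-block:: yaml

   setup: |
     sudo snap install --classic code

   run: |
     # Start a VS Code tunnel; follow the printed GitHub authentication prompt.
     code tunnel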

How to launch VMs in a subset of regions only (e.g., Europe only)?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When defining a task, you can use the ``resources.any_of`` field to specify a set of regions you want to launch VMs in.

For example, to launch VMs in Europe only (which can help with GDPR compliance), you can use the following task definition:

.. code-block:: yaml

   resources:
     # SkyPilot will perform cost optimization among the specified regions.
     any_of:
       # AWS:
       - region: eu-central-1
       - region: eu-west-1
       - region: eu-west-2
       - region: eu-west-3
       - region: eu-north-1
       # GCP:
       - region: europe-central2
       - region: europe-north1
       - region: europe-southwest1
       - region: europe-west1
       - region: europe-west10
       - region: europe-west12
       - region: europe-west2
       - region: europe-west3
       - region: europe-west4
       - region: europe-west6
       - region: europe-west8
       - region: europe-west9
       # Or put in other clouds' Europe regions.

See more details about the ``resources.any_of`` field :ref:`here <multiple-resources>`.

(Advanced) How to make SkyPilot use all global regions?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -128,7 +163,7 @@ By default, SkyPilot supports most global regions on AWS and only supports the U
# Fetch U.S. regions for Azure, excluding the specified regions
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --exclude centralus eastus
To make your managed spot jobs potentially use all global regions, please log into the spot controller with ``ssh sky-spot-controller-<hash>``
(the full name can be found in ``sky status``), and run the commands above.
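
Putting those steps together (a sketch; keep the ``<hash>`` placeholder exactly as reported by ``sky status``):

.. code-block:: bash

   # Find the spot controller's full name, then log into it.
   sky status
   ssh sky-spot-controller-<hash>
   # On the controller, run the catalog-fetching commands shown above.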


8 changes: 6 additions & 2 deletions docs/source/reference/job-queue.rst
@@ -22,13 +22,17 @@ for execution on an existing cluster:
The :code:`-d / --detach` flag detaches logging from the terminal, which is useful for launching many long-running jobs concurrently.
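
For example (a sketch; this assumes ``sky exec`` accepts the ``-d`` flag described above and a task file named ``job.yaml``):

.. code-block:: bash

   # Submit a long-running job to an existing cluster without tailing its logs;
   # repeat for as many jobs as needed.
   sky exec mycluster job.yaml -d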

-To view the output for each job:
+To show a cluster's jobs and their statuses:

.. code-block:: bash

-   # Show a cluster's jobs (IDs, statuses).
+   # Show a cluster's jobs (job IDs, statuses).
   sky queue mycluster

To show the output for each job:

.. code-block:: bash

   # Stream the outputs of a job.
   sky logs mycluster JOB_ID
4 changes: 2 additions & 2 deletions sky/optimizer.py
Expand Up @@ -1220,8 +1220,8 @@ def _fill_in_launchable_resources(
num_node_str = f'{task.num_nodes}x '
if not quiet:
logger.info(
-f'No resource satisfying {num_node_str}{resources} '
-f'on {clouds_str}.')
+f'No resource satisfying {num_node_str}'
+f'{resources.repr_with_region_zone} on {clouds_str}.')
if len(all_fuzzy_candidates) > 0:
logger.info('Did you mean: '
f'{colorama.Fore.CYAN}'
