Commit

Merge branch 'master' of github.com:skypilot-org/skypilot into new_provisioner_gcp
Michaelvll committed Dec 1, 2023
2 parents 497c438 + a1b0bd3 commit 061e924
Showing 61 changed files with 1,437 additions and 728 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -48,7 +48,7 @@ SkyPilot **maximizes GPU availability for your jobs**:

SkyPilot **cuts your cloud costs**:
* [Managed Spot](https://skypilot.readthedocs.io/en/latest/examples/spot-jobs.html): 3-6x cost savings using spot VMs, with auto-recovery from preemptions
* Optimizer: 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
* [Optimizer](https://skypilot.readthedocs.io/en/latest/examples/auto-failover.html): 2x cost savings by auto-picking the cheapest VM/zone/region/cloud
* [Autostop](https://skypilot.readthedocs.io/en/latest/reference/auto-stop.html): hands-free cleanup of idle clusters

SkyPilot supports your existing GPU, TPU, and CPU workloads, with no code changes.
@@ -112,7 +112,7 @@ Prepare the workdir by cloning:
git clone https://github.com/pytorch/examples.git ~/torch_examples
```

Launch with `sky launch` (note: [access to GPU instances](https://skypilot.readthedocs.io/en/latest/reference/quota.html) is needed for this example):
Launch with `sky launch` (note: [access to GPU instances](https://skypilot.readthedocs.io/en/latest/cloud-setup/quota.html) is needed for this example):
```bash
sky launch my_task.yaml
```
@@ -147,7 +147,7 @@ Runnable examples:
- [LocalGPT](./llm/localgpt)
- [Falcon](./llm/falcon)
- Add yours here & see more in [`llm/`](./llm)!
- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), and [many more (`examples/`)](./examples).
- Framework examples: [PyTorch DDP](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_torch.yaml), [DeepSpeed](./examples/deepspeed-multinode/sky.yaml), [JAX/Flax on TPU](https://github.com/skypilot-org/skypilot/blob/master/examples/tpu/tpuvm_mnist.yaml), [Stable Diffusion](https://github.com/skypilot-org/skypilot/tree/master/examples/stable_diffusion), [Detectron2](https://github.com/skypilot-org/skypilot/blob/master/examples/detectron2_docker.yaml), [Distributed](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_distributed_tf_app.py) [TensorFlow](https://github.com/skypilot-org/skypilot/blob/master/examples/resnet_app_storage.yaml), [Ray Train](examples/distributed_ray_train/ray_train.yaml), [NeMo](https://github.com/skypilot-org/skypilot/blob/master/examples/nemo/nemo.yaml), [programmatic grid search](https://github.com/skypilot-org/skypilot/blob/master/examples/huggingface_glue_imdb_grid_search_app.py), [Docker](https://github.com/skypilot-org/skypilot/blob/master/examples/docker/echo_app.yaml), and [many more (`examples/`)](./examples).

Follow updates:
- [Twitter](https://twitter.com/skypilot_org)
1 change: 1 addition & 0 deletions docs/requirements-docs.txt
@@ -1,6 +1,7 @@
sphinx==4.3.2
sphinx-click==5.0.1
sphinx-copybutton==0.5.0
sphinx-design==0.4.1
pydata-sphinx-theme==0.7.2
sphinx-autodoc-typehints==1.17.0
sphinx-book-theme==0.2.0
29 changes: 21 additions & 8 deletions docs/source/cloud-setup/cloud-auth.rst
@@ -55,15 +55,28 @@ GCP
GCP Service Account
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`GCP Service Account <https://cloud.google.com/iam/docs/service-account-overview>`__ is supported.
`GCP Service Accounts
<https://cloud.google.com/iam/docs/service-account-overview>`__ are supported.

To use it to access GCP with SkyPilot, you need to set up the credentials:
.. tip::

    Using a service account on your local machine avoids the periodic
    ``google.auth.exceptions.RefreshError: Reauthentication is needed. Please
    run `gcloud auth application-default login` to reauthenticate.`` error,
    because a service account is long-lived and has no expiry time.

1. Download the key for the service account from the `GCP console <https://console.cloud.google.com/iam-admin/serviceaccounts>`__.
2. Set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the path of the key file, and configure the gcloud CLI tool:
Set up a service account as follows:

.. code-block:: console
1. Follow the :ref:`instructions <gcp-service-account-creation>` to create a service account with the appropriate roles/permissions.
2. In the "Service Accounts" tab in the `IAM & Admin console
<https://console.cloud.google.com/iam-admin/iam>`__, click on the service
account to go to its detailed page. Click on the **KEYS** tab, then click on
**ADD KEY** to add a JSON key. The key will be downloaded automatically.
3. Set the environment variable ``GOOGLE_APPLICATION_CREDENTIALS`` to the path of the key file, and configure the gcloud CLI tool:

.. code-block:: console
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
$ gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
$ gcloud config set project your-project-id
$ export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
$ gcloud auth activate-service-account --key-file=$GOOGLE_APPLICATION_CREDENTIALS
$ gcloud config set project your-project-id
You may want to add the export statement to your shell profile (e.g. ``~/.bashrc``, ``~/.zshrc``) so that it is set automatically in all new terminal sessions.
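
Before running the ``gcloud`` commands above, it can help to confirm that ``GOOGLE_APPLICATION_CREDENTIALS`` points at a well-formed key. The following is a minimal sketch, not part of SkyPilot: the helper name ``check_service_account_key`` is hypothetical, and the field list is an assumption based on the usual layout of a downloaded JSON service account key.

```python
import json
import os

# Fields normally present in a downloaded service account JSON key (assumption).
REQUIRED_FIELDS = ("type", "project_id", "client_email", "private_key")


def check_service_account_key(path):
    """Return a list of problems found in a service account key file."""
    if not os.path.isfile(path):
        return [f"key file not found: {path}"]
    try:
        with open(path, encoding="utf-8") as f:
            key = json.load(f)
    except json.JSONDecodeError as exc:
        return [f"key file is not valid JSON: {exc}"]
    if not isinstance(key, dict):
        return ["key file does not contain a JSON object"]
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in key]
    # A user credential file has a different "type" than a service account key.
    if key.get("type") not in (None, "service_account"):
        problems.append(f"unexpected key type: {key['type']!r}")
    return problems


if __name__ == "__main__":
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        print("GOOGLE_APPLICATION_CREDENTIALS is not set")
    else:
        for line in check_service_account_key(key_path) or ["key file looks well-formed"]:
            print(line)
```

The check only inspects the file locally; it does not contact GCP, so a key that passes may still be revoked or lack the required roles.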
63 changes: 61 additions & 2 deletions docs/source/cloud-setup/cloud-permissions/gcp.rst
@@ -170,7 +170,7 @@ Service Account

If you already have a service account under the "Service Accounts" tab whose email starts with ``skypilot-v1@``, it was likely created by SkyPilot automatically, and you can skip this section.

1. Click the "Service Accounts" tab in the "IAM & Admin" console, and click on the **CREATE SERVICE ACCOUNT**.
1. Click the "Service Accounts" tab in the `IAM & Admin console <https://console.cloud.google.com/iam-admin/iam>`__, and click on **CREATE SERVICE ACCOUNT**.

.. image:: ../../images/screenshots/gcp/create-service-account.png
:width: 80%
@@ -184,7 +184,9 @@ Service Account
:align: center
:alt: Set Service Account Name

3. Select the ``minimal-skypilot-role`` (or the name you set) created in the last section and click on **DONE**.
3. Select the ``minimal-skypilot-role`` (or the name you set) created in the
last section and click on **DONE**. You can also choose to use the Default or
Medium Permissions roles as described in the previous sections.

.. image:: ../../images/screenshots/gcp/service-account-grant-role.png
:width: 60%
@@ -265,3 +267,60 @@ See details in :ref:`config-yaml`. Example use cases include using a private VP
VPC with fine-grained constraints, typically created via Terraform or manually.

The custom VPC should contain the :ref:`required firewall rules <gcp-minimum-firewall-rules>`.


.. _gcp-use-internal-ips:


Using Internal IPs
-----------------------
For security reasons, users may want to use only internal IPs for SkyPilot instances.
To do so, use SkyPilot's global config file ``~/.sky/config.yaml`` to specify the ``gcp.use_internal_ips`` and ``gcp.ssh_proxy_command`` fields (see :ref:`config-yaml` for the detailed syntax):

.. code-block:: yaml

    gcp:
      use_internal_ips: true
      # VPC with NAT setup, see below
      vpc_name: my-vpc-name
      ssh_proxy_command: ssh -W %h:%p -o StrictHostKeyChecking=no [email protected]

The ``gcp.ssh_proxy_command`` field is optional. If SkyPilot is run on a machine that can directly access the internal IPs of the instances, it can be omitted. Otherwise, it should be set to a command that can be used to proxy SSH connections to the internal IPs of the instances.


Cloud NAT Setup
~~~~~~~~~~~~~~~~

Instances created with only internal IPs on GCP cannot access the public internet by default. To make sure SkyPilot can install dependencies correctly on the instances,
Cloud NAT needs to be set up for the VPC (see `GCP's documentation <https://cloud.google.com/nat/docs/overview>`__ for details).


Cloud NAT is a regional resource, so it must be created in each region where SkyPilot will be used.


.. image:: ../../images/screenshots/gcp/cloud-nat.png
    :width: 80%
    :align: center
    :alt: GCP Cloud NAT
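
Because Cloud NAT is regional, the setup has to be repeated once per region. The sketch below only builds the ``gcloud`` command lines (it does not run them), assuming one Cloud Router and one NAT gateway per region. The helper name ``cloud_nat_commands`` and the ``<vpc>-router-<region>`` / ``<vpc>-nat-<region>`` naming scheme are hypothetical; check the flags against ``gcloud compute routers nats create --help`` for your gcloud version before using them.

```python
import shlex


def cloud_nat_commands(vpc_name, regions):
    """Build gcloud argument lists: one Cloud Router and one NAT gateway per region."""
    commands = []
    for region in regions:
        # Hypothetical naming scheme, chosen only for illustration.
        router = f"{vpc_name}-router-{region}"
        nat = f"{vpc_name}-nat-{region}"
        commands.append([
            "gcloud", "compute", "routers", "create", router,
            f"--network={vpc_name}", f"--region={region}",
        ])
        commands.append([
            "gcloud", "compute", "routers", "nats", "create", nat,
            f"--router={router}", f"--region={region}",
            "--auto-allocate-nat-external-ips",
            "--nat-all-subnet-ip-ranges",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands for review instead of executing them.
    for cmd in cloud_nat_commands("my-vpc-name", ["us-west1", "us-east1"]):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere and lets you review the resource names first.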

To restrict SkyPilot to specific regions, set ``gcp.ssh_proxy_command`` to a dict that maps each allowed region to the SSH proxy command for that region (see :ref:`config-yaml` for details):

.. code-block:: yaml

    gcp:
      use_internal_ips: true
      vpc_name: my-vpc-name
      ssh_proxy_command:
        us-west1: ssh -W %h:%p -o StrictHostKeyChecking=no [email protected]
        us-east1: ssh -W %h:%p -o StrictHostKeyChecking=no [email protected]

If a proxy is not needed but the regions still need to be limited, set ``gcp.ssh_proxy_command`` to a dict that maps each region to ``null``:

.. code-block:: yaml

    gcp:
      use_internal_ips: true
      vpc_name: my-vpc-name
      ssh_proxy_command:
        us-west1: null
        us-east1: null

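
The three forms of ``gcp.ssh_proxy_command`` described in this section (omitted, a single string, or a per-region dict) can be summarized with a small sketch. This is an illustration of the documented semantics, not SkyPilot's implementation: ``resolve_ssh_proxy`` is a hypothetical helper, the config is a plain dict mirroring the YAML, and ``user@proxy.example.com`` is a placeholder host.

```python
def resolve_ssh_proxy(gcp_config, region):
    """Return (region_allowed, proxy_command) for one region.

    gcp_config mirrors the ``gcp`` section of ~/.sky/config.yaml.
    """
    proxy = gcp_config.get("ssh_proxy_command")
    if proxy is None or isinstance(proxy, str):
        # Omitted, or a single command shared by all regions: no region limit.
        return True, proxy
    if isinstance(proxy, dict):
        if region not in proxy:
            # A dict acts as an allow-list of regions.
            return False, None
        # The value may be None (YAML null): region allowed, no proxy used.
        return True, proxy[region]
    raise TypeError("ssh_proxy_command must be a string, a dict, or omitted")


if __name__ == "__main__":
    config = {
        "use_internal_ips": True,
        "vpc_name": "my-vpc-name",
        "ssh_proxy_command": {
            "us-west1": "ssh -W %h:%p -o StrictHostKeyChecking=no user@proxy.example.com",
            "us-east1": None,
        },
    }
    for region in ("us-west1", "us-east1", "europe-west1"):
        print(region, resolve_ssh_proxy(config, region))
```

Returning the "allowed" flag separately from the command keeps the ``null`` case (region allowed, no proxy) distinct from a region that is absent from the dict.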
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -34,6 +34,7 @@
'sphinx_autodoc_typehints',
'sphinx_click',
'sphinx_copybutton',
'sphinx_design',
]

intersphinx_mapping = {