Merge branch 'master' into continuous-sync-alpha-1
landscapepainter committed Sep 17, 2023
2 parents 7cbf729 + 3cd931f commit cdd6954
Showing 53 changed files with 2,328 additions and 550 deletions.
1 change: 1 addition & 0 deletions .github/workflows/pytest.yml
@@ -27,6 +27,7 @@ jobs:
- tests/test_storage.py
- tests/test_wheels.py
- tests/test_spot.py
- tests/test_yaml_parser.py
runs-on: ubuntu-latest
steps:
- name: Checkout repository
2 changes: 2 additions & 0 deletions .gitignore
@@ -10,3 +10,5 @@ build/
sky_logs/
sky/clouds/service_catalog/data_fetchers/*.csv
.vscode/
.idea/

3 changes: 3 additions & 0 deletions Dockerfile_k8s
@@ -45,6 +45,9 @@ RUN cd /skypilot/ && \
sudo mv -v sky/setup_files/* . && \
pip install ".[aws]"

# Set PYTHONUNBUFFERED=1 to have Python print to stdout/stderr immediately
ENV PYTHONUNBUFFERED=1

# Set WORKDIR and initialize conda for sky user
WORKDIR /home/sky
RUN conda init
3 changes: 3 additions & 0 deletions Dockerfile_k8s_gpu
@@ -53,6 +53,9 @@ RUN cd /skypilot/ && \
sudo mv -v sky/setup_files/* . && \
pip install ".[aws]"

# Set PYTHONUNBUFFERED=1 to have Python print to stdout/stderr immediately
ENV PYTHONUNBUFFERED=1

# Set WORKDIR and initialize conda for sky user
WORKDIR /home/sky
RUN conda init
61 changes: 55 additions & 6 deletions docs/source/examples/docker-containers.rst
@@ -1,11 +1,60 @@
.. _docker-containers:

Running Docker Containers
=========================
Using Docker Containers
=======================

SkyPilot can run docker containers as tasks. Docker runtime is configured and ready for use on the default VM image used by SkyPilot.
SkyPilot can run a Docker container either as the runtime environment for your task, or as the task itself.

To run a container, you can directly invoke the :code:`docker run` command in the :code:`run` section of your task.
Using Docker Containers as Runtime Environment
----------------------------------------------

When a container is used as the runtime environment, the SkyPilot task is executed inside the container.

This means all :code:`setup` and :code:`run` commands in the YAML file will be executed in the container, and any files created by the task will be stored inside the container.
Any GPUs assigned to the task will be automatically mapped to your Docker container and all future tasks on the cluster will also execute in the container.

To use a Docker image as your runtime environment, set the :code:`image_id` field in the :code:`resources` section of your task YAML file to :code:`docker:<image_id>`.
For example, to use the :code:`ubuntu:20.04` image from Docker Hub:

.. code-block:: yaml

   resources:
     image_id: docker:ubuntu:20.04

   setup: |
     # Will run inside container

   run: |
     # Will run inside container

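
Note that the value contains two colons: ``docker:`` marks it as a Docker image, and the remainder (``ubuntu:20.04``) is an ordinary ``<image>:<tag>`` reference. The Python sketch below shows how such a value can be split on the first colon only, so the tag survives intact. ``parse_image_id`` and ``split_tag`` are hypothetical helpers written for illustration, not SkyPilot's own parsing code.

```python
from typing import Optional, Tuple

# Assumption for this sketch: only the literal "docker:" prefix marks a container image.
DOCKER_PREFIX = "docker:"


def parse_image_id(image_id: str) -> Tuple[Optional[str], str]:
    """Split an image_id value into (kind, reference).

    Returns ("docker", "<image>:<tag>") for "docker:..." values and
    (None, image_id) for anything else, e.g. a cloud VM image ID.
    """
    if image_id.startswith(DOCKER_PREFIX):
        # Strip only the leading "docker:" so the image's own ":<tag>" survives.
        return "docker", image_id[len(DOCKER_PREFIX):]
    return None, image_id


def split_tag(reference: str) -> Tuple[str, str]:
    """Split "<image>:<tag>" into (image, tag); Docker's default tag is "latest".

    Digest references ("image@sha256:...") are not handled in this sketch.
    """
    name, sep, tag = reference.rpartition(":")
    # A colon followed by "/" belongs to a registry host:port, not to a tag.
    if not sep or "/" in tag:
        return reference, "latest"
    return name, tag


kind, ref = parse_image_id("docker:ubuntu:20.04")
print(kind, ref)        # docker ubuntu:20.04
print(split_tag(ref))   # ('ubuntu', '20.04')
```

Splitting from the right in ``split_tag`` is a deliberate choice: registry hosts may carry a port (``localhost:5000/img``), so only a final colon that is not followed by a path is treated as the tag separator.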
For Docker images hosted on private registries, you can provide the registry authentication details using the :code:`envs` section:

.. code-block:: yaml

   # private_docker.yaml
   resources:
     image_id: docker:<your-private-image>:<tag>

   envs:
     SKYPILOT_DOCKER_USERNAME: <username>
     SKYPILOT_DOCKER_PASSWORD: <password>
     SKYPILOT_DOCKER_SERVER: <server>

Alternatively, you can set these environment variables directly through the CLI:

.. code-block:: bash

   $ sky launch \
       --env SKYPILOT_DOCKER_PASSWORD=$(aws ecr get-login-password --region us-east-1) \
       --env SKYPILOT_DOCKER_USERNAME=AWS \
       private_docker.yaml

Running Docker Containers as Tasks
----------------------------------

As an alternative, SkyPilot can run Docker containers as tasks. The Docker runtime is configured and ready for use on the default VM image used by SkyPilot.

To run a container as a task, you can directly invoke the :code:`docker run` command in the :code:`run` section of your task.

For example, to run a GPU-accelerated container that prints the output of :code:`nvidia-smi`:

@@ -18,9 +67,9 @@ For example, to run a GPU-accelerated container that prints the output of :code:
docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
Building containers remotely
----------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If required, the container image can also be built remotely on the cluster in the :code:`setup` phase of the task.
If you are running the container as a task, the container image can also be built remotely on the cluster in the :code:`setup` phase of the task.

The :code:`echo_app` `example <https://github.com/skypilot-org/skypilot/tree/master/examples/docker>`_ shows how to do this:

13 changes: 13 additions & 0 deletions docs/source/getting-started/installation.rst
@@ -16,6 +16,7 @@ Install SkyPilot using pip:
$ pip install skypilot
$ # pip install "skypilot[gcp]"
$ # pip install "skypilot[azure]"
$ # pip install "skypilot[kubernetes]"
$ # pip install "skypilot[lambda]"
$ # pip install "skypilot[ibm]"
$ # pip install "skypilot[scp]"
@@ -240,6 +241,18 @@ To configure SCP access, you need access keys and the ID of the project your tas

Multi-node clusters are currently not supported on SCP.

Kubernetes
~~~~~~~~~~

SkyPilot can also run tasks on on-prem or cloud-hosted Kubernetes clusters (e.g., EKS, GKE). The only requirement is a valid kubeconfig at :code:`~/.kube/config`.

.. code-block:: console

   $ # Place your kubeconfig at ~/.kube/config
   $ mkdir -p ~/.kube
   $ cp /path/to/kubeconfig ~/.kube/config

See :ref:`SkyPilot on Kubernetes <kubernetes-overview>` for more.
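
Before running ``sky check``, you can confirm that the file is where this page expects it. The Python sketch below only checks that ``~/.kube/config`` exists and is non-empty; it does not validate the credentials inside. The helper names are hypothetical and not part of SkyPilot.

```python
from pathlib import Path
from typing import Optional


def default_kubeconfig(home: Optional[Path] = None) -> Path:
    """Location this page asks for: ~/.kube/config."""
    return (home if home is not None else Path.home()) / ".kube" / "config"


def kubeconfig_present(home: Optional[Path] = None) -> bool:
    """True if ~/.kube/config is a regular, non-empty file.

    Only presence is checked; credentials are not validated
    (`kubectl get pods` or `sky check` does that).
    """
    path = default_kubeconfig(home)
    return path.is_file() and path.stat().st_size > 0
```

Call ``kubeconfig_present()`` with no arguments to check your own home directory; the ``home`` parameter exists only to make the check easy to test.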

.. _verify-cloud-access:

2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -129,7 +129,7 @@ Documentation
reference/interactive-nodes
reference/faq
reference/logging
reference/local/index
reference/kubernetes/index

.. toctree::
:maxdepth: 1
14 changes: 12 additions & 2 deletions docs/source/reference/faq.rst
@@ -103,12 +103,22 @@ By default, SkyPilot supports most global regions on AWS and only supports the U
version=$(python -c 'import sky; print(sky.clouds.service_catalog.constants.CATALOG_SCHEMA_VERSION)')
mkdir -p ~/.sky/catalogs/${version}
cd ~/.sky/catalogs/${version}
# Fetch all regions for GCP
# GCP
pip install lxml
# Fetch U.S. regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp
# Fetch all regions for GCP
python -m sky.clouds.service_catalog.data_fetchers.fetch_gcp --all-regions
# Azure
# Fetch U.S. regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure
# Fetch all regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --all-regions
# Fetch the specified regions for Azure
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --regions japaneast australiaeast uksouth
# Fetch U.S. regions for Azure, excluding the specified regions
python -m sky.clouds.service_catalog.data_fetchers.fetch_azure --exclude centralus eastus
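
If you run these fetchers from a script, the command lines differ only in the cloud name and the region flag. The hypothetical Python helper below assembles them using only the ``--all-regions``, ``--regions`` and ``--exclude`` flags shown above. It applies at most one option, mirroring the examples; note that ``--regions`` and ``--exclude`` are only shown for Azure here, so passing them to another fetcher is an untested assumption.

```python
import sys
from typing import List, Optional, Sequence

# Module path pattern taken from the commands above.
FETCHER_MODULE = "sky.clouds.service_catalog.data_fetchers.fetch_{cloud}"


def fetcher_command(cloud: str,
                    all_regions: bool = False,
                    regions: Optional[Sequence[str]] = None,
                    exclude: Optional[Sequence[str]] = None) -> List[str]:
    """Build the argv for one catalog fetcher (hypothetical helper).

    With no options the command matches the "U.S. regions" examples above.
    Only one option is applied; combining the flags is not covered here.
    """
    # sys.executable runs the fetcher with the same interpreter that has `sky` installed.
    cmd = [sys.executable, "-m", FETCHER_MODULE.format(cloud=cloud)]
    if all_regions:
        cmd.append("--all-regions")
    elif regions:
        cmd += ["--regions", *regions]
    elif exclude:
        cmd += ["--exclude", *exclude]
    return cmd


print(" ".join(fetcher_command("azure", regions=["japaneast", "uksouth"])[1:]))
# -m sky.clouds.service_catalog.data_fetchers.fetch_azure --regions japaneast uksouth
```

To execute a command, pass it to ``subprocess.run(cmd, check=True, cwd=...)`` with ``cwd`` set to the ``~/.sky/catalogs/${version}`` directory created above.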
To allow your managed spot jobs to use any global region, log in to the spot controller with ``ssh sky-spot-controller-<hash>``
(the full name can be found in ``sky status``) and run the commands above.
161 changes: 161 additions & 0 deletions docs/source/reference/kubernetes/index.rst
@@ -0,0 +1,161 @@
.. _kubernetes-overview:

Running on Kubernetes (Alpha)
=============================

.. note::
   Kubernetes support is in alpha preview and under active development.
   There may be rough edges and features may change without notice.
   Please report any `bugs <https://github.com/skypilot-org/skypilot/issues>`_ and
   `reach out to us <http://slack.skypilot.co>`_ for feature requests.

SkyPilot tasks can be run on your private on-prem or cloud Kubernetes clusters.
The Kubernetes cluster gets added to the list of "clouds" in SkyPilot and SkyPilot
tasks can be submitted to your Kubernetes cluster just like any other cloud provider.

**Benefits of using SkyPilot to run jobs on your Kubernetes cluster:**

* Get SkyPilot features (setup management, job execution, queuing, logging, SSH access) on your Kubernetes resources
* Replace complex Kubernetes manifests with simple SkyPilot tasks
* Seamlessly "burst" jobs to the cloud if your Kubernetes cluster is congested
* Retain observability and control over your cluster with your existing Kubernetes tools

**Supported Kubernetes deployments:**

* Hosted Kubernetes services (EKS, GKE)
* On-prem clusters (Kubeadm, K3s, Rancher)
* Local development clusters (KinD, minikube)


Kubernetes Cluster Requirements
-------------------------------

To connect and use a Kubernetes cluster, SkyPilot needs:

* An existing Kubernetes cluster running Kubernetes v1.20 or later.
* A `Kubeconfig <https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/>`_ file containing access credentials and namespace to be used.
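
To check the version requirement, compare the server's ``gitVersion`` (reported under ``serverVersion`` by ``kubectl version -o json``) against 1.20. The hypothetical Python helper below assumes strings such as ``v1.27.3`` or provider-suffixed ones like ``v1.27.3-eks-a5565ad``, and ignores everything after the minor version.

```python
import re
from typing import Tuple

MIN_KUBERNETES = (1, 20)  # "Kubernetes v1.20 or later" from the requirements above


def parse_git_version(git_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as "v1.27.3" or "v1.27.3-eks-a5565ad"."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version.strip())
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(git_version: str, minimum: Tuple[int, int] = MIN_KUBERNETES) -> bool:
    # Compare numerically as tuples: a string comparison would rank "1.9" above "1.20".
    return parse_git_version(git_version) >= minimum


print(meets_minimum("v1.27.3-eks-a5565ad"))  # True
print(meets_minimum("v1.19.16"))             # False
```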

In a typical workflow:

1. A cluster administrator sets up a Kubernetes cluster. Detailed admin guides for
   different deployment environments (Amazon EKS, Google GKE, On-Prem and local debugging) are included in the :ref:`Kubernetes cluster setup guide <kubernetes-setup>`.

2. Users who want to run SkyPilot tasks on this cluster are issued Kubeconfig
   files containing their credentials (`kube-context <https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/#define-clusters-users-and-contexts>`_).
   SkyPilot reads this Kubeconfig file to communicate with the cluster.

Submitting SkyPilot tasks to Kubernetes Clusters
------------------------------------------------
.. _kubernetes-instructions:

Once your cluster administrator has :ref:`set up a Kubernetes cluster <kubernetes-setup>` and provided you with a kubeconfig file:

0. Make sure `kubectl <https://kubernetes.io/docs/tasks/tools/>`_, ``socat`` and ``lsof`` are installed on your local machine.

   .. code-block:: console

      $ # macOS (lsof is already installed)
      $ brew install kubectl socat
      $ # Linux (may have socat and lsof already installed)
      $ sudo apt-get install kubectl socat lsof

1. Place your kubeconfig file at ``~/.kube/config``.

   .. code-block:: console

      $ mkdir -p ~/.kube
      $ cp /path/to/kubeconfig ~/.kube/config

   You can verify that your credentials are set up correctly by running :code:`kubectl get pods`.

2. Run :code:`sky check` and verify that Kubernetes is enabled in SkyPilot.

   .. code-block:: console

      $ sky check
      Checking credentials to enable clouds for SkyPilot.
      ...
      Kubernetes: enabled
      ...

   .. note::
      :code:`sky check` also checks whether GPU support is available on your cluster. If it is not available,
      :code:`sky check` shows the reason.
      To set up GPU support on the cluster, refer to the :ref:`Kubernetes cluster setup guide <kubernetes-setup>`.

3. You can now run any SkyPilot task on your Kubernetes cluster.

   .. code-block:: console

      $ sky launch --cpus 2+ task.yaml
      == Optimizer ==
      Target: minimizing cost
      Estimated cost: $0.0 / hour
      Considered resources (1 node):
      ---------------------------------------------------------------------------------------------------
      CLOUD       INSTANCE         vCPUs  Mem(GB)  ACCELERATORS  REGION/ZONE  COST ($)  CHOSEN
      ---------------------------------------------------------------------------------------------------
      Kubernetes  2CPU--2GB        2      2        -             kubernetes   0.00      ✔
      AWS         m6i.large        2      8        -             us-east-1    0.10
      Azure       Standard_D2s_v5  2      8        -             eastus       0.10
      GCP         n2-standard-2    2      8        -             us-central1  0.10
      IBM         bx2-8x32         8      32       -             us-east      0.38
      Lambda      gpu_1x_a10       30     200      A10:1         us-east-1    0.60
      ---------------------------------------------------------------------------------------------------

   .. note::
      SkyPilot will use the cluster and namespace set in the ``current-context`` in the
      kubeconfig file. To manage your ``current-context``:

      .. code-block:: console

         $ # See current context
         $ kubectl config current-context
         $ # Switch current-context
         $ kubectl config use-context mycontext
         $ # Set a specific namespace to be used in the current-context
         $ kubectl config set-context --current --namespace=mynamespace

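
The ``current-context`` rule described in the note above can be sketched in Python. This operates on an already-parsed kubeconfig (a plain dict, e.g. produced by a YAML loader). ``resolve_context`` is a hypothetical helper, not SkyPilot's implementation, and it assumes kubectl's convention that a context without a ``namespace`` uses ``default``. The cluster and user names in the sample config are illustrative.

```python
from typing import Any, Dict, Tuple


def resolve_context(kubeconfig: Dict[str, Any]) -> Tuple[str, str, str]:
    """Return (context name, cluster, namespace) for the kubeconfig's current-context."""
    name = kubeconfig.get("current-context")
    if not name:
        raise ValueError("kubeconfig has no current-context set")
    for entry in kubeconfig.get("contexts") or []:
        if entry.get("name") == name:
            ctx = entry.get("context") or {}
            # Contexts without an explicit namespace fall back to "default".
            return name, ctx["cluster"], ctx.get("namespace", "default")
    raise ValueError(f"current-context {name!r} not found in contexts")


config = {
    "current-context": "mycontext",
    "contexts": [
        {"name": "mycontext",
         "context": {"cluster": "my-eks", "user": "alice", "namespace": "mynamespace"}},
    ],
}
print(resolve_context(config))  # ('mycontext', 'my-eks', 'mynamespace')
```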
FAQs
----

* **Are autoscaling Kubernetes clusters supported?**

  To run on an autoscaling cluster, you may need to set the resource provisioning timeout (:code:`Kubernetes.TIMEOUT` in ``clouds/kubernetes.py``) to a large value so that the cluster has enough time to autoscale. We are working on a better interface for adjusting this timeout. Stay tuned!

* **What container image is used for tasks? Can I specify my own image?**

  We use and maintain a SkyPilot container image that has conda and a few other basic tools installed. You can specify a custom image to use in ``clouds/kubernetes.py``, but it must have rsync, conda and an OpenSSH server installed. We are working on an interface to allow specifying custom images through the :code:`image_id` field in the task YAML. Stay tuned!

* **Can SkyPilot provision a Kubernetes cluster for me? Will SkyPilot add more nodes to my Kubernetes clusters?**

  The goal of Kubernetes support is to run SkyPilot tasks on an existing Kubernetes cluster. SkyPilot does not provision new Kubernetes clusters or add nodes to an existing Kubernetes cluster.

* **I have multiple users in my organization who share the same Kubernetes cluster. How do I provide isolation for their SkyPilot workloads?**

  For isolation, you can create separate Kubernetes namespaces and set them in the kubeconfig distributed to users. SkyPilot will use the namespace set in the kubeconfig for running all tasks.
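
One way to prepare per-user kubeconfigs for this setup is to give each user a context that pins their own namespace. The Python sketch below only builds the ``contexts`` entries as plain dicts; the ``<user>-<cluster>`` context naming and the one-namespace-per-user convention are assumptions made for illustration. Creating the namespaces and granting permissions is still done with your usual Kubernetes tooling.

```python
import re
from typing import Dict, List

# Namespace names must be DNS-1123 labels: lowercase alphanumerics or "-",
# starting and ending with an alphanumeric, at most 63 characters.
_DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def user_contexts(cluster: str, users: List[str]) -> List[Dict]:
    """Build one kubeconfig context per user, each pinned to a namespace named after that user."""
    contexts = []
    for user in users:
        if len(user) > 63 or not _DNS1123_LABEL.fullmatch(user):
            raise ValueError(f"{user!r} is not a valid namespace name")
        contexts.append({
            "name": f"{user}-{cluster}",
            "context": {"cluster": cluster, "user": user, "namespace": user},
        })
    return contexts


for ctx in user_contexts("shared-cluster", ["alice", "bob"]):
    print(ctx["name"], "->", ctx["context"]["namespace"])
# alice-shared-cluster -> alice
# bob-shared-cluster -> bob
```

Each user's kubeconfig would then set ``current-context`` to their own entry, which is the namespace SkyPilot uses for that user's tasks.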

Features and Roadmap
--------------------

Kubernetes support is under active development. Some features are in progress and will be released soon:

* CPU and GPU Tasks - ✅ Available
* Auto-down - ✅ Available
* Storage mounting - ✅ Available on x86_64 clusters
* Multi-node tasks - 🚧 In progress
* Multiple Kubernetes Clusters - 🚧 In progress


.. toctree::
   :hidden:

   kubernetes-setup