From efa9852e554d1ac75ce9259fc2546ad5bf1c2cf3 Mon Sep 17 00:00:00 2001
From: Michael Green <59619482+mikegre-google@users.noreply.github.com>
Date: Fri, 15 Nov 2024 21:26:17 +0000
Subject: [PATCH] [Docs] Misc updates to TPU installation instructions (#10165)

---
 .../getting_started/tpu-installation.rst      | 54 ++++++++++++-------
 1 file changed, 35 insertions(+), 19 deletions(-)

diff --git a/docs/source/getting_started/tpu-installation.rst b/docs/source/getting_started/tpu-installation.rst
index 75ab2b6ba02dc..22cc684a1c778 100644
--- a/docs/source/getting_started/tpu-installation.rst
+++ b/docs/source/getting_started/tpu-installation.rst
@@ -44,15 +44,18 @@ Requirements
 Provision Cloud TPUs
 ====================
 
-You can provision Cloud TPUs using the `Cloud TPU API `_`
-or the `queued resources `_`
-API. This section shows how to create TPUs using the queued resource API.
-For more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API `_.
-`Queued resources `_
-enable you to request Cloud TPU resources in a queued manner. When you request
-queued resources, the request is added to a queue maintained by the Cloud TPU
-service. When the requested resource becomes available, it's assigned to your
-Google Cloud project for your immediate exclusive use.
+You can provision Cloud TPUs using the `Cloud TPU API `_
+or the `queued resources `_
+API. This section shows how to create TPUs using the queued resource API. For
+more information about using the Cloud TPU API, see `Create a Cloud TPU using the Create Node API `_.
+Queued resources enable you to request Cloud TPU resources in a queued manner.
+When you request queued resources, the request is added to a queue maintained by
+the Cloud TPU service. When the requested resource becomes available, it's
+assigned to your Google Cloud project for your immediate exclusive use.
+
+.. note::
+    In all of the following commands, replace the ALL CAPS parameter names with
+    appropriate values. See the parameter descriptions table for more information.
 
 Provision a Cloud TPU with the queued resource API
 --------------------------------------------------
@@ -68,6 +71,7 @@ Create a TPU v5e with 4 TPU chips:
     --runtime-version RUNTIME_VERSION \
     --service-account SERVICE_ACCOUNT
 
+
 .. list-table:: Parameter descriptions
     :header-rows: 1
 
@@ -81,12 +85,13 @@ Create a TPU v5e with 4 TPU chips:
     * - PROJECT_ID
       - Your Google Cloud project
     * - ZONE
-      - The `zone `_ where you
-        want to create your Cloud TPU.
+      - The GCP zone where you want to create your Cloud TPU. The value you use
+        depends on the version of TPUs you are using. For more information, see
+        `TPU regions and zones `_
     * - ACCELERATOR_TYPE
-      - The TPU version you want to use. Specify the TPU version, followed by a
-        '-' and the number of TPU cores. For example `v5e-4` specifies a v5e TPU
-        with 4 cores. For more information, see `TPU versions `_.
+      - The TPU version you want to use. Specify the TPU version, for example
+        `v5litepod-4` specifies a v5e TPU with 4 cores. For more information,
+        see `TPU versions `_.
     * - RUNTIME_VERSION
       - The TPU VM runtime version to use. For more information see `TPU VM images `_.
     * - SERVICE_ACCOUNT
@@ -98,7 +103,15 @@ Connect to your TPU using SSH:
 
 .. code-block:: bash
 
-    gcloud compute tpus tpu-vm ssh TPU_NAME
+    gcloud compute tpus tpu-vm ssh TPU_NAME --zone ZONE
+
+Install Miniconda
+
+.. code-block:: bash
+
+    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
+    bash Miniconda3-latest-Linux-x86_64.sh
+    source ~/.bashrc
 
 Create and activate a Conda environment for vLLM:
 
@@ -162,9 +175,11 @@ Run the Docker image with the following command:
 
 .. note::
 
-    Since TPU relies on XLA which requires static shapes, vLLM bucketizes the possible input shapes and compiles an XLA graph for each different shape.
-    The compilation time may take 20~30 minutes in the first run.
-    However, the compilation time reduces to ~5 minutes afterwards because the XLA graphs are cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
+    Since TPU relies on XLA which requires static shapes, vLLM bucketizes the
+    possible input shapes and compiles an XLA graph for each shape. The
+    compilation time may take 20~30 minutes in the first run. However, the
+    compilation time reduces to ~5 minutes afterwards because the XLA graphs are
+    cached in the disk (in :code:`VLLM_XLA_CACHE_PATH` or :code:`~/.cache/vllm/xla_cache` by default).
 
 .. tip::
 
@@ -173,7 +188,8 @@ Run the Docker image with the following command:
 .. code-block:: console
 
     from torch._C import *  # noqa: F403
-    ImportError: libopenblas.so.0: cannot open shared object file: No such file or directory
+    ImportError: libopenblas.so.0: cannot open shared object file: No such
+    file or directory
 
 Install OpenBLAS with the following command:
 