Update README.md

wandb · Mar 15, 2024 · be0f062 · be0f062
1 parent fe0629c
commit be0f062
Showing 1 changed file with 13 additions and 3 deletions.
diff --git a/jobs/deploy_to_nvidia_nemo_inference_microservice/README.md b/jobs/deploy_to_nvidia_nemo_inference_microservice/README.md
@@ -14,11 +14,16 @@ Deployment time varies by model and machine type. The base Llama2-7b config take
 
 ## User Quickstart
 
-1. Create a queue if you don't have one already, and launch an agent:
+1. Create a queue if you don't have one already.  See an example queue config below.
+   1. You can set `gpus` to the specific GPUs you want to use, or `all` to use everything.
+   2. Set `runtime` to `nvidia`
+
+   ![image](https://github.com/wandb/launch-jobs/assets/15385696/d349e37a-ce1d-48b3-992f-1b4b617efa19)
+2. Launch an agent on your GPU machine:
    ```bash
    wandb launch-agent -e $ENTITY -p $PROJECT -q $QUEUE
    ```
-2. Submit the deployment job with your desired configs from the [Launch UI](https://wandb.ai/launch). See `configs/` for examples.
+3. Submit the deployment job with your desired configs from the [Launch UI](https://wandb.ai/launch). See `configs/` for examples.
    1. You can also submit via the CLI:
       ```bash
       wandb launch -d gcr.io/playground-111/deploy-to-nemo:latest \
@@ -27,7 +32,12 @@ Deployment time varies by model and machine type. The base Llama2-7b config take
         -q $QUEUE \
         -c $CONFIG_JSON_FNAME
       ```
-3. You can track the deployment process in the Launch UI. Once complete, you can immediately curl the endpoint to test the model. The model name is always `ensemble`.
+      ![image](https://github.com/wandb/launch-jobs/assets/15385696/8bc95b7a-94a6-453e-9c87-f6b25a567604)
+
+5. You can track the deployment process in the Launch UI.
+   ![image](https://github.com/wandb/launch-jobs/assets/15385696/49ca8391-689e-4cb7-9ba9-b5691f2cc7aa)
+
+7. Once complete, you can immediately curl the endpoint to test the model. The model name is always `ensemble`.
    ```bash
     #!/bin/bash
     curl -X POST "http://0.0.0.0:9999/v1/completions" \