Update README.md
andrewtruong authored Mar 15, 2024
1 parent fe0629c commit be0f062
Showing 1 changed file with 13 additions and 3 deletions.
16 changes: 13 additions & 3 deletions jobs/deploy_to_nvidia_nemo_inference_microservice/README.md
@@ -14,11 +14,16 @@ Deployment time varies by model and machine type. The base Llama2-7b config take

 ## User Quickstart

-1. Create a queue if you don't have one already, and launch an agent:
+1. Create a queue if you don't have one already. See an example queue config below.
+1. You can set `gpus` to the specific GPUs you want to use, or `all` to use everything.
+2. Set `runtime` to `nvidia`
+
+![image](https://github.com/wandb/launch-jobs/assets/15385696/d349e37a-ce1d-48b3-992f-1b4b617efa19)
+2. Launch an agent on your GPU machine:
 ```bash
 wandb launch-agent -e $ENTITY -p $PROJECT -q $QUEUE
 ```
-2. Submit the deployment job with your desired configs from the [Launch UI](https://wandb.ai/launch). See `configs/` for examples.
+3. Submit the deployment job with your desired configs from the [Launch UI](https://wandb.ai/launch). See `configs/` for examples.
 1. You can also submit via the CLI:
 ```bash
 wandb launch -d gcr.io/playground-111/deploy-to-nemo:latest \
@@ -27,7 +32,12 @@ Deployment time varies by model and machine type. The base Llama2-7b config take
 -q $QUEUE \
 -c $CONFIG_JSON_FNAME
 ```
-3. You can track the deployment process in the Launch UI. Once complete, you can immediately curl the endpoint to test the model. The model name is always `ensemble`.
+![image](https://github.com/wandb/launch-jobs/assets/15385696/8bc95b7a-94a6-453e-9c87-f6b25a567604)
+
+5. You can track the deployment process in the Launch UI.
+![image](https://github.com/wandb/launch-jobs/assets/15385696/49ca8391-689e-4cb7-9ba9-b5691f2cc7aa)
+
+7. Once complete, you can immediately curl the endpoint to test the model. The model name is always `ensemble`.
 ```bash
 #!/bin/bash
 curl -X POST "http://0.0.0.0:9999/v1/completions" \
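The queue settings from step 1 of the Quickstart (`gpus` and `runtime`) appear in the README only as a screenshot, so a text version may be easier to copy. The sketch below assumes the queue config accepts both as top-level JSON keys, which the README names but does not show in text form. The file name `queue_config.json` is hypothetical and used only for illustration.

```bash
#!/bin/bash
# Write a minimal queue config to a local file (hypothetical file name).
# Assumption: `gpus` and `runtime` are top-level keys, as step 1 suggests.
cat > queue_config.json <<'EOF'
{
  "gpus": "all",
  "runtime": "nvidia"
}
EOF

# Print the file so its contents can be pasted into the queue config editor.
cat queue_config.json
```

To restrict the deployment to specific GPUs, replace `all` with the GPUs you want to use, as step 1 describes.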