Merge pull request #478 from Gaurav141199/nim_model_deployment_update
README for NIM Model Deployment
liudmylaru authored Aug 28, 2024
2 parents 34f5813 + b3243fb commit b4937aa
Showing 3 changed files with 163 additions and 4 deletions.
75 changes: 75 additions & 0 deletions model-deployment/containers/nim/README-MODEL-CATALOG.md
@@ -0,0 +1,75 @@
# Overview

This guide covers using the Model Catalog to store models in OCI. We describe two ways to achieve this:

* Storing a zipped model file in the Model Catalog
* Storing the model in Object Storage and creating a Model Catalog entry that points to the Object Storage bucket ([reference](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/nim/README-MODEL-CATALOG.md))

# Prerequisites

The following are the prerequisites:
* A notebook session with internet access (recommended)
* The Llama 3 8B Instruct model, downloaded from [HuggingFace](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or the NGC repository.

## Download the NIM container image and upload it to OCIR
* Pull the latest NIM image to your local machine and tag it with the desired name (a tag-and-push sketch for OCIR follows below).
```bash
docker pull nvcr.io/nim/meta/llama3-8b-instruct:latest
docker tag nvcr.io/nim/meta/llama3-8b-instruct:latest odsc-nim-llama3:latest
```
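To make the image available to OCI Data Science, you also push it to OCIR. A quick sketch of that step (the region key, tenancy namespace, and repository path below are placeholders, not real values):

```bash
# Tag the local image with the fully qualified OCIR path
# (format: <region-key>.ocir.io/<tenancy-namespace>/<repo-name>:<tag>).
docker tag odsc-nim-llama3:latest iad.ocir.io/<tenancy-namespace>/odsc-nim-llama3:latest

# Push after authenticating with `docker login iad.ocir.io` using your auth token.
docker push iad.ocir.io/<tenancy-namespace>/odsc-nim-llama3:latest
```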
## OCI Container Registry

Once the NIM container is pushed to OCIR, you can use the `Bring Your Own Container` deployment in OCI Data Science to deploy the Llama3 model.

# Method 1: Export Model to Model Catalog

Follow the steps mentioned [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/model-deployment/containers/llama2/README.md#model-store-export-api-for-creating-model-artifacts-greater-than-6-gb-in-size), referring to the section "One time download to OCI Model Catalog".

We will use the model created above in the next steps to create the Model Deployment.
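For smaller artifacts, a rough sketch of the zip-and-create flow with the OCI CLI is shown below; the OCIDs are placeholders, and for artifacts larger than 6 GB you should use the export API from the linked guide instead. Verify the command names against your CLI version:

```bash
# Create the model record in the Model Catalog (returns the model OCID).
oci data-science model create \
  --compartment-id ocid1.compartment.oc1..example \
  --project-id ocid1.datascienceproject.oc1..example \
  --display-name "llama3-8b-instruct"

# Attach the zipped model files as the artifact (suitable for artifacts <= 6 GB).
oci data-science model create-model-artifact \
  --model-id ocid1.datasciencemodel.oc1..example \
  --model-artifact-file model.zip
```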

# Method 2: Model-by-reference

Follow the steps [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/LLM/llama3.1-8B-deployment-vLLM-container.md#upload-model-to-oci-object-storage) to upload your model to Object Storage.

Then use the [Create Model by Reference using ADS](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/LLM/llama3.1-8B-deployment-vLLM-container.md#create-model-by-reference-using-ads) section to create the model.
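As a minimal sketch of the upload step (the bucket name and prefix are placeholders):

```bash
# Bulk-upload the downloaded model files to an Object Storage bucket.
oci os object bulk-upload \
  --bucket-name <your-bucket> \
  --src-dir ./Meta-Llama-3-8B-Instruct \
  --prefix Meta-Llama-3-8B-Instruct/
```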

# Create Model Deployment

* To deploy the model now in the console, navigate to your [OCI Data Science Project](https://cloud.oracle.com/data-science/project)
* Select the project created earlier and then select `Model Deployment`
* Click on `Create model deployment`
* Under `Default configuration` set the following custom environment variables:
* Key: `MODEL_DEPLOY_PREDICT_ENDPOINT`, Value: `/v1/completions`
* Key: `MODEL_DEPLOY_HEALTH_ENDPOINT`, Value: `/v1/health/ready`
* Key: `NIM_MODEL_NAME`, Value: `/opt/ds/model/deployed_model`
    * Key: `NIM_SERVER_PORT`, Value: `8080`
* Under `Models`, click the `Select` button and select the Model Catalog entry created earlier
* Under `Compute`, then `Specialty and previous generation`, select the `VM.GPU.A10.1` instance
* Under `Networking`, choose the `Custom Networking` option and select a VCN and subnet that allow internet access
* Under `Logging`, select the log group containing your predict and access logs, then select each log accordingly
* Click on `Show advanced options` at the bottom
* Select the checkbox `Use a custom container image`
* Select the OCIR repository and image we pushed earlier
* Use port 8080.
* Leave CMD and Entrypoint blank
* Click the `Create` button to create the model deployment
* Once the model is deployed and shown as `Active`, you can execute inference against it.
* Go to the model you've just deployed and click on it
* On the left side, under `Resources`, select `Invoking your model`
* You will see the model endpoint under `Your model HTTP endpoint`; copy it.
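The same deployment can also be scripted. A rough sketch with the OCI CLI, assuming you prefer to fill in a generated JSON template rather than pass individual flags:

```bash
# Generate a complete JSON input template for the create call, then edit it:
# set the container image, server port 8080, and the environment variables above.
oci data-science model-deployment create --generate-full-command-json-input > deploy.json

# Create the deployment from the edited template.
oci data-science model-deployment create --from-json file://deploy.json
```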
## Inference
```bash
oci raw-request \
  --http-method POST \
  --target-uri <MODEL-DEPLOY-ENDPOINT> \
  --request-body '{
    "model": "/opt/ds/model/deployed_model",
    "messages": [
      {"role": "user", "content": "Hello! How are you?"},
      {"role": "assistant", "content": "Hi! I am quite well, how can I help you today?"},
      {"role": "user", "content": "Can you write me a song?"}
    ],
    "top_p": 1,
    "n": 1,
    "max_tokens": 200,
    "stream": false,
    "frequency_penalty": 1.0,
    "stop": ["hello"]
  }' \
  --auth resource_principal
```
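To extract just the generated text, the response can be piped through `jq`; this sketch assumes `oci raw-request` wraps the service response under a top-level `data` key and that the deployed endpoint serves the chat-style schema used above:

```bash
oci raw-request \
  --http-method POST \
  --target-uri <MODEL-DEPLOY-ENDPOINT> \
  --request-body '{"model": "/opt/ds/model/deployed_model", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 200}' \
  --auth resource_principal | jq -r '.data.choices[0].message.content'
```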
## Troubleshooting
[Reference](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#troubleshooting)
75 changes: 75 additions & 0 deletions model-deployment/containers/nim/README-SOURCE-NIM-TO-OCIR.MD
@@ -0,0 +1,75 @@
<!-- ### Process to pull NIM image from Nvidia and push in to OCIR -->

## Part 1: Pull the NIM image from Nvidia to your local machine

### Step 1: Get access to the NIM image and log in to Docker

Register for a developer account at https://build.nvidia.com/explore/discover. After logging in, you can generate an NGC token from the model page by clicking the Generate Token button.

When prompted for a password, enter your newly generated NGC token.

```
$ docker login nvcr.io
Username: $oauthtoken
Password: nvapi-6mj......
```

### Step 2: Pull the image to your local machine
```
# Choose a container name for bookkeeping
$ export CONTAINER_NAME=llama3-8b-instruct
# Define the vendor name for the LLM
$ export VENDOR_NAME=meta
$ export IMG_NAME="nvcr.io/nim/${VENDOR_NAME}/${CONTAINER_NAME}:latest"
$ docker pull $IMG_NAME
```

## Part 2: Push the image to OCIR

### Step 1: Log in to OCIR with Docker

To find your registry domain, see: https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryprerequisites.htm#regional-availability
If you don't have login details, see: https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrypushingimagesusingthedockercli.htm#:~:text=If%20you%20already%20have%20an%20auth%20token%2C%20go%20to%20the%20next%20step.%20Otherwise%3A

```
$ docker login <registry-domain>
```
When prompted for a username, enter your username in the format ```<tenancy-namespace>```/```<username>```, where ```<tenancy-namespace>``` is the auto-generated Object Storage namespace string of your tenancy (as shown on the Tenancy Information page).
For example, ansh81vru1zp/[email protected].
When prompted for a password, enter your auth token.
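For example, a hypothetical login session (the registry domain and user below are illustrative values only):

```
$ docker login ocir.us-ashburn-1.oci.oraclecloud.com
Username: ansh81vru1zp/[email protected]
Password: <your-auth-token>
```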



### Step 2: Locate the image on the client machine that you want to push:
```
$ docker images
```
Example output:
```
REPOSITORY                            TAG      IMAGE ID
nvcr.io/nim/meta/llama3-8b-instruct   latest   106df041c287
```

### Step 3: Tag the docker image
```
$ docker tag <image-identifier> <target-tag>
```

```<target-tag>``` is the fully qualified path to the target location in Container Registry where you want to push the image, in the format ```<registry-domain>/<tenancy-namespace>/<repo-name>:<version>```.

Example command:
```
$ docker tag 106df041c287 ocir.us-ashburn-1.oci.oraclecloud.com/ansh81vru1zp/project01/acme-web-app:v2.0.test
```

### Step 4: Push to OCIR
```
$ docker push <target-tag>
```

Note: For more information on pushing images to OCIR, see: https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrypushingimagesusingthedockercli.htm
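To confirm the push succeeded, one option is to list the images in the repository with the OCI CLI; the compartment OCID below is a placeholder, and the `--repository-name` filter assumes a reasonably recent CLI version:

```
$ oci artifacts container image list \
    --compartment-id ocid1.compartment.oc1..example \
    --repository-name project01/acme-web-app
```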
17 changes: 13 additions & 4 deletions model-deployment/containers/nim/README.md
@@ -5,6 +5,10 @@ This Readme walks through how to use NIM - [ Meta-Llama-3-8B-Instruct](https://h
* [llama3](https://github.com/meta-llama/llama3) from Meta.
* [NIM](https://catalog.ngc.nvidia.com/orgs/nim/teams/meta/containers/llama3-8b-instruct) by Nvidia

We describe two approaches to create this Model Deployment on OCI:
* Downloading the model using an API key from NGC Nvidia (described below)
* Storing the model in Object Storage and creating a Model Catalog entry that points to the Object Storage bucket ([reference](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/nim/README-MODEL-CATALOG.md))

## Prerequisites
* Access to the corresponding NIM container for the model. For example, for llama3, fetch the latest available image from the [NGC catalog](https://catalog.ngc.nvidia.com/orgs/nim/teams/meta/containers/llama3-8b-instruct/tags). If you are a first-time user, you need to sign up for a developer account and wait for access to be granted to the required container image.
Click the Get Container button and click Request Access for NIM. At the time of writing, you need a business email address to get access to NIM.
@@ -36,6 +40,8 @@ When experimenting with new frameworks and models, it is highly advisable to att
docker build -f Dockerfile -t odsc-nim-llama3 .
```

##### To pull the image directly from the Nvidia NIM catalog and upload it to OCIR, see ```./README-SOURCE-NIM-TO-OCIR.MD```

## OCI Container Registry

* You need to `docker login` to the Oracle Cloud Container Registry (OCIR) first, if you haven't done so before, to be able to push the image. To log in, use your [API Auth Token](https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrygettingauthtoken.htm), which can be created under your `Oracle Cloud Account->Auth Token`. You need to log in only once.
@@ -51,13 +57,16 @@ When experimenting with new frameworks and models, it is highly advisable to att
```bash
docker push odsc-nim-llama3:latest
```


##### To pull the image directly from the Nvidia NIM catalog and upload it to OCIR, see ```./README-SOURCE-NIM-TO-OCIR.MD```

## Deploy on OCI Data Science Model Deployment

Once you have built and pushed the NIM container, you can use the `Bring Your Own Container` deployment in OCI Data Science to deploy the Llama3 model.

### Creating Model catalog
The NIM container will download the model directly using publicly exposed NGC catalog APIs. To provide the authorization token for the download, we will save the API key in a file and create a zip from it. This zip file will then be used to create a Model Catalog resource.
Sample file content named `token`:
```bash
nvapi-..........
```

@@ -66,8 +75,8 @@ This file will be available to the container at location `/opt/ds/model/deployed_mod
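A minimal sketch of producing that zipped artifact (the key value is a placeholder):

```bash
# Write the NGC API key to a file named `token`, then zip it for the Model Catalog.
echo "nvapi-<your-ngc-api-key>" > token
zip token.zip token
```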

### Create Model deploy

* To deploy the model now in the console, navigate to your [OCI Data Science Project](https://cloud.oracle.com/data-science/project)
* Select the project created earlier and then select `Model Deployment`
* Click on `Create model deployment`
* Under `Default configuration` set the following custom environment variables:
* Key: `MODEL_DEPLOY_PREDICT_ENDPOINT`, Value: `/v1/completions`
