Commit: Merge pull request #478 from Gaurav141199/nim_model_deployment_update ("README for NIM Model Deployment")

Showing 3 changed files with 163 additions and 4 deletions.

# Overview

This guide describes how to use the Model Catalog to store models in OCI. There are two ways to achieve this:

* Storing the zipped model file directly in the Model Catalog
* Storing the model in Object Storage and creating a Model Catalog entry that points to the Object Storage bucket ([refer to this README](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/nim/README-MODEL-CATALOG.md))

# Prerequisites

* A notebook session with internet access (recommended)
* The Llama 3 8B Instruct model, downloaded from [HuggingFace](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) or the NGC repository

## Download the NIM container image and upload it to OCIR

* Pull the latest NIM image to your local machine and tag it with the desired name:

```bash
docker pull nvcr.io/nim/meta/llama3-8b-instruct:latest
docker tag nvcr.io/nim/meta/llama3-8b-instruct:latest odsc-nim-llama3:latest
```

## OCI Container Registry

Once the NIM container is pushed to OCIR, you can use the `Bring Your Own Container` deployment in OCI Data Science to deploy the Llama3 model.

# Method 1: Export the model to the Model Catalog

Follow the steps mentioned [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/model-deployment/containers/llama2/README.md#model-store-export-api-for-creating-model-artifacts-greater-than-6-gb-in-size), referring to the section on one-time download to the OCI Model Catalog.

We will use the model created above in the next steps to create the model deployment.

# Method 2: Model by reference

Follow the steps [here](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/LLM/llama3.1-8B-deployment-vLLM-container.md#upload-model-to-oci-object-storage) to upload your model to Object Storage.

Then use [this section](https://github.com/oracle-samples/oci-data-science-ai-samples/blob/main/LLM/llama3.1-8B-deployment-vLLM-container.md#create-model-by-reference-using-ads) on creating a model by reference using ADS to create the model.

# Create the model deployment

* To deploy the model in the console, navigate to your [OCI Data Science Project](https://cloud.oracle.com/data-science/project)
* Select the project created earlier and then select `Model Deployment`
* Click `Create model deployment`
* Under `Default configuration`, set the following custom environment variables:
  * Key: `MODEL_DEPLOY_PREDICT_ENDPOINT`, Value: `/v1/completions`
  * Key: `MODEL_DEPLOY_HEALTH_ENDPOINT`, Value: `/v1/health/ready`
  * Key: `NIM_MODEL_NAME`, Value: `/opt/ds/model/deployed_model`
  * Key: `NIM_SERVER_PORT`, Value: `8080`
* Under `Models`, click the `Select` button and select the Model Catalog entry created earlier
* Under `Compute`, then `Specialty and previous generation`, select the `VM.GPU.A10.1` instance
* Under `Networking`, choose the `Custom networking` option and select a VCN and subnet that allow internet access
* Under `Logging`, select the Log Group where you created your predict and access logs, and select those logs correspondingly
* Click `Show advanced options` at the bottom
* Select the checkbox `Use a custom container image`
* Select the OCIR repository and image pushed earlier
* Use port `8080`
* Leave `CMD` and `Entrypoint` blank
* Click the `Create` button to create the model deployment
* Once the model deployment shows as `Active`, you can run inference against it:
  * Click on the model deployment you just created
  * On the left side, under `Resources`, select `Invoking your model`
  * Copy the model endpoint shown under `Your model HTTP endpoint`
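For reference, the four custom environment variables above can be captured in a small script. This is a plain illustration of the console values, not an SDK call; the variable name `nim_env` is just for this sketch:

```python
# The four custom environment variables set under "Default configuration".
# Values mirror the console steps above; adjust if your container differs.
nim_env = {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions",
    "MODEL_DEPLOY_HEALTH_ENDPOINT": "/v1/health/ready",
    "NIM_MODEL_NAME": "/opt/ds/model/deployed_model",
    "NIM_SERVER_PORT": "8080",
}

# Sanity checks: the server port must match the port configured for the
# custom container (8080), and both endpoints are paths under /v1.
assert nim_env["NIM_SERVER_PORT"] == "8080"
assert all(v.startswith("/v1") for k, v in nim_env.items() if "ENDPOINT" in k)
```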

## Inference

```bash
oci raw-request \
  --http-method POST \
  --target-uri <MODEL-DEPLOY-ENDPOINT> \
  --request-body '{"model": "/opt/ds/model/deployed_model", "messages": [ { "role":"user", "content":"Hello! How are you?" }, { "role":"assistant", "content":"Hi! I am quite well, how can I help you today?" }, { "role":"user", "content":"Can you write me a song?" } ], "top_p": 1, "n": 1, "max_tokens": 200, "stream": false, "frequency_penalty": 1.0, "stop": ["hello"] }' \
  --auth resource_principal
```
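If you prefer to build the request body programmatically (for example, to vary the prompt), the same payload can be assembled in Python. This sketch only constructs the JSON body; the request itself is still sent with `oci raw-request` as shown above:

```python
import json

# Payload identical to the --request-body in the oci raw-request example above.
payload = {
    "model": "/opt/ds/model/deployed_model",
    "messages": [
        {"role": "user", "content": "Hello! How are you?"},
        {"role": "assistant", "content": "Hi! I am quite well, how can I help you today?"},
        {"role": "user", "content": "Can you write me a song?"},
    ],
    "top_p": 1,
    "n": 1,
    "max_tokens": 200,
    "stream": False,
    "frequency_penalty": 1.0,
    "stop": ["hello"],
}

# Serialize for use as a request body (e.g. saved to a file passed to the CLI).
request_body = json.dumps(payload)
```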

## Troubleshooting

See the [troubleshooting reference](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/model-deployment/containers/llama2#troubleshooting).

---

**File: `model-deployment/containers/nim/README-SOURCE-NIM-TO-OCIR.MD`** (75 additions, 0 deletions)

<!-- ### Process to pull NIM image from Nvidia and push it to OCIR -->

## Part 1: Pull the NIM image from Nvidia to your local machine

### Step 1: Get access to the NIM image and log in to Docker

Register for a developer account at https://build.nvidia.com/explore/discover. After logging in, you can generate an NGC token by going to the model page and clicking the generate-token button.

Log in to `nvcr.io`, entering your newly generated NGC token in place of the password:

```
$ docker login nvcr.io
Username: $oauthtoken
Password: nvapi-6mj......
```

### Step 2: Pull the image to your local machine

```
# Choose a container name for bookkeeping
$ export CONTAINER_NAME=llama3-8b-instruct
# Define the vendor name for the LLM
$ export VENDOR_NAME=meta
$ export IMG_NAME="nvcr.io/nim/${VENDOR_NAME}/${CONTAINER_NAME}:latest"
$ docker pull $IMG_NAME
```

## Part 2: Push the image to OCIR

### Step 1: Log in to OCIR with Docker

To find your registry domain, see: https://docs.oracle.com/en-us/iaas/Content/Registry/Concepts/registryprerequisites.htm#regional-availability
If you don't have login details, see: https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrypushingimagesusingthedockercli.htm#:~:text=If%20you%20already%20have%20an%20auth%20token%2C%20go%20to%20the%20next%20step.%20Otherwise%3A

```
$ docker login <registry-domain>
```

When prompted for a username, enter it in the format `<tenancy-namespace>/<username>`, where `<tenancy-namespace>` is the auto-generated Object Storage namespace string of your tenancy (as shown on the Tenancy Information page). For example, `ansh81vru1zp/[email protected]`.
When prompted for a password, enter your auth token.

### Step 2: Locate the image on the client machine that you want to push

```
$ docker images
```

Example output for our image:

```
REPOSITORY                            TAG       IMAGE ID
nvcr.io/nim/meta/llama3-8b-instruct   latest    106df041c287
```

### Step 3: Tag the Docker image

```
$ docker tag <image-identifier> <target-tag>
```

`<target-tag>` is the fully qualified path to the target location in Container Registry where you want to push the image, in the format `<registry-domain>/<tenancy-namespace>/<repo-name>:<version>`.

Example command:

```
$ docker tag 106df041c287 ocir.us-ashburn-1.oci.oraclecloud.com/ansh81vru1zp/project01/acme-web-app:v2.0.test
```
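The `<target-tag>` format can be sketched as a small helper. The function name `ocir_target_tag` is hypothetical (for illustration only); the registry domain, namespace, and repo name below are the example values from this step, so substitute your own:

```python
def ocir_target_tag(registry_domain: str, tenancy_namespace: str,
                    repo_name: str, version: str) -> str:
    """Compose a fully qualified OCIR target tag:
    <registry-domain>/<tenancy-namespace>/<repo-name>:<version>
    """
    return f"{registry_domain}/{tenancy_namespace}/{repo_name}:{version}"

# Example values taken from the docker tag command above.
tag = ocir_target_tag(
    "ocir.us-ashburn-1.oci.oraclecloud.com",
    "ansh81vru1zp",
    "project01/acme-web-app",
    "v2.0.test",
)
# tag == "ocir.us-ashburn-1.oci.oraclecloud.com/ansh81vru1zp/project01/acme-web-app:v2.0.test"
```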

### Step 4: Push to OCIR

```
$ docker push <target-tag>
```

Note: for more information on pushing images to OCIR, see: https://docs.oracle.com/en-us/iaas/Content/Registry/Tasks/registrypushingimagesusingthedockercli.htm