diff --git a/ai-quick-actions/model-deployment-tips.md b/ai-quick-actions/model-deployment-tips.md
index c8d9aecb..140f79a7 100644
--- a/ai-quick-actions/model-deployment-tips.md
+++ b/ai-quick-actions/model-deployment-tips.md
@@ -20,13 +20,15 @@ replacement for applications using OpenAI API. Model deployments are a managed r
 the OCI Data Science service. For more details about Model Deployment and managing it through
 the OCI console please see the [OCI docs](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-about.htm).

-### Deploying an LLM
+## Deploying an LLM

 After picking a model from the model explorer, if the "Deploy Model" is enabled you can use this
 form to quickly deploy the model:

 ![Deploy Model](web_assets/deploy-model.png)

+### Compute Shape
+
 The compute shape selection is critical, the list available is selected to be suitable for the
 chosen model.

@@ -40,16 +42,13 @@ For a full list of shapes and their definitions see the [compute shape docs](htt
 The relationship between model parameter size and GPU memory is roughly 2x parameter count in GB, so for example a model that has 7B parameters will need a minimum of 14 GB for inference. At runtime the
 memory is used for both holding the weights, along with the concurrent contexts for the user's requests.

-The "inference mode" allows you to choose between the default completion endpoint(`/v1/completions`) and the chat endpoint (`/v1/chat/completions`).
-* The default completion endpoint is designed for text completion tasks. It’s suitable for generating text based on a given prompt.
-* The chat endpoint is tailored for chatbot-like interactions. It allows for more dynamic and interactive conversations by using a list of messages with roles (system, user, assistant). This is ideal for applications requiring back-and-forth dialogue, maintaining context over multiple turns. It is recommended that you deploy chat models (e.g. `meta-llama/Llama-3.1-8B-Instruct`) using the chat endpoint.
+### Advanced Options

-Once deployed, the model will spin up and become available after some time, then you're able to try out the model
-from the deployments tab using the test model, or programmatically.
+You may click "Show Advanced Options" to configure the "inference container" and "inference mode" options.

-![Try Model](web_assets/try-model.png)
+![Advanced Options](web_assets/deploy-model-advanced-options.png)

-### Advanced Deployment Options
+### Inference Container Configuration

 The service allows for model deployment configuration to be overridden when creating a model deployment. Depending on
 the type of inference container used for deployment, i.e. vLLM or TGI, the parameters vary and need to be passed with the format
@@ -58,12 +57,24 @@ the type of inference container used for deployment, i.e. vLLM or TGI, the param
 For more details, please visit [vLLM](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server)
 or [TGI](https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher) documentation to know more about the parameters accepted
 by the respective containers.

-![Model Deployment Parameters](web_assets/model-deployment-params.png)
+### Inference Mode
+
+The "inference mode" allows you to choose between the default completion endpoint (`/v1/completions`) and the chat endpoint (`/v1/chat/completions`).
+* The default completion endpoint is designed for text completion tasks. It’s suitable for generating text based on a given prompt.
+* The chat endpoint is tailored for chatbot-like interactions. It allows for more dynamic and interactive conversations by using a list of messages with roles (system, user, assistant). This is ideal for applications requiring back-and-forth dialogue, maintaining context over multiple turns. It is recommended that you deploy chat models (e.g. `meta-llama/Llama-3.1-8B-Instruct`) using the chat endpoint (see the sketch below).
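+
+For illustration, here is a minimal sketch of the two payload shapes, assuming an OCI API key configured in `~/.oci/config`; the endpoint URL and the `odsc-llm` model name are placeholders to replace with your deployment's values. Note that a deployment serves only the endpoint selected by its inference mode, so send the matching payload.
+
+```python
+import oci
+import requests
+
+# Placeholder: copy the real invoke URL from your model deployment's details page.
+endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<MD_OCID>/predict"
+
+config = oci.config.from_file()  # assumes an API key setup in ~/.oci/config
+auth = oci.signer.Signer(
+    tenancy=config["tenancy"],
+    user=config["user"],
+    fingerprint=config["fingerprint"],
+    private_key_file_location=config["key_file"],
+)
+
+# Completion mode: a single prompt in, a text continuation out.
+completion_body = {
+    "model": "odsc-llm",  # assumed name; use the model name configured for your deployment
+    "prompt": "What is the capital of France?",
+    "max_tokens": 100,
+}
+
+# Chat mode: role-tagged messages; the server applies the model's chat template,
+# and multi-turn context is carried in the messages list.
+chat_body = {
+    "model": "odsc-llm",
+    "messages": [
+        {"role": "system", "content": "You are a helpful assistant."},
+        {"role": "user", "content": "What is the capital of France?"},
+    ],
+    "max_tokens": 100,
+}
+
+print(requests.post(endpoint, json=completion_body, auth=auth).json())  # completion mode
+print(requests.post(endpoint, json=chat_body, auth=auth).json())        # chat mode
+```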
+
+### Test Your Model
+
+Once deployed, the model will spin up and become available after some time. You can then try out the model
+from the deployments tab, either using the test model feature or programmatically.
+
+![Try Model](web_assets/try-model.png)

-### Inferencing Model
+## Inferencing Model

-#### Using oci-cli
+### Using oci-cli

 ```bash
 oci raw-request --http-method POST --target-uri /predict --request-body '{
@@ -78,7 +89,7 @@ oci raw-request --http-method POST --target-uri /predict -

 Note: Currently `oci-cli` does not support streaming response, use Python or Java SDK instead.

-#### Using Python SDK (without streaming)
+### Using Python SDK (without streaming)

 ```python
 # The OCI SDK must be installed for this example to function properly.
@@ -120,7 +131,7 @@ res = requests.post(endpoint, json=body, auth=auth, headers={}).json()
 print(res)
 ```

-#### Using Python SDK (with streaming)
+### Using Python SDK (with streaming)

 To consume streaming Server-sent Events (SSE), install [sseclient-py](https://pypi.org/project/sseclient-py/) using `pip install sseclient-py`.
@@ -176,7 +187,7 @@ for event in client.events():
 # print(line)
 ```

-#### Using Java (with streaming)
+### Using Java (with streaming)

 ```java
 /**
@@ -304,7 +315,7 @@ public class RestExample {

 ```

-### Advanced Configuration Update Options
+## Advanced Configuration Update Options

 The available shapes for models in AI Quick Actions are pre-configured for both registration and
 deployment for models available in the Model Explorer. However, if you need to add more shapes to the list of
@@ -393,7 +404,7 @@ using [Advanced Deployment Options](#advanced-deployment-options) from AI Quick
 ```

-### Troubleshooting
+## Troubleshooting

 If the model should fail to deploy, reasons might include lack of GPU availability, or policy permissions.

diff --git a/ai-quick-actions/register-tips.md b/ai-quick-actions/register-tips.md
index f0a3f520..040be511 100644
--- a/ai-quick-actions/register-tips.md
+++ b/ai-quick-actions/register-tips.md
@@ -9,13 +9,25 @@ Table of Contents:

 - [Model Evaluation](evaluation-tips.md)
 - [Model Fine Tuning](fine-tuning-tips.md)

-## Upload model artifact to Object Storage
+The AI Quick Action model explorer allows you to register a model from Hugging Face or Object Storage with a few clicks.

-AI Quick Actions supports user-provided models that can be deployed, fined-tuned and evaluated. You can now upload
-and test models with artifacts downloaded from model repositories like Hugging Face, etc. or from your own models.
+![Register new model](web_assets/register-button.png)

-While registering the model in AI Quick Actions, you need to specify the Object Storage location where the model artifact is stored.
-If you are downloading the model from the Hugging Face Hub, follow the download instructions [here](https://huggingface.co/docs/huggingface_hub/main/en/guides/download).
+## Register Model from Hugging Face
+
+To register a model from Hugging Face, select "Download from Hugging Face" in the dropdown under model artifact.
+Then you may select a verified model from the "Select Model" dropdown, or you may "register any model" by entering the model name.
+
+Note that for gated models, you must first authenticate to Hugging Face by running the `huggingface-cli login` command in a terminal. See details in the [Hugging Face CLI documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli).
+
+![Register model from Hugging Face](web_assets/register-model.png)
+
+## Upload Model Artifact to Object Storage Manually
+
+AI Quick Actions also supports user-provided models that can be deployed, fine-tuned and evaluated.
+
+While registering the model in AI Quick Actions, you need to specify the Object Storage location where the model artifact is stored.
+You may first prepare your model files locally and then upload them to Object Storage.
+For example, you can download the model from the Hugging Face Hub using the download instructions [here](https://huggingface.co/docs/huggingface_hub/main/en/guides/download).
 Once downloaded, use [oci-cli](https://github.com/oracle/oci-cli) to upload these artifacts to the correct object storage location.
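+
+As an illustration, here is a minimal sketch that downloads a model with the `huggingface_hub` Python package and uploads the artifacts with the OCI Python SDK; the model ID, local directory, and bucket name are placeholders. The `oci os object bulk-upload` CLI command achieves the same result.
+
+```python
+import os
+import oci
+from huggingface_hub import snapshot_download
+
+# Placeholders: pick your own model ID and a versioned bucket in your tenancy.
+model_id = "meta-llama/Llama-3.1-8B-Instruct"  # gated: run `huggingface-cli login` first
+local_dir = snapshot_download(repo_id=model_id, local_dir="./model_artifact")
+
+config = oci.config.from_file()  # assumes an API key setup in ~/.oci/config
+client = oci.object_storage.ObjectStorageClient(config)
+namespace = client.get_namespace().data
+upload_manager = oci.object_storage.UploadManager(client)
+
+# Upload every artifact file under a common prefix, e.g. <model-name>/<relative-path>.
+bucket = "<your-versioned-bucket>"
+prefix = model_id.split("/")[-1]
+for root, _, files in os.walk(local_dir):
+    for name in files:
+        path = os.path.join(root, name)
+        object_name = f"{prefix}/{os.path.relpath(path, local_dir)}"
+        upload_manager.upload_file(namespace, bucket, object_name, path)
+```
+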
 The object storage bucket needs to be versioned, run the following command to check whether
 versioning is set up. If the output of the below command is "Disabled", then you need
diff --git a/ai-quick-actions/web_assets/deploy-model-advanced-options.png b/ai-quick-actions/web_assets/deploy-model-advanced-options.png
new file mode 100644
index 00000000..7e7feef1
Binary files /dev/null and b/ai-quick-actions/web_assets/deploy-model-advanced-options.png differ
diff --git a/ai-quick-actions/web_assets/deploy-model.png b/ai-quick-actions/web_assets/deploy-model.png
index bccd6619..7aad87b9 100644
Binary files a/ai-quick-actions/web_assets/deploy-model.png and b/ai-quick-actions/web_assets/deploy-model.png differ
diff --git a/ai-quick-actions/web_assets/register-button.png b/ai-quick-actions/web_assets/register-button.png
new file mode 100644
index 00000000..74af1e30
Binary files /dev/null and b/ai-quick-actions/web_assets/register-button.png differ
diff --git a/ai-quick-actions/web_assets/register-model.png b/ai-quick-actions/web_assets/register-model.png
new file mode 100644
index 00000000..bbb312ff
Binary files /dev/null and b/ai-quick-actions/web_assets/register-model.png differ