Merge pull request #508 from oracle-samples/qq/aqua
Update AQUA model registration and deployment Tips
qiuosier authored Nov 1, 2024
2 parents 6e93b97 + a19f1b4 commit 5e42c82
Showing 6 changed files with 44 additions and 21 deletions.
43 changes: 27 additions & 16 deletions ai-quick-actions/model-deployment-tips.md
@@ -20,13 +20,15 @@
replacement for applications using OpenAI API. Model deployments are a managed resource in
the OCI Data Science service. For more details about Model Deployment and managing it through
the OCI console, please see the [OCI docs](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-about.htm).

-### Deploying an LLM
+## Deploying an LLM

After picking a model from the model explorer, if "Deploy Model" is enabled you can use this
form to quickly deploy the model:

![Deploy Model](web_assets/deploy-model.png)

+### Compute Shape

The compute shape selection is critical: the list of available shapes is filtered to those
suitable for the chosen model.

@@ -40,16 +42,13 @@ For a full list of shapes and their definitions see the [compute shape docs](htt
The relationship between model parameter count and GPU memory is roughly 2x the parameter count in GB (two bytes per parameter at 16-bit precision), so for example a model that has 7B parameters will need a minimum of 14 GB for inference. At runtime the
memory is used both for holding the weights and for the concurrent contexts of users' requests.
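As a back-of-the-envelope check, you can do this arithmetic yourself. A minimal sketch (weights only; KV cache and runtime overhead are extra):

```python
def min_weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Lower bound on GPU memory (GB) needed just to hold the model weights.

    bytes_per_param=2 assumes 16-bit (fp16/bf16) weights.
    """
    return params_billions * bytes_per_param

print(min_weight_memory_gb(7))   # 14.0 -> a 7B model needs at least ~14 GB
print(min_weight_memory_gb(70))  # 140.0 -> a 70B model spans multiple GPUs
```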

The "inference mode" allows you to choose between the default completion endpoint(`/v1/completions`) and the chat endpoint (`/v1/chat/completions`).
* The default completion endpoint is designed for text completion tasks. It’s suitable for generating text based on a given prompt.
* The chat endpoint is tailored for chatbot-like interactions. It allows for more dynamic and interactive conversations by using a list of messages with roles (system, user, assistant). This is ideal for applications requiring back-and-forth dialogue, maintaining context over multiple turns. It is recommended that you deploy chat models (e.g. `meta-llama/Llama-3.1-8B-Instruct`) using the chat endpoint.
+### Advanced Options

-Once deployed, the model will spin up and become available after some time, then you're able to try out the model
-from the deployments tab using the test model, or programmatically.
+You may click "Show Advanced Options" to configure the inference container and inference mode.

-![Try Model](web_assets/try-model.png)
+![Advanced Options](web_assets/deploy-model-advanced-options.png)

-### Advanced Deployment Options
+### Inference Container Configuration

The service allows the model deployment configuration to be overridden when creating a model deployment. Depending on
the type of inference container used for deployment, i.e. vLLM or TGI, the parameters vary and need to be passed with the format
@@ -58,12 +57,24 @@
For more details, please visit [vLLM](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server) or
[TGI](https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher) documentation to know more about the parameters accepted by the respective containers.
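As an illustration of the kind of flags involved, the sketch below uses real vLLM server options with hypothetical values; it shows a standalone vLLM launch for context, whereas in AI Quick Actions you supply only the parameter overrides when creating the deployment:

```bash
# Hypothetical overrides: cap the context window and shard across two GPUs.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --max-model-len 4096 \
  --tensor-parallel-size 2
```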

-![Model Deployment Parameters](web_assets/model-deployment-params.png)
+### Inference Mode

The "inference mode" allows you to choose between the default completion endpoint(`/v1/completions`) and the chat endpoint (`/v1/chat/completions`).
* The default completion endpoint is designed for text completion tasks. It’s suitable for generating text based on a given prompt.
* The chat endpoint is tailored for chatbot-like interactions. It allows for more dynamic and interactive conversations by using a list of messages with roles (system, user, assistant). This is ideal for applications requiring back-and-forth dialogue, maintaining context over multiple turns. It is recommended that you deploy chat models (e.g. `meta-llama/Llama-3.1-8B-Instruct`) using the chat endpoint.
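To make the difference concrete, here is a minimal sketch of the two request bodies (the `model` name, prompt, and messages are illustrative; the endpoint paths are the ones above):

```python
# /v1/completions: a single free-form prompt string.
completion_body = {
    "model": "odsc-llm",  # illustrative served-model name
    "prompt": "Write a haiku about the ocean.",
    "max_tokens": 100,
}

# /v1/chat/completions: role-tagged messages that carry dialogue context across turns.
chat_body = {
    "model": "odsc-llm",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about the ocean."},
    ],
    "max_tokens": 100,
}
```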


+### Test Your Model

+Once deployed, the model will spin up and become available after some time. You can then try out the model
+from the deployments tab using the test-model feature, or programmatically.

+![Try Model](web_assets/try-model.png)


-### Inferencing Model
+## Inferencing Model

-#### Using oci-cli
+### Using oci-cli

```bash
oci raw-request --http-method POST --target-uri <model_deployment_url>/predict --request-body '{
@@ -78,7 +89,7 @@
Note: Currently `oci-cli` does not support streaming responses; use the Python or Java SDK instead.


-#### Using Python SDK (without streaming)
+### Using Python SDK (without streaming)

```python
# The OCI SDK must be installed for this example to function properly.
@@ -120,7 +131,7 @@
res = requests.post(endpoint, json=body, auth=auth, headers={}).json()
print(res)
```

-#### Using Python SDK (with streaming)
+### Using Python SDK (with streaming)

To consume streaming Server-sent Events (SSE), install [sseclient-py](https://pypi.org/project/sseclient-py/) using `pip install sseclient-py`.

@@ -176,7 +187,7 @@
for event in client.events():
# print(line)
```

-#### Using Java (with streaming)
+### Using Java (with streaming)

```java
/**
@@ -304,7 +315,7 @@ public class RestExample {

```

-### Advanced Configuration Update Options
+## Advanced Configuration Update Options

The available shapes for models in AI Quick Actions are pre-configured for both registration and
deployment of the models available in the Model Explorer. However, if you need to add more shapes to the list of
@@ -393,7 +404,7 @@
using [Advanced Deployment Options](#advanced-deployment-options) from AI Quick Actions.
```


-### Troubleshooting
+## Troubleshooting

If the model fails to deploy, possible reasons include lack of GPU availability or missing policy permissions.

22 changes: 17 additions & 5 deletions ai-quick-actions/register-tips.md
@@ -9,13 +9,25 @@
Table of Contents:
- [Model Evaluation](evaluation-tips.md)
- [Model Fine Tuning](fine-tuning-tips.md)

-## Upload model artifact to Object Storage
+The AI Quick Actions model explorer allows you to register a model from Hugging Face or Object Storage with a few clicks.

-AI Quick Actions supports user-provided models that can be deployed, fined-tuned and evaluated. You can now upload
-and test models with artifacts downloaded from model repositories like Hugging Face, etc. or from your own models.
+![Register new model](web_assets/register-button.png)

-While registering the model in AI Quick Actions, you need to specify the Object Storage location where the model artifact is stored.
-If you are downloading the model from the Hugging Face Hub, follow the download instructions [here](https://huggingface.co/docs/huggingface_hub/main/en/guides/download).
+## Register Model from Hugging Face

+To register a model from Hugging Face, select "Download from Hugging Face" in the dropdown under the model artifact section. Then you may select a verified model from the "Select Model" dropdown, or choose "register any model" and enter the model name yourself.

+Note that for gated models, you must first authenticate to Hugging Face by running the `huggingface-cli login` command in a terminal. See the [Hugging Face CLI documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli) for details.
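For example (a sketch; the model ID is illustrative and the login prompt is interactive):

```bash
# Authenticate once; the access token is cached locally for later CLI and SDK calls.
huggingface-cli login

# Optionally confirm that the gated repository is now accessible by downloading it.
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ./Llama-3.1-8B-Instruct
```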

+![Register model from Hugging Face](web_assets/register-model.png)

+## Upload Model Artifact to Object Storage Manually

+AI Quick Actions also supports user-provided models that can be deployed, fine-tuned, and evaluated.

+While registering the model in AI Quick Actions, you need to specify the Object Storage location where the model artifact is stored.
+You may first prepare your model files locally and then upload them to Object Storage.
+For example, you can download the model from the Hugging Face Hub using the download instructions [here](https://huggingface.co/docs/huggingface_hub/main/en/guides/download).

Once downloaded, use [oci-cli](https://github.com/oracle/oci-cli) to upload these artifacts to the correct object storage location.
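A minimal sketch of the upload step (the bucket name, namespace, and prefix are placeholders to substitute):

```bash
# Bulk-upload the local model folder to a versioned Object Storage bucket.
oci os object bulk-upload \
  --namespace <your-tenancy-namespace> \
  --bucket-name <your-bucket> \
  --src-dir ./Llama-3.1-8B-Instruct \
  --prefix Llama-3.1-8B-Instruct/
```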
The Object Storage bucket needs to be versioned; run the following command to check whether versioning is enabled. If the output of the command is "Disabled", then you need
Binary file added ai-quick-actions/web_assets/deploy-model-advanced-options.png
Binary file modified ai-quick-actions/web_assets/deploy-model.png
Binary file added ai-quick-actions/web_assets/register-button.png
Binary file added ai-quick-actions/web_assets/register-model.png
