Merge pull request #508 from oracle-samples/qq/aqua
Update AQUA model registration and deployment Tips
qiuosier authored Nov 1, 2024
2 parents 6e93b97 + a19f1b4 commit 5e42c82
Showing 6 changed files with 44 additions and 21 deletions.
43 changes: 27 additions & 16 deletions ai-quick-actions/model-deployment-tips.md
@@ -20,13 +20,15 @@
replacement for applications using OpenAI API. Model deployments are a managed resource in
the OCI Data Science service. For more details about Model Deployment and managing it through
the OCI console, please see the [OCI docs](https://docs.oracle.com/en-us/iaas/data-science/using/model-dep-about.htm).

-### Deploying an LLM
+## Deploying an LLM

After picking a model from the model explorer, if "Deploy Model" is enabled you can use this
form to quickly deploy the model:

![Deploy Model](web_assets/deploy-model.png)

+### Compute Shape

The compute shape selection is critical: the list of available shapes is filtered to those
suitable for the chosen model.

@@ -40,16 +42,13 @@ For a full list of shapes and their definitions see the [compute shape docs](htt
The relationship between model parameter count and GPU memory is roughly 2x the parameter count in GB (two bytes per parameter at 16-bit precision), so for example a model that has 7B parameters will need a minimum of 14 GB for inference. At runtime the
memory is used both for holding the weights and for the concurrent contexts of users' requests.
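As a back-of-the-envelope check, you can do this arithmetic yourself. A minimal sketch (weights only; KV cache and runtime overhead are extra):

```python
def min_weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Lower bound on GPU memory (GB) needed just to hold the model weights.

    bytes_per_param=2 assumes 16-bit (fp16/bf16) weights.
    """
    return params_billions * bytes_per_param

print(min_weight_memory_gb(7))   # 14.0 -> a 7B model needs at least ~14 GB
print(min_weight_memory_gb(70))  # 140.0 -> a 70B model spans multiple GPUs
```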

The "inference mode" allows you to choose between the default completion endpoint(`/v1/completions`) and the chat endpoint (`/v1/chat/completions`).
* The default completion endpoint is designed for text completion tasks. It’s suitable for generating text based on a given prompt.
* The chat endpoint is tailored for chatbot-like interactions. It allows for more dynamic and interactive conversations by using a list of messages with roles (system, user, assistant). This is ideal for applications requiring back-and-forth dialogue, maintaining context over multiple turns. It is recommended that you deploy chat models (e.g. `meta-llama/Llama-3.1-8B-Instruct`) using the chat endpoint.
+### Advanced Options

-Once deployed, the model will spin up and become available after some time, then you're able to try out the model
-from the deployments tab using the test model, or programmatically.
+You may click "Show Advanced Options" to configure the inference container and inference mode.

-![Try Model](web_assets/try-model.png)
+![Advanced Options](web_assets/deploy-model-advanced-options.png)

-### Advanced Deployment Options
+### Inference Container Configuration

The service allows the model deployment configuration to be overridden when creating a model deployment. Depending on
the type of inference container used for deployment, i.e. vLLM or TGI, the parameters vary and need to be passed with the format
@@ -58,12 +57,24 @@
For more details, please visit [vLLM](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server) or
[TGI](https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher) documentation to know more about the parameters accepted by the respective containers.
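As an illustration of the kind of flags involved, the sketch below uses real vLLM server options with hypothetical values; it shows a standalone vLLM launch for context, whereas in AI Quick Actions you supply only the parameter overrides when creating the deployment:

```bash
# Hypothetical overrides: cap the context window and shard across two GPUs.
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --max-model-len 4096 \
  --tensor-parallel-size 2
```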

-![Model Deployment Parameters](web_assets/model-deployment-params.png)
+### Inference Mode

The "inference mode" allows you to choose between the default completion endpoint(`/v1/completions`) and the chat endpoint (`/v1/chat/completions`).
* The default completion endpoint is designed for text completion tasks. It’s suitable for generating text based on a given prompt.
* The chat endpoint is tailored for chatbot-like interactions. It allows for more dynamic and interactive conversations by using a list of messages with roles (system, user, assistant). This is ideal for applications requiring back-and-forth dialogue, maintaining context over multiple turns. It is recommended that you deploy chat models (e.g. `meta-llama/Llama-3.1-8B-Instruct`) using the chat endpoint.
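To make the difference concrete, here is a minimal sketch of the two request bodies (the `model` name, prompt, and messages are illustrative; the endpoint paths are the ones above):

```python
# /v1/completions: a single free-form prompt string.
completion_body = {
    "model": "odsc-llm",  # illustrative served-model name
    "prompt": "Write a haiku about the ocean.",
    "max_tokens": 100,
}

# /v1/chat/completions: role-tagged messages that carry dialogue context across turns.
chat_body = {
    "model": "odsc-llm",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about the ocean."},
    ],
    "max_tokens": 100,
}
```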


+### Test Your Model

+Once deployed, the model will spin up and become available after some time. You can then try out the model
+from the deployments tab using the test-model feature, or programmatically.

+![Try Model](web_assets/try-model.png)


-### Inferencing Model
+## Inferencing Model

-#### Using oci-cli
+### Using oci-cli

```bash
oci raw-request --http-method POST --target-uri <model_deployment_url>/predict --request-body '{
@@ -78,7 +89,7 @@
Note: Currently `oci-cli` does not support streaming responses; use the Python or Java SDK instead.


-#### Using Python SDK (without streaming)
+### Using Python SDK (without streaming)

```python
# The OCI SDK must be installed for this example to function properly.
@@ -120,7 +131,7 @@
res = requests.post(endpoint, json=body, auth=auth, headers={}).json()
print(res)
```

-#### Using Python SDK (with streaming)
+### Using Python SDK (with streaming)

To consume streaming Server-sent Events (SSE), install [sseclient-py](https://pypi.org/project/sseclient-py/) using `pip install sseclient-py`.

@@ -176,7 +187,7 @@
for event in client.events():
# print(line)
```

-#### Using Java (with streaming)
+### Using Java (with streaming)

```java
/**
@@ -304,7 +315,7 @@ public class RestExample {

```

-### Advanced Configuration Update Options
+## Advanced Configuration Update Options

The available shapes for models in AI Quick Actions are pre-configured for both registration and
deployment of the models available in the Model Explorer. However, if you need to add more shapes to the list of
@@ -393,7 +404,7 @@
using [Advanced Deployment Options](#advanced-deployment-options) from AI Quick Actions.
```


-### Troubleshooting
+## Troubleshooting

If the model fails to deploy, possible reasons include lack of GPU availability or missing policy permissions.

22 changes: 17 additions & 5 deletions ai-quick-actions/register-tips.md
@@ -9,13 +9,25 @@
Table of Contents:
- [Model Evaluation](evaluation-tips.md)
- [Model Fine Tuning](fine-tuning-tips.md)

-## Upload model artifact to Object Storage
+The AI Quick Actions model explorer allows you to register a model from Hugging Face or Object Storage with a few clicks.

-AI Quick Actions supports user-provided models that can be deployed, fined-tuned and evaluated. You can now upload
-and test models with artifacts downloaded from model repositories like Hugging Face, etc. or from your own models.
+![Register new model](web_assets/register-button.png)

-While registering the model in AI Quick Actions, you need to specify the Object Storage location where the model artifact is stored.
-If you are downloading the model from the Hugging Face Hub, follow the download instructions [here](https://huggingface.co/docs/huggingface_hub/main/en/guides/download).
+## Register Model from Hugging Face

+To register a model from Hugging Face, select "Download from Hugging Face" in the dropdown under the model artifact section. Then you may select a verified model from the "Select Model" dropdown, or choose "register any model" and enter the model name yourself.

+Note that for gated models, you must first authenticate to Hugging Face by running the `huggingface-cli login` command in a terminal. See the [Hugging Face CLI documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli) for details.
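For example (a sketch; the model ID is illustrative and the login prompt is interactive):

```bash
# Authenticate once; the access token is cached locally for later CLI and SDK calls.
huggingface-cli login

# Optionally confirm that the gated repository is now accessible by downloading it.
huggingface-cli download meta-llama/Llama-3.1-8B-Instruct --local-dir ./Llama-3.1-8B-Instruct
```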

+![Register model from Hugging Face](web_assets/register-model.png)

+## Upload Model Artifact to Object Storage Manually

+AI Quick Actions also supports user-provided models that can be deployed, fine-tuned, and evaluated.

+While registering the model in AI Quick Actions, you need to specify the Object Storage location where the model artifact is stored.
+You may first prepare your model files locally and then upload them to Object Storage.
+For example, you can download the model from the Hugging Face Hub using the download instructions [here](https://huggingface.co/docs/huggingface_hub/main/en/guides/download).

Once downloaded, use [oci-cli](https://github.com/oracle/oci-cli) to upload these artifacts to the correct object storage location.
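A minimal sketch of the upload step (the bucket name, namespace, and prefix are placeholders to substitute):

```bash
# Bulk-upload the local model folder to a versioned Object Storage bucket.
oci os object bulk-upload \
  --namespace <your-tenancy-namespace> \
  --bucket-name <your-bucket> \
  --src-dir ./Llama-3.1-8B-Instruct \
  --prefix Llama-3.1-8B-Instruct/
```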
The Object Storage bucket needs to be versioned; run the following command to check whether versioning is enabled. If the output of the command is "Disabled", then you need
Binary file added ai-quick-actions/web_assets/deploy-model-advanced-options.png
Binary file modified ai-quick-actions/web_assets/deploy-model.png
Binary file added ai-quick-actions/web_assets/register-button.png
Binary file added ai-quick-actions/web_assets/register-model.png
