The helm chart provided here can be used to install OpenVINO Model Server in a Kubernetes cluster. It exposes the same parameters as the operator, so it can be used directly as an alternative deployment method. The helm chart manages the Model Server instance, which consists of a Kubernetes deployment and a Kubernetes service with exposed REST and gRPC inference endpoints. This guide assumes you already have a functional Kubernetes cluster and helm installed (see below for instructions on installing helm). The steps below describe how to set up a model repository, use helm to launch the inference server, and then send inference requests to the running server.
Please refer to the Helm installation guide.
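For convenience, a common way to install Helm 3 on Linux is the official installer script shown below; this is just one option, and the Helm installation guide covers others:
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
helm version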
Deploy Model Server using helm. Please include the required model name and model path. You can also adjust other parameters defined in values.yaml.
helm install ovms-app ovms --set models_settings.model_name=<model_name>,models_settings.model_path=gs://<bucket>/<model>
Use kubectl to see the status and wait until the Model Server pod is running:
kubectl get pods
NAME READY STATUS RESTARTS AGE
ovms-app-5fd8d6b845-w87jl 1/1 Running 0 27s
By default, Model Server is deployed with 1 instance. If you would like to run additional replicas, override the value in the values.yaml file or pass the --set flag to helm install:
helm install ovms-app ovms --set models_settings.model_name=<model_name>,models_settings.model_path=gs://<bucket>/<model>,deployment_parameters.replicas=3
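The replica count of an already running release can also be changed later with helm upgrade. A minimal sketch, assuming the release and chart names from the commands above:
helm upgrade ovms-app ovms --reuse-values --set deployment_parameters.replicas=5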
To serve multiple models, you can run Model Server with a configuration file as described in the model server documentation. Follow that documentation to create a configuration file named config.json and fill it with the proper information (a minimal example is also shown after the steps below). To deploy with the config file stored in a Kubernetes ConfigMap:
- create a ConfigMap resource from this file with a chosen name (here ovms-config):
kubectl create configmap ovms-config --from-file config.json
- deploy Model Server with the parameters models_settings.single_model_mode and models_settings.config_configmap_name (without model_name and model_path):
helm install ovms-app ovms --set models_settings.config_configmap_name=ovms-config,models_settings.single_model_mode=false
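For reference, a minimal config.json serving two models could look like the sketch below; the model names and bucket paths are placeholders, and the full schema is described in the model server documentation:
cat > config.json <<EOL
{
  "model_config_list": [
    {"config": {"name": "<model_name_1>", "base_path": "gs://<bucket>/<model_1>"}},
    {"config": {"name": "<model_name_2>", "base_path": "gs://<bucket>/<model_2>"}}
  ]
}
EOL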
Bucket permissions can be set with the GOOGLE_APPLICATION_CREDENTIALS environment variable. Please follow the steps below:
- Generate a Google service account JSON file with the permissions: Storage Legacy Bucket Reader, Storage Legacy Object Reader, Storage Object Viewer. Name the file, for example, gcp-creds.json (you can follow these instructions to create a Service Account and download the JSON)
- Create a Kubernetes secret from this JSON file:
kubectl create secret generic gcpcreds --from-file gcp-creds.json
- When deploying Model Server, provide the model path to the GCS bucket and the name of the secret created above. Make sure to provide gcp_creds_secret_name when deploying:
helm install ovms-app ovms --set models_settings.model_name=<model_name>,models_settings.model_path=gs://<bucket>/<model>,models_repository.gcp_creds_secret_name=gcpcreds
For S3, you must provide an AWS Access Key ID, an AWS Secret Access Key, and the AWS region when deploying: aws_access_key_id, aws_secret_access_key and aws_region (see below).
helm install ovms-app ovms --set models_settings.model_name=<model_name>,models_settings.model_path=s3://<bucket>/<model>,models_repository.aws_access_key_id=<...>,models_repository.aws_secret_access_key=<...>,models_repository.aws_region=<...>
If you would like to use a custom S3 service with a compatible API (e.g. MinIO), you also need to provide the endpoint of that service by supplying s3_compat_api_endpoint:
helm install ovms-app ovms --set models_settings.model_name=icnet-camvid-ava-0001,models_settings.model_path=s3://<bucket>/<model>,models_repository.aws_access_key_id=<...>,models_repository.aws_secret_access_key=<...>,models_repository.s3_compat_api_endpoint=<...>
Use OVMS with models stored on Azure Blob Storage by providing the azure_storage_connection_string parameter. The model path should follow the az scheme as shown below:
helm install ovms-app ovms --set models_settings.model_name=resnet,models_settings.model_path=az://<container>/<model_path>,models_repository.azure_storage_connection_string="DefaultEndpointsProtocol=https;AccountName=azure_account_name;AccountKey=smp/hashkey==;EndpointSuffix=core.windows.net"
Besides cloud storage, models can be stored locally on the Kubernetes nodes' filesystem. Use the parameter models_repository.models_host_path with the local path on the nodes. It will be mounted in the OVMS container as the /models folder. Since the models folder is mounted as /models in the OVMS container, the parameter models_settings.model_path should refer to a path starting with /models/... and point to the folder with the model versions.
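As an illustration, assuming the models were copied to /opt/models on every node, a deployment using the node filesystem could look like this (the path and model name are examples only):
helm install ovms-app ovms --set models_settings.model_name=resnet,models_settings.model_path=/models/resnet,models_repository.models_host_path=/opt/models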
Note that the OVMS container starts, by default, with the security context of the ovms account, with uid 5000 and gid 5000. If the mounted models have restricted access permissions, change the security context of the OVMS service or adjust the permissions on the models. OVMS requires read permission on the model files and list permission on the model version folders.
It is also possible to deploy OVMS using Kubernetes persistent volumes, which makes it possible to store the models on any filesystem supported by Kubernetes. In the helm chart, set the parameter models_repository.models_volume_claim to the name of the PersistentVolumeClaim holding the models. When set, it will be mounted as the /models folder inside the OVMS container. Note that models_repository.models_volume_claim is mutually exclusive with models_repository.models_host_path; only one of them should be set.
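As an illustration, assuming a PersistentVolumeClaim named models-pvc already exists and holds the models, the deployment could look like this (the claim and model names are examples only):
helm install ovms-app ovms --set models_settings.model_name=resnet,models_settings.model_path=/models/resnet,models_repository.models_volume_claim=models-pvc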
By default, there are no restrictions, but you can restrict the cluster resources assigned to the OVMS container by setting the following parameters:
- deployment_parameters.resources.limits.cpu - maximum CPU allocation
- deployment_parameters.resources.limits.memory - maximum memory allocation
- deployment_parameters.resources.limits.xpu_device - accelerator name as configured in the device plugin
- deployment_parameters.resources.limits.xpu_device_quantity - number of accelerators
- deployment_parameters.resources.requests.cpu - reserved CPU allocation
- deployment_parameters.resources.requests.memory - reserved memory allocation
- deployment_parameters.resources.requests.xpu_device - accelerator name as configured in the device plugin; should be the same as set in limits or empty
- deployment_parameters.resources.requests.xpu_device_quantity - number of accelerators; should be the same as set in limits or empty
Below is an example snippet from the helm chart values.yaml file:
deployment_parameters:
  resources:
    limits:
      cpu: 8.0
      memory: 512Mi
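The same limits can also be passed on the command line instead of editing values.yaml, for example:
helm install ovms-app ovms --set models_settings.model_name=<model_name>,models_settings.model_path=gs://<bucket>/<model>,deployment_parameters.resources.limits.cpu=8.0,deployment_parameters.resources.limits.memory=512Mi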
Besides setting the CPU and memory resources, the same parameters can be used to assign AI accelerators like an iGPU or a VPU. This assumes the adequate Kubernetes device plugin from Intel Device Plugin for Kubernetes is deployed.
deployment_parameters:
  resources:
    limits:
      xpu_device: gpu.intel.com/i915
      xpu_device_quantity: "1"
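Before requesting an accelerator, you can verify that the device plugin actually advertises the resource on the nodes, for example for the GPU resource used above:
kubectl describe node <node_name> | grep "gpu.intel.com/i915"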
OVMS, by default, starts with the security context of the ovms account, which has uid 5000 and gid 5000. In some cases, this can prevent importing models stored on a filesystem with restricted access and might require adjusting the security context of the OVMS service. It can be changed using the parameters models_repository.runAsUser and models_repository.runAsGroup.
An example of the values is presented below:
models_repository:
  runAsUser: 5000
  runAsGroup: 5000
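The same values can also be set directly during installation; the uid/gid 1000 below is only an illustrative value that should match the permissions of the mounted models:
helm install ovms-app ovms --set models_settings.model_name=<model_name>,models_settings.model_path=gs://<bucket>/<model>,models_repository.runAsUser=1000,models_repository.runAsGroup=1000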
The security configuration can also be adjusted further with all the options specified in the Kubernetes documentation.
The helm chart creates a Kubernetes service as part of the OVMS deployment. Depending on the cluster infrastructure, you can adjust the service type. In a cloud environment you might set the LoadBalancer type to expose the service externally. NodePort could expose a static port on the node IP address. ClusterIP would keep the OVMS service internal to the cluster applications.
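If you need to change the type of an already deployed service without redeploying the chart, a kubectl patch is one option; for example, switching the ovms-app service used in this guide to NodePort:
kubectl patch service ovms-app -p '{"spec": {"type": "NodePort"}}'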
This demonstration assumes a Kubernetes or OpenShift cluster is available, with the security context configured in the KUBECONFIG. The Helm 3 binary and kubectl 1.23 should also be installed to run the commands. An example model server instance with a public ResNet model can be deployed with the commands:
git clone https://github.com/openvinotoolkit/operator
cd operator/helm-charts
helm install ovms-app ovms --set models_settings.model_name=resnet,models_settings.model_path=gs://<bucket_name>/<model_dir>
Now that the server is running, you can send HTTP or gRPC requests to perform inference.
By default, the service is exposed with the ClusterIP service type.
kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ovms-app ClusterIP 10.98.164.11 <none> 8080/TCP,8081/TCP 5m30s
The server exposes a gRPC endpoint on port 8080 and a REST endpoint on port 8081.
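For a quick test from your local machine, you can also forward the REST port of the ClusterIP service and query it with curl:
kubectl port-forward service/ovms-app 8081:8081
curl http://localhost:8081/v1/config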
The name of the service deployed via the helm chart is defined by the application name. In addition, the service gets a -ovms suffix in case the application name doesn't include the ovms phrase. This avoids the risk of service name conflicts with other applications.
Below is an example of how the model service can be used inside the cluster.
Start an interactive session in a container with Python installed:
kubectl create deployment client-test --image=python:3.8.13 -- sleep infinity
kubectl exec -it $(kubectl get pod -o jsonpath="{.items[0].metadata.name}" -l app=client-test) -- bash
The REST API response can be verified inside the client container with a simple curl command listing the served models:
curl http://ovms-app:8081/v1/config
{
  "resnet" :
  {
    "model_version_status": [
      {
        "version": "1",
        "state": "AVAILABLE",
        "status": {
          "error_code": "OK",
          "error_message": "OK"
        }
      }
    ]
  }
}
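The same REST interface can be used to query a single model's metadata, for example for the resnet model deployed above:
curl http://ovms-app:8081/v1/models/resnet/metadata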
You can also test a prediction via the gRPC interface. Inside the container, run the following commands to install the client package and download an image to classify:
pip install ovmsclient
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/bee.jpeg
Create a Python script with basic client content:
cat >> predict.py <<EOL
from ovmsclient import make_grpc_client
import numpy as np
client = make_grpc_client("ovms-app:8080")
with open("bee.jpeg", "rb") as f:
    data = f.read()
inputs = {"map/TensorArrayStack/TensorArrayGatherV3:0": data}
results = client.predict(inputs=inputs, model_name="resnet")
print("Detected class:", np.argmax(results))
EOL
Run the prediction via a command:
python predict.py
Detected class: 310
Class 310 represents a bee in the ImageNet dataset.
This demonstration deploys the model server serving a directed acyclic graph with vehicle analysis in Kubernetes. Requirements:
- Kubernetes or OpenShift cluster with configured security context in the KUBECONFIG
- helm 3
- kubectl 1.23
- mc binary and access to S3 compatible bucket - quick start with Minio
If you don't have MinIO in place, you can move forward with a simple, standalone setup. Run:
kubectl apply -f https://raw.githubusercontent.com/openvinotoolkit/operator/fix_helm/helm-charts/ovms/minio-standalone.yaml
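The mc commands below use an alias named mys3. If it is not configured yet, it can be created like the sketch below; adjust the endpoint and credentials to your environment and make sure the endpoint is reachable from where you run mc (for example via kubectl port-forward when MinIO runs inside the cluster):
mc alias set mys3 http://<minio_endpoint>:9000 minioadmin minioadmin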
Prepare all dependencies for the vehicle analysis pipeline:
git clone https://github.com/openvinotoolkit/model_server
cd model_server/demos/vehicle_analysis_pipeline/python
make
The command above downloads the models, builds the custom node library for the pipeline, and places them in the workspace folder. Copy the models to shared storage accessible in the cluster. Here the S3 server alias is mys3:
mc mb mys3/models-repository
mc cp --recursive workspace/vehicle-detection-0202 mys3/models-repository/
mc cp --recursive workspace/vehicle-attributes-recognition-barrier-0042 mys3/models-repository/
mc ls -r mys3
43MiB models-repository/vehicle-attributes-recognition-barrier-0042/1/vehicle-attributes-recognition-barrier-0042.bin
118KiB models-repository/vehicle-attributes-recognition-barrier-0042/1/vehicle-attributes-recognition-barrier-0042.xml
7.1MiB models-repository/vehicle-detection-0202/1/vehicle-detection-0202.bin
331KiB models-repository/vehicle-detection-0202/1/vehicle-detection-0202.xml
In the initially created model server config file workspace/config.json, several adjustments are needed to change the model and custom node library base paths. The commands below set the model paths to the S3 bucket.
sed -i 's/\/workspace\/vehicle-detection-0202/s3:\/\/models-repository\/vehicle-detection-0202/g' workspace/config.json
sed -i 's/\/workspace\/vehicle-attributes-recognition-barrier-0042/s3:\/\/models-repository\/vehicle-attributes-recognition-barrier-0042/g' workspace/config.json
Next, add the config file to a config map:
kubectl create configmap ovms-pipeline --from-file=config.json=workspace/config.json
From the helm chart folder in the operator repository, deploy the model server. Change the credentials and S3 endpoint as needed for your environment:
git clone https://github.com/openvinotoolkit/operator
cd operator/helm-charts
export AWS_ACCESS_KEY_ID=minioadmin
export AWS_SECRET_ACCESS_KEY=minioadmin
export AWS_REGION=us-east-1
export S3_COMPAT_API_ENDPOINT=http://minio-service:9000
helm install ovms-pipeline ovms --set models_settings.config_configmap_name=ovms-pipeline,models_settings.single_model_mode=false,models_repository.aws_access_key_id=$AWS_ACCESS_KEY_ID,models_repository.aws_secret_access_key=$AWS_SECRET_ACCESS_KEY,models_repository.aws_region=us-east-1,models_repository.s3_compat_api_endpoint=$S3_COMPAT_API_ENDPOINT
$ kubectl get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
ovms-pipeline ClusterIP 10.99.53.175 <none> 8080/TCP,8081/TCP 26m
Now we are ready to test the pipeline from the client container. Start an interactive session in a container with Python installed:
kubectl create deployment client-test --image=python:3.8.13 -- sleep infinity
kubectl exec -it $(kubectl get pod -o jsonpath="{.items[0].metadata.name}" -l app=client-test) -- bash
Inside the container, run the following commands to install the client package and download an image to classify:
pip install ovmsclient
wget https://raw.githubusercontent.com/openvinotoolkit/model_server/main/demos/common/static/images/cars/road1.jpg
Create a Python script with basic client content:
cat >> pipeline.py <<EOL
from ovmsclient import make_grpc_client
import numpy as np
client = make_grpc_client("ovms-pipeline:8080")
with open("road1.jpg", "rb") as f:
    data = f.read()
inputs = {"image": data}
results = client.predict(inputs=inputs, model_name="multiple_vehicle_recognition")
print("Returned outputs:",results.keys())
EOL
Run the prediction via a command:
$ python pipeline.py
Returned outputs: dict_keys(['colors', 'vehicle_coordinates', 'types', 'vehicle_images', 'confidence_levels'])
Once you've finished using the server, use helm to uninstall the chart:
$ helm ls
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
ovms-app default 1 2022-04-11 13:39:44.11018803 +0200 CEST deployed ovms-4.0.0
ovms-pipeline default 1 2022-04-11 15:12:28.279846055 +0200 CEST deployed ovms-4.0.0
$ helm uninstall ovms-app
release "ovms-app" uninstalled
$ helm uninstall ovms-pipeline
release "ovms-pipeline" uninstalled
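You may also want to remove the auxiliary resources created in these examples; delete only the ones you actually created:
kubectl delete deployment client-test
kubectl delete configmap ovms-config ovms-pipeline
kubectl delete secret gcpcreds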
Check also: