In this tutorial, we will build on our previous deployment tutorial, augmenting our cluster with ingress (the technical term for internet traffic entering our system) and adding the ability to conduct canary deployments.
When we deploy ML models, we are often unsure of how our model will perform in the real world. Methods like A/B testing and canary deployments allow us to evaluate that performance and avoid erroneous deployments where the model is not performing as expected. This is done by splitting traffic between two ML services, typically your baseline model and a new model. As you become confident in the newly deployed service, you can increase the traffic to the new model and gradually reduce the traffic to the baseline model. This ensures a smooth deployment to your users with no downtime or interruptions.
We are going to use two core technologies to implement canary deployments: Istio and KServe. Briefly, Istio is a service mesh that will supply an ingress controller, while KServe is a service that will create deployments on our K8s cluster serving our models through Istio. It is worth noting that you could use Ambassador as your ingress controller instead of Istio, but we found it buggy, so use it at your discretion!
Prerequisites:
- Previous Deployment Tut
- Install BentoML
- Install Minikube
By the end of this tutorial you will be able to:
- Set up Istio for ingress
- Set up KServe
- Run a Canary Deployment with KServe
This tutorial assumes you have done the previous tutorial on BentoML and Kubernetes. We will again be using minikube. If you wish to use a managed cloud cluster, go for it, though you will need to pay for additional resources for Istio (it requires at least 4GB of RAM)! You should have it already if you have done previous tuts, but you can download the data required for this tutorial from here. This is originally from a Kaggle dataset for Fraud Detection. Place this dataset in a data directory in the root of your project.
Firstly, we need to set up Istio for ingress. Ingress is an API object which provides routing rules to manage both internal and external access to services in a Kubernetes cluster, acting as a single access point. We will utilize Istio as our ingress object to provide this functionality. In turn, KServe will use Istio to ensure that our traffic is routed to the appropriate pods through the same endpoint. This process is pretty simple. First, we need to start our minikube cluster. minikube will need extra resources for Istio, at least 4GB of RAM and 2 CPUs, so you may need to delete your old cluster first with minikube delete. Alternatively, you could try altering the resources assigned to your current cluster. Here, we start our cluster with 8GB of RAM and 4 CPUs.
minikube start --memory 8192 --cpus 4
You may need to go through platform-specific setup if you don't want to use minikube, so consult the relevant guide here. We provide two install methods, though you should stick with the first if you don't use Helm. Note that should you choose to use something other than minikube, you will need to set up a load balancer in your cluster so that you can access the gateway that Istio creates. minikube provides minikube tunnel, which allows access to the gateway on 127.0.0.1.
The recommended way to set up Istio is to use the istioctl command line tool. We will use the following command to install it on macOS, but use the relevant install method for your platform. You can read the docs here.
brew install istioctl
Installation should now be as simple as running the install command.
istioctl install -y
If you are accustomed to using Helm, you can use the following commands to install Istio instead. Note that this method is in alpha, so you may encounter issues.
Add the Istio repo and install the core charts.
# Add Istio Repo
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update
# Create Namespace
kubectl create namespace istio-system
# Install base chart
helm install istio-base istio/base -n istio-system
# Install discovery chart
helm install istiod istio/istiod -n istio-system --wait
Now, let's add the ingress controller.
# Install ingress chart
kubectl create namespace istio-ingress
kubectl label namespace istio-ingress istio-injection=enabled
helm install istio-ingress istio/gateway -n istio-ingress --wait
Finally, add a gateway!
kubectl apply -f - << END
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: kserve-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
END
KServe is a service that will create ML deployments on our K8s cluster and provide the ability to conduct canary deployments. Like BentoML, KServe is model and framework agnostic. While we use Bento service containers here, with KServe you can use raw model files and simply point to the URI where the model is located if that suits your use case better. Alternatively, there are a couple of other services we can use to manage our ML deployments. Check them out if you are interested.
- Cortex - Cortex is a tool that allows you to manage your ML deployments easily via CLI. It is AWS only, but it is a great choice if you are using the AWS EKS stack.
- Seldon Core - Very similar to KServe, Seldon Core is a Kubernetes-only service that allows you to manage your ML deployments. Seldon Core is slightly more feature rich, but is a more complex tool to set up and use. However, compatibility with BentoML 1.0 in its current state is limited, as the framework is built as the backbone for their proprietary platform, Seldon Deploy.
cert-manager is a service that allows us to easily manage TLS certificates. It is a required dependency for KServe. Install it before proceeding.
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml
We are using the serverless capability of KServe, which will allow our KServe services to both autoscale and run canaries, so we need to install the Knative Serving runtime as a prerequisite. Install the CRDs and the core components.
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.6.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.6.0/serving-core.yaml
There is a controller that Knative uses to integrate with Istio. Install it next.
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.6.0/net-istio.yaml
Finally, install the component that integrates with cert-manager.
kubectl apply -f https://github.com/knative/net-certmanager/releases/download/knative-v1.6.0/release.yaml
Using kubectl, let's install the core KServe manifest.
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.9.0/kserve.yaml
There are also default serving runtimes which are necessary for KServe to function.
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.9.0/kserve-runtimes.yaml
For a sanity check, let's see if all the services are running.
kubectl get pods -A
OUTPUT:
NAMESPACE NAME READY STATUS RESTARTS AGE
cert-manager cert-manager-5dd59d9d9b-sjnpx 1/1 Running 0 11m
cert-manager cert-manager-cainjector-8696fc9f89-q2wmq 1/1 Running 0 11m
cert-manager cert-manager-webhook-7d4b5b8c56-vq765 1/1 Running 0 11m
istio-system istio-ingressgateway-6dbb44ff8d-r7nqk 1/1 Running 0 14m
istio-system istiod-6b5bb85ffb-lxxsk 1/1 Running 0 14m
knative-serving activator-c7d578d94-sj6pz 1/1 Running 0 11m
knative-serving autoscaler-6488988457-qnjq6 1/1 Running 0 11m
knative-serving controller-6cff4c9d57-vkrg2 1/1 Running 0 11m
knative-serving domain-mapping-7598c5f659-d7pj7 1/1 Running 0 11m
knative-serving domainmapping-webhook-8c4c9fdc4-6hz24 1/1 Running 0 11m
knative-serving net-certmanager-controller-56f7f6c7c8-rwjvm 1/1 Running 0 11m
knative-serving net-certmanager-webhook-845954c5bc-kwntl 1/1 Running 0 11m
knative-serving net-istio-controller-7d44658469-qs2b6 1/1 Running 0 11m
knative-serving net-istio-webhook-f5ccb98cf-wwbn7 1/1 Running 0 11m
knative-serving webhook-df8844f6-7g8v5 1/1 Running 0 11m
kserve kserve-controller-manager-866557d974-djjkm 2/2 Running 0 11m
kube-system coredns-6d4b75cb6d-kjwfr 1/1 Running 0 15m
kube-system etcd-minikube 1/1 Running 0 15m
kube-system kube-apiserver-minikube 1/1 Running 0 15m
kube-system kube-controller-manager-minikube 1/1 Running 0 15m
kube-system kube-proxy-w2hld 1/1 Running 0 15m
kube-system kube-scheduler-minikube 1/1 Running 0 15m
kube-system storage-provisioner 1/1 Running 0 15m
First, install the libraries needed via the requirements.txt file. This is extremely important, as you could otherwise run into issues later in the tutorial. Run the command below:
pip install -r requirements.txt
We're gonna need some images to deploy. We have supplied a training file, train.py, that will train 2 models and save them to your BentoML store.
python train.py
There is also a bentofile.yaml which we can use to build the two Bentos into a service. You will need to run bentoml build twice, modifying the model_type variable in fraud_detection_service.py between builds so that each Bento uses the correct model. Note the tag of each Bento once it has been built.
# ... in fraud_detection_service.py
model_type = "xgb"
# model_type = "rf"  # Uncomment this for the second build

# Then, from your shell:
bentoml build
# Change model_type to "rf", then build again
bentoml build
Let's containerize them now using the correct Bento tags, and tag the images with their respective model names. Typically, before you containerize for minikube, you should point Docker to minikube's Docker daemon. However, we will be using Dockerhub to host our images for this tutorial.
bentoml containerize fraud_classifier:<xgb-tag> -t fraud-classifier:xgb
bentoml containerize fraud_classifier:<rf-tag> -t fraud-classifier:rf
Currently, KServe is experiencing a bug with loading local images. For now, we will solely use Dockerhub as an alternative, so create a free account. To give you an idea of how to set it up, we will be using a private repo. If you are already using your one free private repo, feel free to use a public one instead.
Once you've created your account, navigate to the Create Repository page. Name the repository fraud-classifier and select the Private option (you can also select Public if you want).
We should create a separate namespace for our deployments. We've named it kserve-deployments, but name it whatever you like.
kubectl create namespace kserve-deployments
Next, tag the images you created with your private repo and push them to the registry. You may need to log in first with docker login.
docker tag fraud-classifier:xgb <dockerhub-username>/fraud-classifier:xgb
docker push <dockerhub-username>/fraud-classifier:xgb
docker tag fraud-classifier:rf <dockerhub-username>/fraud-classifier:rf
docker push <dockerhub-username>/fraud-classifier:rf
If you've selected Public, you can skip to the next section. Otherwise, you will need to create an imagePullSecrets object on your cluster. This is a secret that contains the parameters necessary to authenticate with Dockerhub. Name this secret dockerhub-registry and put it in the newly created namespace.
kubectl create secret docker-registry dockerhub-registry \
--docker-server=https://index.docker.io/v1/ \
--docker-email=<dockerhub-email> \
--docker-username=<dockerhub-user> \
--docker-password=<dockerhub-password> \
-n kserve-deployments
Check that your secret was created correctly.
kubectl get secret dockerhub-registry --output="jsonpath={.data.\.dockerconfigjson}" -n kserve-deployments | base64 --decode
There are now two options for deploying the images in our private repo. The first is to add the imagePullSecrets object to the deployment under its spec. The second is to patch a service account so that the imagePullSecrets object is always added to any manifest deployed in the kserve-deployments namespace. We are going to do the latter and patch the default service account.
kubectl patch serviceaccount default -p "{\"imagePullSecrets\": [{\"name\": \"dockerhub-registry\"}]}" -n kserve-deployments
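As an aside, the first option would look roughly like this in the InferenceService manifest we write in the next section. This is a sketch only: we are assuming imagePullSecrets is accepted at the predictor level of the spec, so verify against the KServe docs before relying on it.
# Sketch only: reference the pull secret directly in the spec
# instead of patching the service account.
spec:
  predictor:
    imagePullSecrets:
      - name: dockerhub-registry
    containers:
      - image: <dockerhub-username>/fraud-classifier:xgb
        name: xgb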
Before we deploy a KServe manifest, let's go through the benefits of using KServe. Apart from allowing us to conduct canary deployments, KServe gives us a number of tools to manage our deployments. Whether you need such functionality will largely depend on how mature your organization is and how much scalability you require from your machine learning services. We do recommend running canaries at all levels of scale apart from your initial rollout, which is our main motivation behind this tut, but here are some of the other things KServe offers:
- ModelMesh - In cases where you frequently need to change which model to use for a given situation, ModelMesh is a great tool. The system switches between models automatically without redeploying, ensuring you use the best model for the currently available compute and maximizing responsiveness to users.
- Pre/Post Processing Inference Graph - KServe allows you to specify an Inference Graph to build inference pipelines. Within the graph, you can define pre and post processing steps, traffic splits, model ensembles and model switching based on defined conditions. You can read more about how this works here.
- Model Monitoring & Explainability - KServe has built-in integration with both Alibi Detect and Alibi Explain. This makes outlier and drift detection easy and provides black-box model explainability. We will cover these specific services in a future version of this tutorial.
To initially deploy, we're going to create a special kind of Kubernetes deployment called an InferenceService. This resource allows us to add a predictor field to our manifest. This field is a spec used to specify the ML model to use for a given request and how much traffic should be routed to the service.
Create a file called deployment_xgb.yaml and add the following to it:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  labels:
    app: kserve
  name: fraud-classifier
  namespace: kserve-deployments
...
These are the usual fields that you fill out for a Kubernetes deployment. Note that the kind of the resource is InferenceService.
Now let's add the spec block. This is where we specify the ML models to use for a given request, which is done under the predictor field. Within this spec, we specify the Docker image to use under containers. KServe also allows you to serve specific model files. For example, instead of containers we could specify a sklearn field to serve a pickled sklearn model instead (you can see an example here).
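As a rough illustration only (the name and storageUri below are hypothetical placeholders, and you should check the KServe docs for the exact schema), such a model-file predictor could look something like this:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-example # hypothetical, not part of this tutorial
spec:
  predictor:
    sklearn:
      storageUri: gs://your-bucket/models/sklearn/model # placeholder URI to a pickled sklearn model
Our deployment, however, uses the container image we pushed earlier: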
... # in deployment_xgb.yaml
spec:
  predictor:
    containers:
      - image: <dockerhub-username>/fraud-classifier:xgb
        imagePullPolicy: IfNotPresent
        name: xgb
        env:
          - name: VERSION
            value: "XGBoost"
        ports:
          - containerPort: 3000
        securityContext:
          runAsUser: 1034
We can now deploy our manifest with kubectl.
kubectl apply -f deployment_xgb.yaml -n kserve-deployments
This should take about a minute to fully deploy. You can check the status of the InferenceService by running:
kubectl get isvc fraud-classifier -n kserve-deployments
If the status is still Unknown, you can check whether the Knative revision was created correctly by running:
kubectl get revision $(kubectl get configuration fraud-classifier-predictor-default --output jsonpath="{.status.latestCreatedRevisionName}" -n kserve-deployments) -n kserve-deployments
Great, we have created our first deployment with KServe! We'll need to make a request to ensure it's working correctly. Since the command below creates an active process which remains open, run it in a separate terminal to assign an external IP to the Istio gateway.
minikube tunnel
Check that the istio-ingressgateway service has been assigned an external IP address of 127.0.0.1.
kubectl get svc -n istio-system
You can make a request similarly to the previous tut with a POST to the /fraud-classifier endpoint, except now we make it to the Istio gateway on 127.0.0.1! Let's do a simple request with curl. Note the included header for the service hostname; this is necessary for KServe to route the request to the InferenceService.
export SERVICE_HOSTNAME=$(kubectl get inferenceservice fraud-classifier -n kserve-deployments -o jsonpath='{.status.url}' | cut -d "/" -f 3)
curl -v http://127.0.0.1/fraud-classifier -H "Host: $SERVICE_HOSTNAME" -d '[{"isFraud":0, "TransactionAmt":495.0,"ProductCD":"W","card4":"visa","P_emaildomain":"live.com","R_emaildomain":null,"M1":"T","M2":"T","M3":"T"}]'
If all went well, the response body should output [1].
Canary deployments are a way to progressively roll out your new models to a subset of users, monitoring the performance of your newly deployed model versus the original baseline. They work by deploying two or more versions of your model, routing some traffic to the baseline and the rest to the other(s). As you become more confident in the new model, you can increase traffic to the new service and eventually switch to it entirely. This is a great way to ensure the new model is performing as expected, while also monitoring the service for any potential errors.
Let's add a canary deployment. Duplicate your deployment_xgb.yaml file and rename it deployment_canary.yaml. In the new file, change the name field to rf and change the image pulled to fraud-classifier:rf, as shown below. You'll notice we have specified a new field in our predictor, canaryTrafficPercent. This field specifies the percentage of traffic that should be routed to the new canary service.
... # in deployment_canary.yaml
spec:
  predictor:
    canaryTrafficPercent: 20
    containers:
      - image: <dockerhub-username>/fraud-classifier:rf
        imagePullPolicy: IfNotPresent
        name: rf
        env:
          - name: VERSION
            value: "RandomForest"
        ports:
          - containerPort: 3000
        securityContext:
          runAsUser: 1034
Adding the canary deployment is now just a matter of applying the new deployment file!
kubectl apply -f deployment_canary.yaml -n kserve-deployments
Let's check that the canary deployed correctly. First, we'll check the InferenceService status.
kubectl get isvc fraud-classifier -n kserve-deployments
You'll notice LATESTREADYREVISION has now updated. Furthermore, you'll also notice that there are now numbers under the PREV and LATEST fields. These indicate the percentage of traffic going to the previous and latest revision respectively. PREV should be set to 80% and LATEST should be set to 20%.
For a further check, grab all the pods in the kserve-deployments namespace. You'll notice that there are now two pods for fraud-classifier, one for each predictor revision.
kubectl get pods -n kserve-deployments
From the above command, make sure that the second container is in a Running state; otherwise, there may be a problem with the image. To demonstrate how the canary works, we have supplied a file, test_requests.py, that will make 100 requests to the deployed service. We have chosen an example such that the XGBoost model will output [1] and the Random Forest model will output [0]. Run the file.
python test_requests.py
OUTPUT:
78% to the XGB Model, 22% to the RF Model
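In case you don't have the supplied file handy, here is a rough sketch of what test_requests.py could look like (assumptions: the requests library is installed, SERVICE_HOSTNAME is exported as above, and the two models return [1] and [0] respectively for this example; the actual supplied file may differ):
import os

import requests

# Same example row used for the curl request above
PAYLOAD = [{
    "isFraud": 0, "TransactionAmt": 495.0, "ProductCD": "W", "card4": "visa",
    "P_emaildomain": "live.com", "R_emaildomain": None,
    "M1": "T", "M2": "T", "M3": "T",
}]

# The Host header is what routes the request to the InferenceService via the Istio gateway
HEADERS = {"Host": os.environ["SERVICE_HOSTNAME"]}
URL = "http://127.0.0.1/fraud-classifier"

xgb_count = rf_count = 0
for _ in range(100):
    response = requests.post(URL, json=PAYLOAD, headers=HEADERS)
    # XGBoost returns [1] for this example, Random Forest returns [0]
    if response.json() == [1]:
        xgb_count += 1
    else:
        rf_count += 1

print(f"{xgb_count}% to the XGB Model, {rf_count}% to the RF Model")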
You'll notice that roughly 80% of the requests were routed to the XGBoost model and 20% to the Random Forest model. Let's update our canary deployment to increase the traffic to the new model. Simply change the canaryTrafficPercent field to 50.
... # in deployment_canary.yaml
spec:
  predictor:
    canaryTrafficPercent: 50
...
Deploying is as easy as applying the manifest again. You can confirm the new splits by running the previous kubectl get commands.
kubectl apply -f deployment_canary.yaml -n kserve-deployments
Wait a minute for the canary to be deployed, then rerun test_requests.py. The split should now be roughly 50/50!
OUTPUT:
51% to the XGB Model, 49% to the RF Model
In production, if you wish to promote the new model fully, just remove canaryTrafficPercent from the predictor manifest and reapply once again. Let's do that now.
... # in deployment_canary.yaml
spec:
  predictor:
    containers:
      - image: <dockerhub-username>/fraud-classifier:rf
        imagePullPolicy: IfNotPresent
        name: rf
        env:
          - name: VERSION
            value: "RandomForest"
        ports:
          - containerPort: 3000
        securityContext:
          runAsUser: 1034
Apply the manifest again.
kubectl apply -f deployment_canary.yaml -n kserve-deployments
If you run the previous kubectl get isvc command now, you'll see that there is only a single revision, fraud-classifier-predictor-default-00002. Run the test_requests.py file one last time as a sanity check.
OUTPUT:
0% to the XGB Model, 100% to the RF Model
Congratulations! You're now able to use canary deployments to promote your new models!