
Project 3 Part 2



Overview

Introduction

The features provided by a service mesh prove useful in maintaining and managing services deployed on the cloud through a CI/CD pipeline. Exploring these features by integrating a service mesh into a system helps us understand the system's needs better and improve its reliability and efficiency. Among the many available service mesh providers, Istio offers a rich set of features that are useful for a microservice-based cloud architecture. Although other service mesh providers may be easier to integrate, the features offered by Istio are more compelling.

Final Problem Statement

  • The aim of this project milestone is to understand the needs of our current weather forecasting system, explore the features offered by Istio by integrating them, determine their usefulness for our current system, and ultimately understand how they could be useful for Apache Airavata's MFT.

  • Our weather forecasting system showed various shortcomings and vulnerabilities, including:

    • No traffic routing management to ensure reliability and efficiency and to keep requests away from poorly behaving services.
    • No proper logging of requests and no other metrics available.
    • No proper security mechanisms; authorization and authentication of services and requests is needed.
    • A deployment strategy that introduces some downtime, which should not happen in a production environment.

    Thus, integrating Istio could be useful in our system, as it addresses the problems mentioned above, helps improve the cloud-deployed system, and allows us to understand the features offered and how to implement them.

Problem Statement Development

  • Research was done by teammates on airavata-custos, airavata-mft, and service meshes.

    • Airavata Custos Repository reference

      https://github.com/apache/airavata-custos/tree/develop/custos-core-services
      
    • Airavata MFT Repository reference

      https://github.com/apache/airavata-mft
      
    • Service Mesh references

      https://platform9.com/blog/kubernetes-service-mesh-a-comparison-of-istio-linkerd-and-consul/
      
      https://logz.io/blog/what-is-a-service-mesh-kubernetes-istio/
      
      https://istio.io/docs/concepts/what-is-istio/
      
  • After understanding Apache Airavata MFT's cloud architecture and doing extensive research on service meshes, we decided to integrate a service mesh into our current weather forecasting system. The goal was to improve the system, use it as a way to understand Istio's features and their implementation, and ultimately justify the use of a service mesh and show how it could benefit Apache Airavata MFT.

Methodology

  • Investigating the problems in our weather forecasting system included:
    • Testing our services using predefined JMeter tests. Several services failed for reasons that were not very clear. Integrating the services with sidecar proxies could help manage traffic routing better with the help of Istio's service mesh configuration.
    • Deploying services using the CI/CD pipeline. While automated deployment was implemented, it introduced downtime in our system, so a better deployment strategy was needed. Canary releases, i.e. progressive traffic shifting, are one such deployment strategy that allows deploying services without downtime.
    • Making requests from an unauthorized service in a different namespace of the Kubernetes cluster. This showed that not every request was authorized or authenticated and demonstrated the need for security. Applying mTLS encryption to requests can help solve this.
    • No tools or services were configured to visualize the services in the cluster and to understand the architecture and traffic routing. The observability offered by a service mesh and related tools addresses this problem; Kiali, Prometheus, and Grafana are some of the tools that can help.

Implementation

Phase 1 - Integrating the control plane and data plane and configuring sidecar proxies for running services.

  • Istio was downloaded (curl -L https://istio.io/downloadIstio | sh -) and configured in the istio-system namespace of the Kubernetes cluster. The data plane, control plane, and their required components were installed in the cluster using the istio-init.yaml file by running kubectl apply -f istio-init.yaml. This configures and initializes all the required Istio services in the istio-system namespace and sets up both the control plane and the data plane.
  • The service mesh was introduced and sidecar proxies were injected by enabling Istio's automatic injection in the default namespace (kubectl label namespace default istio-injection=enabled); a declarative equivalent is sketched below.
  • Istio configures the sidecar proxies in such a way that they collect metrics automatically.
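
    The namespace label applied above with kubectl label can also be expressed declaratively. The manifest below is only a minimal sketch of that equivalent, for illustration; it is not a file from our repository:

      # Sketch (illustration only): declaring the sidecar auto-injection label
      # on the default namespace, equivalent to
      #   kubectl label namespace default istio-injection=enabled
      apiVersion: v1
      kind: Namespace
      metadata:
        name: default
        labels:
          istio-injection: enabled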

Phase 2 - Implementing a canary deployment strategy in our current system

  • Before implementing canary deployment, blue-green deployment was implemented, where a shell script was used to detect whether the service was ready; as soon as the service was ready, the traffic was routed to the newly updated service. The shell script used was:

       #!/bin/bash
       # Blue-green deployment helper: apply the new deployment, wait until it
       # is ready, then switch the service selector to the new version.
       DEPLOYMENTNAME=$1-$2
       SERVICE=$1
       VERSION=$2
       DEPLOYMENTFILE=$3

       kubectl apply -f $DEPLOYMENTFILE

       # Wait until the Deployment is ready by checking the MinimumReplicasAvailable condition.
       READY=$(kubectl get deploy $DEPLOYMENTNAME -o json | jq '.status.conditions[] | select(.reason == "MinimumReplicasAvailable") | .status' | tr -d '"')
       while [[ "$READY" != "True" ]]; do
           READY=$(kubectl get deploy $DEPLOYMENTNAME -o json | jq '.status.conditions[] | select(.reason == "MinimumReplicasAvailable") | .status' | tr -d '"')
           sleep 5
       done

       # Update the service selector with the new version
       kubectl patch svc $SERVICE -p "{\"spec\":{\"selector\": {\"name\": \"${SERVICE}\", \"version\": \"${VERSION}\"}}}"

       echo "Done."
    
  • Compared to blue-green deployment, canary releases prove to be better, as they offer progressive traffic shifting instead of switching traffic all at once. This allows analysing the new release and deciding, based on its performance, whether to make it the stable version of the service.

  • Flagger is an open-source progressive delivery operator for Kubernetes. It was used to automate the process of gradually shifting traffic to a new deployment and to ultimately switch all traffic to that deployment if it is healthy. Flagger also allows us to run analysis checks before shifting traffic to the new release.

  • Links to the deployment files for the specific branches are given below:

  • Flagger was installed on the Kubernetes cluster in the istio-system namespace by running kubectl apply -k github.com/weaveworks/flagger//kustomize/istio

  • Every service's deployment has three parts: the main Deployment, where a pod with the container is created using the image from Docker Hub; a horizontal pod autoscaler, to handle auto-scaling of service replicas based on traffic, number of requests, and service load; and lastly, a new kind of resource introduced by Flagger, the Canary. The Canary resource for a particular service detects its corresponding horizontal pod autoscaler and deployment, creates a virtual service for that deployment, and sets up ingress gateway access if needed/specified.

  • For each deployment, [deploymentName]-primary is created and always holds the stable release of the service. [deploymentName]-canary is created once an update is detected for the current deployment of a running service; once the new release is ready and running without errors, traffic is gradually shifted from [deploymentName]-primary to [deploymentName]-canary. Weights are used to shift traffic gradually between the primary and canary versions. Once the canary's weight has been scaled up to the configured limit and no errors have been detected, the primary release is scaled down completely and the canary release is promoted to primary, so that primary always holds the stable release of the service. A sketch of such a Canary resource is shown below.
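
    The manifest below is only a minimal sketch of what such a Flagger Canary resource looks like when used with Istio; the service name, port, gateway name, and analysis values are illustrative assumptions, not the exact values from our deployment files:

      apiVersion: flagger.app/v1beta1
      kind: Canary
      metadata:
        name: frontend              # hypothetical service name
        namespace: default
      spec:
        # Deployment that Flagger manages as primary/canary
        targetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: frontend
        # Matching horizontal pod autoscaler
        autoscalerRef:
          apiVersion: autoscaling/v2beta2
          kind: HorizontalPodAutoscaler
          name: frontend
        # Virtual service generated by Flagger; gateway name is an assumption
        service:
          port: 80
          gateways:
          - istio-gateway
          hosts:
          - "*"
        # Progressive traffic shifting: 10% steps up to 50%, rolled back after
        # 5 failed checks of the built-in request-success-rate metric
        analysis:
          interval: 30s
          threshold: 5
          maxWeight: 50
          stepWeight: 10
          metrics:
          - name: request-success-rate
            thresholdRange:
              min: 99
            interval: 1m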

Phase 3 - Ingress/Egress gateways and security

  • The ingress gateway was set up using the istio-gateway.yaml file by running kubectl apply -f istio-gateway.yaml. The ingress gateway acts as the point of entry for accessing the application and thus avoids direct access to the services. The gateway access URL and port are stored in variables using:

      export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
    
      export SECURE_INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="https")].nodePort}')
    
      export INGRESS_HOST=149.165.170.101
    
      export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT
    
  • Now echo http://$GATEWAY_URL/ will give the IP and port at which to access the system interface.

  • Service entries were configured to set up egress and to provide a gateway for services accessing external services. The ServiceEntry was set up using the service-entry.yaml file by running kubectl apply -f service-entry.yaml.

  • mTLS peer authentication was configured to encrypt requests between the services and to introduce a security feature into our system. This was done by enabling mTLS encryption for the entire mesh; the peer-authentication.yaml file was used for this. Sketches of the gateway, service entry, and peer authentication manifests are shown below.
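
    For reference, the manifests below are minimal sketches of what istio-gateway.yaml, service-entry.yaml, and peer-authentication.yaml could contain for Istio 1.5. The port numbers, the gateway name, and the external host are illustrative assumptions; the actual files are in the repository:

      # Sketch of istio-gateway.yaml: expose HTTP through the default
      # istio-ingressgateway so that services are not accessed directly.
      apiVersion: networking.istio.io/v1alpha3
      kind: Gateway
      metadata:
        name: istio-gateway          # hypothetical name
        namespace: default
      spec:
        selector:
          istio: ingressgateway
        servers:
        - port:
            number: 80
            name: http
            protocol: HTTP
          hosts:
          - "*"
      ---
      # Sketch of service-entry.yaml: register the external API called by the
      # data-modelling and data-retrieval services so egress traffic is allowed.
      apiVersion: networking.istio.io/v1alpha3
      kind: ServiceEntry
      metadata:
        name: svc-entry
      spec:
        hosts:
        - external-weather-api.example.com   # hypothetical external host
        location: MESH_EXTERNAL
        ports:
        - number: 443
          name: https
          protocol: TLS
        resolution: DNS
      ---
      # Sketch of peer-authentication.yaml: a mesh-wide policy (named "default"
      # in the root namespace) that enforces strict mTLS between sidecars.
      apiVersion: security.istio.io/v1beta1
      kind: PeerAuthentication
      metadata:
        name: default
        namespace: istio-system
      spec:
        mtls:
          mode: STRICT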

Phase 4 - Observability

  • Kiali was set up to visualize the services in the system and their related applications, pods, health, etc. The Kiali secret is set up using the following commands in the istio-system namespace:

       KIALI_PASSPHRASE=$(read -sp 'Kiali Passphrase: ' pval && echo -n $pval | base64)
    
       KIALI_USERNAME=$(read -p 'Kiali Username: ' uval && echo -n $uval | base64)
    
       NAMESPACE=istio-system
    
       cat <<EOF | kubectl apply -f -
       apiVersion: v1
       kind: Secret
       metadata:
         name: kiali
         namespace: $NAMESPACE
         labels:
           app: kiali
       type: Opaque
       data:
         username: $KIALI_USERNAME
         passphrase: $KIALI_PASSPHRASE
       EOF
    

    To access Kiali, its service type was changed from LoadBalancer to NodePort, and kubectl get svc kiali -n istio-system is used to display the port at which to access Kiali.

  • Grafana was set up to view metrics related to the services and the service mesh. To access Grafana, its service type was changed from LoadBalancer to NodePort, and kubectl get svc grafana -n istio-system is used to display the port at which to access Grafana.

  • Prometheus was set up to view metrics and logs related to the services and the service mesh. To access Prometheus, its service type was changed from LoadBalancer to NodePort, and kubectl get svc prometheus -n istio-system is used to display the port at which to access Prometheus. The NodePort change is sketched below.
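
    The LoadBalancer-to-NodePort change for Kiali, Grafana, and Prometheus only touches the spec.type field of each Service. The snippet below is a sketch of the relevant part of the Kiali Service after the change, assuming the stock addon layout (the port name and number are assumptions based on Kiali's defaults):

      # Sketch: the edited Kiali Service (Grafana and Prometheus were changed
      # the same way). spec.type is switched from LoadBalancer to NodePort so
      # the dashboard is reachable on a node port of the cluster.
      apiVersion: v1
      kind: Service
      metadata:
        name: kiali
        namespace: istio-system
      spec:
        type: NodePort             # was LoadBalancer
        ports:
        - name: http-kiali         # assumed default Kiali port
          port: 20001
          protocol: TCP
        selector:
          app: kiali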

Evaluation

Phase 1

  • To check the installation of the Istio data plane and control plane services, you can run the command below on the master node:

      kubectl get svc -n istio-system
    

    All the Istio services will be displayed, verifying the successful installation of Istio on the cluster.

  • To verify the installation of the sidecar proxy for a pod, perform the operations below on the master node:

    Get the deployment names by running 'kubectl get deployment'

    Get the details of a particular deployment by running 'kubectl describe po -l app=[DEPLOYMENT NAME]', where DEPLOYMENT NAME is the name of any deployment ending in '-primary'.

    e.g. 'kubectl describe po -l app=frontend-primary'. A field named 'istio-proxy:' will contain the proxy container details:

      istio-proxy:
          Container ID:  docker://c86fb91670916af05d67051b895b5f19b4f988d3b7e88923241e85dc7de9e6ce
          Image:         docker.io/istio/proxyv2:1.5.2
          Image ID:      docker-pullable://istio/proxyv2@sha256:b569b3226624b63ca007f3252162390545643433770c79a9aadb1406687d767a
          Port:          15090/TCP
          Host Port:     0/TCP
    

    This verifies the sidecar injection for the service.

Phase 2

  • To test that canary deployment works, a commit can be made to one of the following branches:

  • On committing to one of these branches, the CI/CD pipeline will update the new image in the cluster after the build process completes on Travis CI.

  • You can check on the master node whether the new deployment was detected by running the following command:

    'kubectl describe canary/{CANARY NAME}', where CANARY NAME is one of the names you get by running 'kubectl get canary'.

  • It might take some time for the canary to detect the new version, as it waits for the new pod to be in the ready and running state before it can start shifting traffic.

  • Once the new version is detected, you can run 'watch kubectl get canary' to watch the progress of the traffic shifting and the changes in the weights.

  • Once completed, the above command will show Succeeded for the service whose new version was committed.

  • It takes around 5 minutes to shift completely to the new version. You can run the 'kubectl describe canary/{CANARY NAME}' command again to get the logs.

  • This verifies a successful canary deployment strategy without downtime.

  • Kiali can also be used to observe the above process. Please see the Phase 4 evaluation for this.

Phase 3

  • Run the commands below on the master node to get the IP/port combination of the ingress gateway that is used to access the application:

      export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}')
    
      export SECURE_INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="https")].nodePort}')
    
      export INGRESS_HOST=149.165.170.101
    
      export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT
    

    'echo http://$GATEWAY_URL/' will give the ingress gateway IP/port combination. 'kubectl get gateway' will show the running gateway. 'kubectl get ServiceEntry' will show the running ServiceEntry. This verifies that the system is accessed via the ingress gateway and that outside services are accessed via the service entry.

  • Kiali can also be used to verify the ingress gateway, the service entry, and the mTLS encryption between services. Check Phase 4 for this.

Phase 4

  • Access kiali at http://149.165.170.101:32537/ (username/password: admin)

  • Kiali can be used to visualize the system architecture, configuration, and traffic flow in a very effective way.

  • To view the metrics and logs in Kiali, generate load by visiting the gateway access URL http://149.165.170.101:32045/.

    • Kiali system graph

      • whole system graph on kiali
    • Kiali ingress gateway

      • The Istio ingress gateway can be seen as the entry point to the system.
    • kiali service entry

      • The data-modelling and data-retrieval services make requests to an external service and are thus connected to svc-entry, a ServiceEntry.
    • kiali mTLS and traffic flow

      • The traffic flow can be seen, and the lock icon on the traffic edges indicates mTLS encryption.
    • kiali canary and weight distribution

      • deployment update detected
      • traffic progressively switching
      • traffic switched and update completed successfully
      • kiali graph when canary release is detected and traffic is being shifted using weights
  • Access Grafana at http://149.165.170.101:31433/. Grafana can be used to view the different metrics graphically. It has various dashboards available; you can navigate through them from the top-left corner. For example:

    The Istio Workload Dashboard gives you information about workload requests and operations.

    The Istio Performance Dashboard gives you information related to CPU usage.

  • Access prometheus at http://149.165.170.101:30743/

    Prometheus offers many queries that help you view the logs and metrics. When you click on the text box, it automatically suggests all the available queries. After selecting a particular query, execute it to load the results. Two views are available: the graph version and the console version.

    Example queries:

    To find out the CPU usage for every pod, run the query below:

    • rate(container_cpu_user_seconds_total{pod_name!="",namespace="default"}[10m])

    To find the total number of HTTP requests made to the Kubernetes nodes, run the query below:

    • http_requests_total

Conclusions and Outcomes

  • Implementing Istio's service mesh successfully allowed us to manage traffic and implement several security rules.
  • The observability of the system helps to visualize it in a more effective way.
  • Canary deployment proves more useful, as it allows updating a service to a new version without any downtime.
  • Flagger proves to be a useful open-source tool for automating canary deployments, and it could also prove useful to integrate it into Apache Airavata MFT.
  • From the results observed while implementing Istio's service mesh, we can say that integrating the features offered by Istio into Apache Airavata MFT could help improve traffic routing, increase reliability and efficiency, avoid backlogs of requests, and improve security features such as authenticating and authorizing services and their requests. Moreover, Istio's observability features, using Kiali/Grafana/Prometheus, could help visualize the cloud architecture, the routing of requests, the health of services, logs, and several other metrics.

Team Member Contributions

  • Milan Chheta

    Research

    • #172 Research on istio and service mesh
    • #180 Research on type of deployments
    • #181 Research on security aspects with istio
    • #182 Research on improved observability with kiali/grafana/prometheus

    Development and Architecture design

    • #174 Created gateway yaml file for istio
    • #176 Updating kubernetes yaml files for istio functioning
    • #177 Integrating istio on production deployment
    • #178 Setup Kiali for istio service mesh
    • #179 Setup Grafana and Prometheus For istio Service mesh
    • #184 Integrate flagger with istio for automating canary deployments
    • #186 #187 Setup Canary yaml files for frontend service
    • #188 #189 Setup Canary yaml files for session management service
    • #190 #191 Setup Canary yaml files for user management service
    • #192 #193 Setup Canary yaml files for api gateway service
    • #194 #195 Setup Canary yaml files for data modelling service
    • #196 #197 Setup Canary yaml files for data analysis service
    • #198 #199 Setup Canary yaml files for data retrieval service
    • #200 Changed travis.yaml to Update the CI/CD pipeline
    • #214 Created shell scripts and istio Yaml files

    Testing

    • #173 Tested istio integration
    • #175 Tested Kafka with istio integration
    • #185 Tested canary automated deployments
    • #183 Test canary deployment on VM

    Documentation

    • #213 Wiki Page entry for project 3 part 2
  • Disha Talreja

    Research

    • #172 Research on istio and service mesh
    • #201 Kafka on Istio

    Development and Architecture Design

    • #202 Istio Authentication
    • #203 Authentication policies in Deployment files
    • #204 Setup for authentication and mutual TLS
    • #205 Update yaml files
    • #212 Policy precedence in authentication

    Testing

    • #202 Istio Authentication
    • #215 Authentication part 1 cleaning
  • Neha Nayak

    Research

    • #172 Research on istio and service mesh
    • #180 Research on type of deployments
    • #181 Research on security aspects with istio
    • #182 Research on improved observability with kiali/grafana/prometheus
    • #206 Canary Deployment

    Development and Architecture Design

    • #207 Ingress and egress filtering
    • #208 mTLS authentication
    • #209 Kubernetes network policies
    • #210 Configuring istio egress gateway
    • #211 Updated yaml file

    Testing

    • #208 mTLS authentication