Enable gRPC inferencing by exposing route #57

Open · heyselbi opened this issue Jul 10, 2023 · 3 comments
Labels: kind/enhancement (New feature or request)

heyselbi (Contributor) commented Jul 10, 2023

We would like to enable gRPC inferencing in modelmesh-serving by exposing a route.

Relevant docs:
[1] Exposing route: https://github.com/kserve/modelmesh-serving/tree/main/docs/configuration#exposing-an-external-endpoint-using-an-openshift-route
[2] Self-signed TLS: https://github.com/kserve/modelmesh-serving/blob/main/docs/configuration/tls.md#generating-tls-certificates-for-devtest-using-openssl

The created route uses passthrough termination. Re-encrypt could be an option too, but I haven't yet managed to get the grpcurl tests passing with it.
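
For reference, a minimal sketch of the passthrough Route described in [1]; the route name is an assumption, while the Service name and gRPC port name are inferred from the deployment below:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: modelmesh-serving-grpc   # assumed name
  namespace: modelmesh-serving
spec:
  to:
    kind: Service
    name: modelmesh-serving
  port:
    targetPort: grpc             # the 8033 gRPC port
  tls:
    termination: passthrough     # TLS is terminated by the pod, not the router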

@heyselbi heyselbi converted this from a draft issue Jul 10, 2023
@heyselbi heyselbi self-assigned this Jul 10, 2023
@heyselbi heyselbi moved this from Backlog to In Progress in ODH Model Serving Planning Jul 10, 2023
@heyselbi heyselbi added the kind/enhancement New feature or request label Jul 10, 2023
heyselbi (Contributor, Author):

PR creating the gRPC passthrough route: https://github.com/opendatahub-io/odh-model-controller/pull/35/files

heyselbi (Contributor, Author) commented Jul 11, 2023

We are currently blocked on this issue. I have followed all the instructions in [1] and [2]. While grpcurl works and inference runs as expected, one of the two rest-proxy containers is showing connection issues.
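
For reference, the grpcurl test looked roughly like the following; the route host, model name, tensor payload, and proto file path are illustrative assumptions, not values from this cluster:

grpcurl \
  -cacert tls.crt \
  -proto kfs_inference_v2.proto \
  -d '{ "model_name": "example-model", "inputs": [{ "name": "predict", "shape": [1, 2], "datatype": "FP32", "contents": { "fp32_contents": [1.0, 2.0] }}]}' \
  modelmesh-serving-grpc-modelmesh-serving.apps.example.com:443 \
  inference.GRPCInferenceService.ModelInfer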

Cluster details:

  • OpenShift 4.13.0
  • Open Data Hub 1.7.0
  • ModelMesh version: v0.11.0-alpha (ref)
  • Controller namespace: opendatahub
  • User/isvc namespace: modelmesh-serving

Custom ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: model-serving-config
  namespace: opendatahub
data:
  config.yaml: |
    tls:
      secretName: mm-new

Secret mm-new:

kind: Secret
apiVersion: v1
metadata:
  name: mm-new
  namespace: opendatahub
data:
  tls.crt: <hidden>
  tls.key: <hidden>
type: kubernetes.io/tls
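
For completeness, a minimal sketch of generating the self-signed cert/key per [2] and creating the secret; the subject and SAN values here are assumptions:

# Requires OpenSSL 1.1.1+ for -addext; SANs should cover the in-cluster service DNS names.
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout tls.key -out tls.crt \
  -subj "/CN=modelmesh-serving" \
  -addext "subjectAltName=DNS:modelmesh-serving,DNS:modelmesh-serving.modelmesh-serving.svc.cluster.local"

oc create secret tls mm-new --cert=tls.crt --key=tls.key -n opendatahub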

Logs from the failing rest-proxy container:

{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Starting REST Proxy..."}
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Using TLS"}
{"level":"info","ts":"2023-07-10T15:31:14Z","msg":"Registering gRPC Inference Service Handler","Host":"localhost","Port":8033,"MaxCallRecvMsgSize":16777216}
{"level":"info","ts":"2023-07-10T15:31:19Z","msg":"Listening on port 8008 with TLS"}
2023/07/10 15:31:23 http: TLS handshake error from <IP1>:50510: read tcp <IP3>:8008-><IP1>:50510: read: connection reset by peer
2023/07/10 15:31:23 http: TLS handshake error from <IP2>:47526: read tcp <IP3>:8008-><IP2>:47526: read: connection reset by peer
2023/07/10 15:31:28 http: TLS handshake error from <IP1>:50518: read tcp <IP3>:8008-><IP1>:50518: read: connection reset by peer

The error keeps repeating. The second rest-proxy container shows no failures in its logs:

{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Starting REST Proxy..."}
{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Using TLS"}
{"level":"info","ts":"2023-07-10T13:20:00Z","msg":"Registering gRPC Inference Service Handler","Host":"localhost","Port":8033,"MaxCallRecvMsgSize":16777216}
{"level":"info","ts":"2023-07-10T13:20:05Z","msg":"Listening on port 8008 with TLS"}

Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: modelmesh-serving-ovms-1.x
  annotations:
    deployment.kubernetes.io/revision: '2'
  namespace: modelmesh-serving
  labels:
    app.kubernetes.io/instance: modelmesh-controller
    app.kubernetes.io/managed-by: modelmesh-controller
    app.kubernetes.io/name: modelmesh-controller
    modelmesh-service: modelmesh-serving
    name: modelmesh-serving-ovms-1.x
spec:
  replicas: 2
  selector:
    matchLabels:
      modelmesh-service: modelmesh-serving
      name: modelmesh-serving-ovms-1.x
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/instance: modelmesh-controller
        app.kubernetes.io/managed-by: modelmesh-controller
        app.kubernetes.io/name: modelmesh-controller
        modelmesh-service: modelmesh-serving
        name: modelmesh-serving-ovms-1.x
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: '2112'
        prometheus.io/scheme: https
        prometheus.io/scrape: 'true'
    spec:
      restartPolicy: Always
      serviceAccountName: modelmesh-serving-sa
      schedulerName: default-scheduler
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values:
                      - amd64
      terminationGracePeriodSeconds: 90
      securityContext: {}
      containers:
        - resources:
            limits:
              cpu: '1'
              memory: 512Mi
            requests:
              cpu: 50m
              memory: 96Mi
          terminationMessagePath: /dev/termination-log
          name: rest-proxy
          env:
            - name: REST_PROXY_LISTEN_PORT
              value: '8008'
            - name: REST_PROXY_GRPC_PORT
              value: '8033'
            - name: REST_PROXY_USE_TLS
              value: 'true'
            - name: REST_PROXY_GRPC_MAX_MSG_SIZE_BYTES
              value: '16777216'
            - name: MM_TLS_KEY_CERT_PATH
              value: /opt/kserve/mmesh/tls/tls.crt
            - name: MM_TLS_PRIVATE_KEY_PATH
              value: /opt/kserve/mmesh/tls/tls.key
          ports:
            - name: http
              containerPort: 8008
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: tls-certs
              readOnly: true
              mountPath: /opt/kserve/mmesh/tls
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/rest-proxy:v0.10.0'
        - resources:
            limits:
              cpu: 100m
              memory: 256Mi
            requests:
              cpu: 100m
              memory: 256Mi
          readinessProbe:
            httpGet:
              path: /oauth/healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          name: oauth-proxy
          livenessProbe:
            httpGet:
              path: /oauth/healthz
              port: 8443
              scheme: HTTPS
            initialDelaySeconds: 30
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          ports:
            - name: https
              containerPort: 8443
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: proxy-tls
              mountPath: /etc/tls/private
          terminationMessagePolicy: File
          image: >-
            registry.redhat.io/openshift4/ose-oauth-proxy@sha256:4bef31eb993feb6f1096b51b4876c65a6fb1f4401fee97fa4f4542b6b7c9bc46
          args:
            - '--https-address=:8443'
            - '--provider=openshift'
            - '--openshift-service-account="modelmesh-serving-sa"'
            - '--upstream=http://localhost:8008'
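            # Note: the upstream is plain HTTP, while the rest-proxy container
            # above listens on 8008 with TLS (REST_PROXY_USE_TLS=true).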
            - '--tls-cert=/etc/tls/private/tls.crt'
            - '--tls-key=/etc/tls/private/tls.key'
            - '--cookie-secret=SECRET'
            - >-
              --openshift-delegate-urls={"/": {"namespace": "modelmesh-serving",
              "resource": "services", "verb": "get"}}
            - >-
              --openshift-sar={"namespace": "modelmesh-serving", "resource":
              "services", "verb": "get"}
            - '--skip-auth-regex=''(^/metrics|^/apis/v1beta1/healthz)'''
        - resources:
            limits:
              cpu: '5'
              memory: 1Gi
            requests:
              cpu: 500m
              memory: 1Gi
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              httpGet:
                path: /prestop
                port: 8090
                scheme: HTTP
          name: ovms
          securityContext:
            capabilities:
              drop:
                - ALL
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: models-dir
              mountPath: /models
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/openvino_model_server:2022.3-release'
          args:
            - '--port=8001'
            - '--rest_port=8888'
            - '--config_path=/models/model_config_list.json'
            - '--file_system_poll_wait_seconds=0'
            - '--grpc_bind_address=127.0.0.1'
            - '--rest_bind_address=127.0.0.1'
        - resources:
            limits:
              cpu: '2'
              memory: 512Mi
            requests:
              cpu: 50m
              memory: 96Mi
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              httpGet:
                path: /prestop
                port: 8090
                scheme: HTTP
          name: ovms-adapter
          command:
            - /opt/app/ovms-adapter
          env:
            - name: ADAPTER_PORT
              value: '8085'
            - name: RUNTIME_PORT
              value: '8888'
            - name: RUNTIME_DATA_ENDPOINT
              value: 'port:8001'
            - name: CONTAINER_MEM_REQ_BYTES
              valueFrom:
                resourceFieldRef:
                  containerName: ovms
                  resource: requests.memory
                  divisor: '0'
            - name: MEM_BUFFER_BYTES
              value: '134217728'
            - name: LOADTIME_TIMEOUT
              value: '90000'
            - name: USE_EMBEDDED_PULLER
              value: 'true'
            - name: RUNTIME_VERSION
              value: 2022.3-release
          securityContext:
            capabilities:
              drop:
                - ALL
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: models-dir
              mountPath: /models
            - name: storage-config
              readOnly: true
              mountPath: /storage-config
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/modelmesh-runtime-adapter:v0.11.0-alpha'
        - resources:
            limits:
              cpu: '3'
              memory: 448Mi
            requests:
              cpu: 300m
              memory: 448Mi
          readinessProbe:
            httpGet:
              path: /ready
              port: 8089
              scheme: HTTP
            initialDelaySeconds: 5
            timeoutSeconds: 1
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          terminationMessagePath: /dev/termination-log
          lifecycle:
            preStop:
              exec:
                command:
                  - /opt/kserve/mmesh/stop.sh
                  - wait
          name: mm
          livenessProbe:
            httpGet:
              path: /live
              port: 8089
              scheme: HTTP
            initialDelaySeconds: 90
            timeoutSeconds: 5
            periodSeconds: 30
            successThreshold: 1
            failureThreshold: 2
          env:
            - name: MM_SERVICE_NAME
              value: modelmesh-serving
            - name: MM_SVC_GRPC_PORT
              value: '8033'
            - name: WKUBE_POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: WKUBE_POD_IPADDR
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
            - name: MM_LOCATION
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.hostIP
            - name: KV_STORE
              value: 'etcd:/opt/kserve/mmesh/etcd/etcd_connection'
            - name: MM_METRICS
              value: 'prometheus:port=2112;scheme=https'
            - name: SHUTDOWN_TIMEOUT_MS
              value: '90000'
            - name: INTERNAL_SERVING_GRPC_PORT
              value: '8001'
            - name: INTERNAL_GRPC_PORT
              value: '8085'
            - name: MM_SVC_GRPC_MAX_MSG_SIZE
              value: '16777216'
            - name: MM_KVSTORE_PREFIX
              value: mm
            - name: MM_DEFAULT_VMODEL_OWNER
              value: ksp
            - name: MM_LABELS
              value: 'mt:openvino_ir,mt:openvino_ir:opset1,pv:grpc-v1,rt:ovms-1.x'
            - name: MM_TYPE_CONSTRAINTS_PATH
              value: /etc/watson/mmesh/config/type_constraints
            - name: MM_DATAPLANE_CONFIG_PATH
              value: /etc/watson/mmesh/config/dataplane_api_config
            - name: MM_TLS_KEY_CERT_PATH
              value: /opt/kserve/mmesh/tls/tls.crt
            - name: MM_TLS_PRIVATE_KEY_PATH
              value: /opt/kserve/mmesh/tls/tls.key
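            # mm serves gRPC on 8033 with the same mm-new cert; the passthrough
            # route forwards TLS to this port unterminated.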
          securityContext:
            capabilities:
              drop:
                - ALL
          ports:
            - name: grpc
              containerPort: 8033
              protocol: TCP
            - name: prometheus
              containerPort: 2112
              protocol: TCP
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: tc-config
              mountPath: /etc/watson/mmesh/config
            - name: etcd-config
              readOnly: true
              mountPath: /opt/kserve/mmesh/etcd
            - name: tls-certs
              readOnly: true
              mountPath: /opt/kserve/mmesh/tls
          terminationMessagePolicy: File
          image: 'quay.io/opendatahub/modelmesh:v0.11.0-alpha'
      serviceAccount: modelmesh-serving-sa
      volumes:
        - name: proxy-tls
          secret:
            secretName: model-serving-proxy-tls
            defaultMode: 420
        - name: models-dir
          emptyDir:
            sizeLimit: 1536Mi
        - name: storage-config
          secret:
            secretName: storage-config
            defaultMode: 420
        - name: tc-config
          configMap:
            name: tc-config
            defaultMode: 420
        - name: etcd-config
          secret:
            secretName: model-serving-etcd
            defaultMode: 420
        - name: tls-certs
          secret:
            secretName: mm-new
            defaultMode: 420
      dnsPolicy: ClusterFirst
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 15%
      maxSurge: 75%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
status:
  observedGeneration: 6
  replicas: 2
  updatedReplicas: 2
  readyReplicas: 2
  availableReplicas: 2
  conditions:
    - type: Progressing
      status: 'True'
      lastUpdateTime: '2023-07-07T17:49:05Z'
      lastTransitionTime: '2023-07-07T16:17:16Z'
      reason: NewReplicaSetAvailable
      message: >-
        ReplicaSet "modelmesh-serving-ovms-1.x-6cdbbbbc79" has successfully
        progressed.
    - type: Available
      status: 'True'
      lastUpdateTime: '2023-07-10T15:41:34Z'
      lastTransitionTime: '2023-07-10T15:41:34Z'
      reason: MinimumReplicasAvailable
      message: Deployment has minimum availability.

heyselbi (Contributor, Author) commented Jul 11, 2023

Opened the issue upstream in kserve/modelmesh-serving as well:
kserve/modelmesh-serving#401

While waiting for a response from upstream, my next steps are:

  • try REST inferencing and see whether it is affected (a rough curl sketch follows this list)
  • remove the mm-new secret reference from the ConfigMap and see if the rest-proxy error persists
  • try gRPC inferencing in several namespaces
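
A rough sketch of that REST check; the route host and model name are placeholders, and the bearer token assumes the oauth-proxy sidecar is fronting the REST port:

curl -k \
  -H "Authorization: Bearer $(oc whoami -t)" \
  -H "Content-Type: application/json" \
  -d '{ "inputs": [{ "name": "predict", "shape": [1, 2], "datatype": "FP32", "data": [1.0, 2.0] }]}' \
  https://<rest-route-host>/v2/models/<model-name>/infer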

@heyselbi heyselbi linked a pull request Jul 11, 2023 that will close this issue
@heyselbi heyselbi moved this from In Progress to To-do/Groomed in ODH Model Serving Planning Sep 18, 2023