
Facing Issues with Load Balancing using NGINX Load Balancer on AWS EKS #54

Open
MeghaVarshney21 opened this issue Apr 13, 2022 · 2 comments

MeghaVarshney21 commented Apr 13, 2022

I am deploying a Triton Inference Server on Amazon Elastic Kubernetes Service (Amazon EKS) and using the NGINX open-source load balancer for load balancing. Our EKS cluster is private (the EKS nodes are in private subnets), so it cannot be reached from the outside world.

The Triton Inference Server exposes three endpoints:
port 8000: HTTP requests
port 8001: gRPC requests
port 8002: Prometheus metrics server

First, I created a Deployment for Triton on AWS EKS and exposed it through a headless Service (clusterIP: None), so that the endpoints of all replicas are exposed and can be discovered by the NGINX load balancer.

apiVersion: v1
kind: Service
metadata:
  name: triton
  labels:
    app: triton
spec:
  clusterIP: None
  ports:
     - protocol: TCP
       port: 8000
       name: http
       targetPort: 8000
     - protocol: TCP
       port: 8001
       name: grpc
       targetPort: 8001
     - protocol: TCP
       port: 8002
       name: metrics
       targetPort: 8002
  selector:
    app: triton

Then I built an image for the NGINX open-source load balancer using the configuration below. The NGINX configuration file is placed at /etc/nginx/conf.d/nginx.conf on the EKS node.

resolver kube-dns.kube-system.svc.cluster.local valid=5s;

# Upstream for Triton's HTTP endpoint
upstream backend {
    zone upstream-backend 64k;
    server triton.default.svc.cluster.local:8000;
}

# Upstream for Triton's gRPC endpoint
upstream backendgrpc {
    zone upstream-backend 64k;
    server triton.default.svc.cluster.local:8001;
}

# HTTP traffic
server {
    listen 80;
    location / {
        proxy_pass http://backend/;
    }
}

# gRPC traffic
server {
    listen 89 http2;
    location / {
        grpc_pass grpc://backendgrpc;
    }
}

# Dashboard page served from static files
server {
    listen 8080;
    root /usr/share/nginx/html;
    location = /dashboard.html { }
    location = / {
        return 302 /dashboard.html;
    }
}

The Dockerfile for the NGINX open-source load balancer is:

FROM nginx
RUN rm /etc/nginx/conf.d/default.conf
COPY /etc/nginx/conf.d/nginx.conf /etc/nginx/conf.d/default.conf

I have created a ReplicationController for NGINX. To pull the image from the private registry, Kubernetes needs credentials.
The imagePullSecrets field in the configuration file specifies that Kubernetes should get the credentials from a Secret named ecr-cred.
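
For reference, a Secret like ecr-cred for Amazon ECR can be created with kubectl. This is only a sketch; the account ID and region below are placeholders, and ECR authorization tokens expire after 12 hours, so the Secret has to be refreshed periodically:

kubectl create secret docker-registry ecr-cred \
  --docker-server=<aws_account_id>.dkr.ecr.<region>.amazonaws.com \
  --docker-username=AWS \
  --docker-password="$(aws ecr get-login-password --region <region>)"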

The nginx-rc manifest looks like this:

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-rc
spec:
  replicas: 1
  selector:
    app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      imagePullSecrets:
      - name: ecr-cred
      containers:
      - name: nginx
        # Start nginx, then sleep forever to keep the container running
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "nginx; while true; do sleep 30; done;" ]
        imagePullPolicy: IfNotPresent
        image: <Image URL with tag>
        ports:
          - name: http
            containerPort: 80
            hostPort: 8085
          - name: grpc
            containerPort: 89
            hostPort: 8087
          - name: http-alt
            containerPort: 8080
            hostPort: 8086
          - name: triton-svc
            containerPort: 8000
            hostPort: 32309

Now, the issue I am facing is that when the number of pods increases, the NGINX load balancer does not balance load across the newly added pods.

Can anyone help me?

MeghaVarshney21 changed the title from "Why Nginx Opensource Load Balancer is not balancing the load between Triton Pods in EKS cluster?" to "Facing Issues with Load Balancing using NGINX Load Balancer on AWS EKS" on Apr 13, 2022
pleshakov (Contributor) commented

Hi @MeghaVarshney21

resolver kube-dns.kube-system.svc.cluster.local valid=5s;
upstream backend {
   zone upstream-backend 64k;
   server triton.default.svc.cluster.local:8000;
}

NGINX OSS resolves DNS names only when it starts or when its configuration is reloaded. That is why the load balancer keeps sending traffic to the addresses it resolved at startup and does not pick up the newly added pods.

However, re-resolving DNS names is available in NGINX Plus, the commercial version of NGINX. For re-resolving, the configuration looks like this: https://github.com/nginxinc/NGINX-Demos/blob/master/kubernetes-demo/third/nginxplus/backend.conf#L3-L6

More about re-resolving DNS names: https://www.nginx.com/blog/dns-service-discovery-nginx-plus/
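
As a minimal sketch for the upstream above (NGINX Plus only), the re-resolving configuration would add the resolve parameter, which makes NGINX periodically re-resolve the service name using the configured resolver:

resolver kube-dns.kube-system.svc.cluster.local valid=5s;

upstream backend {
    zone upstream-backend 64k;
    server triton.default.svc.cluster.local:8000 resolve;
}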

Also note that for Kubernetes we also have the Ingress Controller, which works with both NGINX OSS and NGINX Plus and automatically updates the NGINX configuration when new backend pods are added. See https://github.com/nginxinc/kubernetes-ingress
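
As a rough sketch (not a tested manifest; the resource name and host are placeholders, and the ingress class assumes a default installation of the controller), an Ingress for the Triton HTTP endpoint could look like the following. The gRPC endpoint on port 8001 would additionally need the controller's gRPC support (the nginx.org/grpc-services annotation together with TLS termination):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: triton-ingress          # placeholder name
spec:
  ingressClassName: nginx
  rules:
  - host: triton.example.com    # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: triton        # the headless Service defined above
            port:
              number: 8000      # Triton HTTP endpoint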

GRPC examples:

Hope this helps


MeghaVarshney21 commented Apr 20, 2022

Thanks @pleshakov

I am now using the NGINX Plus load balancer, but I am facing another issue.

When the HPA scales the pods down, the NGINX load balancer shows a "server not ready" error.

Could you please help me with this issue?
