Proof of concept for implementing load balancing for a Gradio app
This proof of concept demonstrates how a Gradio app can use multiple GPUs on a Lambda Cloud on-demand instance.
By following the instructions below, you'll have an 8x H100 or 8x A100 instance running 8 Gradio apps. Each app will:
- Use a single GPU.
- Listen on its own localhost port, TCP/{7860..7867}.

  💡 Note: You can set the `GRADIO_SERVER_NAME` environment variable to `0.0.0.0` to make the apps network-accessible. This can be useful for load balancing between multiple instances. Make sure to configure appropriate firewall rules!

- Serve the FLUX.1 [schnell] prompt-to-image model.
- Log to a file named `log_{0..7}.txt`, with the number matching the GPU the app is using.
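The per-app setup above could be scripted roughly as follows. This is a hedged sketch, not the repo's `launch_gradio.sh`, whose contents may differ. It assumes a hypothetical entry point `app.py` that calls Gradio's `launch()` without hard-coding a port (so Gradio reads the standard `GRADIO_SERVER_PORT` environment variable), and it assumes app N uses GPU N and port 7860 + N.

```shell
#!/usr/bin/env bash
# Hypothetical launcher sketch: one Gradio app per GPU.
# Assumptions (not from the repo): the app is app.py, app N uses GPU N,
# and app.py calls launch() without a hard-coded port, so Gradio reads
# GRADIO_SERVER_PORT from the environment.
set -euo pipefail

NUM_GPUS=8
BASE_PORT=7860
PORTS=()

for gpu in $(seq 0 $((NUM_GPUS - 1))); do
  port=$((BASE_PORT + gpu))
  PORTS+=("$port")
  # CUDA_VISIBLE_DEVICES hides every GPU except one from this process.
  # With GRADIO_SERVER_NAME unset, Gradio binds to 127.0.0.1 by default.
  # nohup keeps the app running if the launching SSH session disconnects.
  CUDA_VISIBLE_DEVICES="$gpu" GRADIO_SERVER_PORT="$port" \
    nohup python3 app.py > "log_${gpu}.txt" 2>&1 &
  echo "GPU ${gpu} -> 127.0.0.1:${port}, logging to log_${gpu}.txt"
done
```

Because each process sees only one GPU, code in `app.py` that loads the model onto the default CUDA device lands on a different physical GPU in each process. Running `tail -f log_3.txt` would then follow the app on GPU 3.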

Nginx, acting as a load balancer (reverse proxy), distributes incoming requests on port 80 across the 8 apps.
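One way to express the Nginx side is an `upstream` block listing the 8 local ports. The repo ships its own `nginx.conf`, which may look different; the sketch below generates a minimal hypothetical config into `nginx.sketch.conf` (a made-up name) so it does not overwrite the repo's file. The proxy headers are modeled on the settings Gradio's Nginx deployment guide recommends. The sketch also uses `ip_hash`, on the assumption that all requests belonging to one Gradio session should reach the same app.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: generate a minimal Nginx config that balances across
# the 8 local Gradio ports. Written to nginx.sketch.conf so the repo's own
# nginx.conf is left untouched.
set -euo pipefail

BASE_PORT=7860
NUM_APPS=8
OUT=nginx.sketch.conf

{
  # Quoted 'EOF' keeps Nginx variables such as $host literal.
  cat <<'EOF'
events {}

http {
    upstream gradio_apps {
        # Pin each client IP to one app so a Gradio session stays on one backend.
        ip_hash;
EOF
  # One "server 127.0.0.1:PORT;" line per app.
  for i in $(seq 0 $((NUM_APPS - 1))); do
    printf '        server 127.0.0.1:%d;\n' $((BASE_PORT + i))
  done
  cat <<'EOF'
    }

    server {
        listen 80;

        location / {
            proxy_pass http://gradio_apps;
            proxy_http_version 1.1;
            # Allow WebSocket upgrades through the proxy.
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto $scheme;
            # Pass streamed responses through without buffering them.
            proxy_buffering off;
        }
    }
}
EOF
} > "$OUT"

echo "Wrote $OUT"
```

To try it, the `-v` mount in the Docker command below could point at `nginx.sketch.conf` instead of `nginx.conf`, and `sudo docker exec nginx-load-balancer nginx -t` checks the config's syntax inside the running container. With `ip_hash`, clients behind one shared IP address all land on the same app; removing the directive restores Nginx's default round-robin, at the possible cost of splitting a session across apps.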

- Launch an 8x H100 or 8x A100 instance from the Cloud dashboard or using the Cloud API.
- Clone this repo to your home directory and change into the directory:

      git clone https://github.com/cbrownstein-lambda/gradio-load-balancing.git "$HOME/gradio-load-balancing" && \
        cd "$HOME/gradio-load-balancing"
- Run the `launch_gradio.sh` script:

      bash launch_gradio.sh
- Start Nginx:

      sudo docker run -d --rm --name nginx-load-balancer --network=host \
        -v "$PWD/nginx.conf":/etc/nginx/nginx.conf:ro nginx
- Configure the Firewall to allow TCP traffic to port 80.

  ⚠️ WARNING: This proof of concept has no authentication. Anyone with your instance's IP address will be able to access the server (but not log into your instance). ⚠️

- Access the server at http://INSTANCE-IP-ADDRESS.

  Replace INSTANCE-IP-ADDRESS with your instance's IP address, which you can get from the Cloud dashboard or using the Cloud API.
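After the last step, one quick way to confirm that all 8 apps and the load balancer respond is to request each of them. This is a hedged sketch, not part of the repo. It assumes it is run on the instance itself (the apps only listen on localhost), that the apps use ports 7860 through 7867, and that each app answers `GET /` with HTTP 200, as Gradio normally does for its web UI.

```shell
#!/usr/bin/env bash
# Hypothetical smoke test, run on the instance: report the HTTP status of
# each Gradio app and of the Nginx proxy on port 80.
set -uo pipefail

check() {
  # curl prints 000 when nothing answers (e.g. connection refused or timeout).
  local code
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1")
  echo "$1 -> HTTP $code"
  [ "$code" = "200" ]
}

failures=0
for port in $(seq 7860 7867); do
  check "http://127.0.0.1:${port}/" || failures=$((failures + 1))
done
# The load balancer itself, listening on port 80.
check "http://127.0.0.1/" || failures=$((failures + 1))

echo "${failures} endpoint(s) did not return HTTP 200"
```

From another machine, only the proxy is reachable, and only once the port 80 firewall rule is in place, e.g. `curl -s -o /dev/null -w '%{http_code}\n' http://INSTANCE-IP-ADDRESS`.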