Skip to content

Proof of concept for implementing load balancing for a Gradio app

Notifications You must be signed in to change notification settings

cbrownstein-lambda/gradio-load-balancing

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 

Repository files navigation

gradio-load-balancing

Proof of concept for implementing load balancing for a Gradio app

Description

This proof of concept demonstrates how a Gradio app can use multiple GPUs in a Lambda Cloud on-demand instance.

By following the instructions below, you'll have an 8x H100 or 8x A100 running 8 Gradio apps. Each app will:

  • Use a single GPU.

  • Listen on a localhost port, TCP/{7860..7867}.

    💡 Note: You can set the GRADIO_SERVER_NAME environment variable to 0.0.0.0 to make the apps network-accessible. This can be useful for load balancing between multiple instances. Make sure to configure appropriate firewall rules!

  • Serve the FLUX.1 [schnell] prompt-to-image model.

  • Log to a file named log_{0..7}.txt, the number representing the GPU the app is using.

Nginx, acting as a load balancer (reverse proxy), will balance requests to the server between the 8 apps.

Usage

  1. Launch an 8x H100 or 8x A100 instance from the Cloud dashboard or using the Cloud API.

  2. Clone this repo to your home directory and change into the directory:

    git clone https://github.com/cbrownstein-lambda/gradio-load-balancing.git "$HOME/gradio-load-balancing" && \
    cd "$HOME/gradio-load-balancing"
  3. Run the launch_gradio.sh script:

    bash launch_gradio.sh
  4. Start Nginx:

    sudo docker run -d --rm --name nginx-load-balancer --network=host \
         -v "$PWD/nginx.conf":/etc/nginx/nginx.conf:ro nginx
  5. Configure the Firewall to allow TCP traffic to port 80.

    ⚠️ WARNING: This proof of concept requires no authentication. Anyone with your instance's IP address will be able to access the server (but not log into your instance). ⚠️

  6. Access the server at http://INSTANCE-IP-ADDRESS.

    Replace INSTANCE-IP-ADDRESS with your instance's IP address, which you can get from the Cloud dashboard or using the Cloud API.

About

Proof of concept for implementing load balancing for a Gradio app

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages