How to quickly serve an LLM using FastAPI, Celery, and Redis

Serving a Scalable FastAPI Application Leveraging Celery and Redis
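The application follows the common FastAPI + Celery + Redis pattern: the API accepts a prompt, enqueues it as a Celery task on a Redis broker, and a GPU worker runs the model and writes the completion back to Redis. Below is a minimal sketch of that pattern; the endpoint paths, task name, and Redis URL are illustrative assumptions, not this repo's exact code.

```python
# Minimal sketch of the FastAPI + Celery + Redis pattern, not this
# repo's actual code. Endpoint paths, the task name, and the Redis
# URL are assumptions.
from celery import Celery
from fastapi import FastAPI

app = FastAPI()

# Redis acts as both the Celery message broker and the result backend.
celery_app = Celery(
    "llm_worker",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/0",
)

@celery_app.task(name="generate")
def generate(prompt: str) -> str:
    # The real worker would run the LLM on the GPU here; stubbed so
    # the sketch stays self-contained and runnable without a model.
    return f"completion for: {prompt}"

@app.post("/generate")
def enqueue(prompt: str) -> dict:
    # Hand the prompt to a worker via Redis and return immediately.
    task = generate.delay(prompt)
    return {"task_id": task.id}

@app.get("/result/{task_id}")
def get_result(task_id: str) -> dict:
    # Poll the result backend for the task's state and output.
    res = celery_app.AsyncResult(task_id)
    return {"status": res.status, "result": res.result if res.ready() else None}
```

The key design point is that the HTTP request never blocks on GPU inference: FastAPI returns a task id immediately, and clients poll for the Redis-backed result, which lets the API scale independently of the worker pool.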

Prerequisites:

  • You must deploy the application on a GPU with ~16GB of memory
  • You will need docker compose (HINT: the docker compose plugin, not the legacy standalone docker-compose binary)
  • You will need to ensure nvidia-ctk --version produces valid output, i.e., the NVIDIA Container Toolkit is installed (a quick check script follows this list)
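A quick way to verify the GPU and toolkit requirements above is the sanity check below. It assumes PyTorch is installed and is an illustrative aid, not part of the repo.

```python
# Sanity check for the prerequisites above (illustrative; assumes PyTorch).
import shutil
import subprocess

import torch

# nvidia-ctk (the NVIDIA Container Toolkit CLI) must be on PATH
# and report a version.
assert shutil.which("nvidia-ctk"), "nvidia-ctk not found on PATH"
out = subprocess.run(["nvidia-ctk", "--version"], capture_output=True, text=True)
print(out.stdout.strip())

# The GPU should expose roughly 16GB of memory.
assert torch.cuda.is_available(), "no CUDA-capable GPU visible to PyTorch"
total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU memory: {total_gb:.1f} GB")
assert total_gb >= 15, "GPU has less than the ~16GB required"
```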

Tasks:

  1. Provide a simple system diagram for the application (created in whatever format you feel best communicates the flow)
  2. Provide an example output from the model.
    • NOTE: Getting an output is a multi-step process (see the client-side sketch after this section).

This is meant to be a challenging task, so you might need to spend some time troubleshooting and tinkering!
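For task 2, the multi-step flow typically looks like the client sketch below: submit a prompt, receive a task id, then poll until the worker reaches a terminal state. The /generate and /result endpoints and the port carry over from the hypothetical server sketch above and are assumptions, not necessarily this repo's actual routes.

```python
# Client-side view of the multi-step flow: submit, then poll.
# Endpoint paths and port are assumptions carried over from the
# server sketch above.
import time

import requests

BASE = "http://localhost:8000"

# Step 1: enqueue the prompt; the API responds with a Celery task id.
resp = requests.post(f"{BASE}/generate", params={"prompt": "Hello, world"})
task_id = resp.json()["task_id"]

# Step 2: poll until the task reaches a terminal state.
while True:
    body = requests.get(f"{BASE}/result/{task_id}").json()
    if body["status"] in ("SUCCESS", "FAILURE"):
        break
    time.sleep(1)

print(body)
```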
