galletas1712/prophet

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Prophet: An LLM Inference Engine Optimized For Head-of-Line Blocking

To start the benchmark, run: `python run_ray.py`

The default configuration is for running on an AWS p4d.24xlarge instance. To adjust the number of prefiller and decoder GPUs, change `config/coordinators/ray_coordinator.yaml`.
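As a rough sketch of what that change might look like (the actual schema and field names in `config/coordinators/ray_coordinator.yaml` are assumptions here and should be checked against the file itself), a p4d.24xlarge exposes 8 A100 GPUs, so the split could be expressed as:

```yaml
# Hypothetical sketch of config/coordinators/ray_coordinator.yaml.
# Field names are illustrative assumptions; consult the real file.
coordinator:
  num_prefiller_gpus: 4  # GPUs assigned to prefill (prompt processing)
  num_decoder_gpus: 4    # GPUs assigned to decode (token generation)
```

The prefiller/decoder split should sum to the number of GPUs available on the instance (8 on a p4d.24xlarge).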

If you run out of GPU memory, reduce the batch size in the prefiller and decoder scheduler configurations.
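The scheduler configs are likely small YAML files alongside the coordinator config; a hedged sketch of the batch-size knob (the file paths and the `max_batch_size` field name are assumptions, not verified against the repo) might look like:

```yaml
# Hypothetical sketch of a scheduler config (e.g. config/schedulers/prefiller.yaml).
# max_batch_size is an illustrative field name; check the actual config.
scheduler:
  max_batch_size: 8  # lower this value if the GPU runs out of memory
```

Prefill batches hold full prompts while decode batches hold one token per sequence, so the decoder scheduler can typically tolerate a larger batch size than the prefiller before memory becomes a problem.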
