llama inference

An exploration of inference latency with llama across various setups.

Caveats

  • I didn't explore throughput. That is a deep rabbit hole; I only measured latency for a single request. You can trade off throughput against latency with various forms of request batching.
  • I did my best to use each tool as its documentation describes.
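
To make the throughput/latency trade-off from the first caveat concrete, here is a minimal sketch using a toy cost model. It assumes a batch costs a fixed overhead plus a small increment per request; `simulate_batch`, `BatchStats`, and both cost constants are hypothetical names and made-up values for illustration, not measurements from any setup in this repo.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    batch_size: int
    latency_s: float        # time until every request in the batch completes
    throughput_rps: float   # requests completed per second


def simulate_batch(batch_size: int,
                   fixed_cost_s: float = 0.050,
                   per_request_cost_s: float = 0.005) -> BatchStats:
    """Toy cost model: one batch costs a fixed overhead plus a per-request increment.

    Both cost constants are illustrative assumptions, not measured values.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    latency = fixed_cost_s + per_request_cost_s * batch_size
    return BatchStats(batch_size, latency, batch_size / latency)


if __name__ == "__main__":
    for size in (1, 4, 16):
        s = simulate_batch(size)
        print(f"batch={s.batch_size:>2}  latency={s.latency_s * 1000:.0f} ms  "
              f"throughput={s.throughput_rps:.1f} req/s")
```

Under this model, larger batches complete more requests per second, but each request waits longer for its batch to finish. Real inference servers behave less linearly, so treat this only as intuition for why the two metrics pull in different directions.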