Update on the development branch #847
kaiyux announced in Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) this January 9th, 2024.
This update includes:
- `ModelConfig()` as a clean configuration interface for LLM tasks
- `LLM()` for LLM pipelines; it triggers the necessary engine building or model quantization silently in the background
- `generate()` API for batched offline inference, with both single-GPU and multi-GPU supported
- `generate_async()` API for asynchronous offline inference on a single GPU, with streaming mode supported
- `InferenceRequest` support in the `GptManager` pybind (2/4TP run demo #701)
- Default `freeGpuMemoryFraction` parameter raised from 0.85 to 0.9 for higher throughput

Thanks,
The TensorRT-LLM Engineering Team
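As a quick illustration of how the announced pieces are meant to compose — a config object feeding a pipeline whose `generate()` and `generate_async()` methods handle batched and streaming inference — here is a minimal self-contained sketch. The class and method names mirror the list above, but the fields and bodies are stand-in stubs for illustration only, not actual TensorRT-LLM code; real signatures may differ.

```python
import asyncio
from dataclasses import dataclass

# Stand-in stubs mirroring the interface shape described in the
# announcement; NOT actual TensorRT-LLM code.

@dataclass
class ModelConfig:
    # Hypothetical fields, for illustration only.
    model_dir: str
    max_batch_size: int = 8

class LLM:
    def __init__(self, config: ModelConfig):
        # The real LLM() is said to trigger engine building or model
        # quantization in the background; here we just keep the config.
        self.config = config

    def generate(self, prompts):
        # Batched offline inference: one output per input prompt.
        return [f"<output for {p!r}>" for p in prompts]

    async def generate_async(self, prompt):
        # Asynchronous inference with streaming: yield the result
        # piece by piece instead of returning it all at once.
        for token in f"<output for {prompt!r}>".split():
            await asyncio.sleep(0)  # simulate per-token latency
            yield token

config = ModelConfig(model_dir="/path/to/model")
llm = LLM(config)

# Batched, blocking path.
outputs = llm.generate(["Hello", "World"])

# Streaming, asynchronous path.
async def stream():
    return [tok async for tok in llm.generate_async("Hello")]

tokens = asyncio.run(stream())
```

The point of the shape is that configuration (`ModelConfig`), pipeline construction (`LLM`), and inference (`generate` / `generate_async`) are separate steps, so the same config can drive both batched and streaming use.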