TensorRT-LLM

TensorRT-LLM is an open-source library developed by NVIDIA to optimize the inference performance of large language models (LLMs) on NVIDIA GPUs. It incorporates numerous LLM-specific optimizations, such as custom attention kernels, in-flight batching, paged key-value (KV) caching, and various quantization techniques (e.g., FP8, INT4 AWQ, INT8 SmoothQuant), to improve inference efficiency.
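
A minimal sketch of running inference through TensorRT-LLM's high-level Python `LLM` API, assuming a recent TensorRT-LLM release that provides it; the model name is illustrative and not part of this benchmark's configuration:

```python
from tensorrt_llm import LLM, SamplingParams

# Prompts to benchmark; replace with the workload of interest.
prompts = ["What is the capital of France?"]

# Sampling configuration for generation.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Build/load a TensorRT engine for the given Hugging Face model
# (engine construction happens on first use and can take several minutes).
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Run generation and print the decoded outputs.
for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```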

Platform-specific instructions and scripts used for LLM-Inference-Bench.