TensorRT-LLM is an open-source library developed by NVIDIA to optimize the inference performance of large language models (LLMs) on NVIDIA GPUs. It incorporates numerous LLM-specific optimizations, such as custom attention kernels, in-flight batching, paged key-value caching, and various quantization techniques (e.g., FP8, INT4 AWQ, INT8 SmoothQuant), to enhance inference efficiency.
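For context, recent TensorRT-LLM releases expose a high-level Python `LLM` API that wraps engine building and in-flight batching behind a single object. The snippet below is a minimal offline-inference sketch under that assumption; the model name is a placeholder for whichever Hugging Face checkpoint is being benchmarked, and the sampling parameters are illustrative, not the benchmark's actual settings.

```python
# Minimal sketch using TensorRT-LLM's high-level LLM API (recent releases).
# The model name is a placeholder; sampling values are illustrative only.
from tensorrt_llm import LLM, SamplingParams

def main():
    # The TensorRT engine is built automatically the first time the model loads.
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    prompts = ["What is in-flight batching?"]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # generate() batches the requests and returns one result per prompt.
    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)

if __name__ == "__main__":
    main()
```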
Platform-Specific Instructions and Scripts for LLM-Inference-Bench