@@ -11,27 +11,34 @@ TensorRT-LLM optimizes the performance of a range of well-known models on NVIDIA
 | `BertForSequenceClassification` | BERT-based | `textattack/bert-base-uncased-yelp-polarity` | L |
 | `DeciLMForCausalLM` | Nemotron | `nvidia/Llama-3_1-Nemotron-51B-Instruct` | L |
 | `DeepseekV3ForCausalLM` | DeepSeek-V3 | `deepseek-ai/DeepSeek-V3` | L |
-| `LlavaLlamaModel` | VILA | `Efficient-Large-Model/NVILA-8B` | L + V |
-| `LlavaNextForConditionalGeneration` | LLaVA-NeXT | `llava-hf/llava-v1.6-mistral-7b-hf` | L + V |
+| `Exaone4ForCausalLM` | EXAONE 4.0 | `LGAI-EXAONE/EXAONE-4.0-32B` | L |
+| `Gemma3ForCausalLM` | Gemma 3 | `google/gemma-3-1b-it` | L |
+| `Gemma3ForConditionalGeneration` | Gemma 3 | `google/gemma-3-27b-it` | L + I |
+| `HCXVisionForCausalLM` | HyperCLOVAX-SEED-Vision | `naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B` | L + I |
+| `LlavaLlamaModel` | VILA | `Efficient-Large-Model/NVILA-8B` | L + I + V |
+| `LlavaNextForConditionalGeneration` | LLaVA-NeXT | `llava-hf/llava-v1.6-mistral-7b-hf` | L + I |
 | `LlamaForCausalLM` | Llama 3.1, Llama 3, Llama 2, LLaMA | `meta-llama/Meta-Llama-3.1-70B` | L |
-| `Llama4ForConditionalGeneration` | Llama 4 | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | L |
+| `Llama4ForConditionalGeneration` | Llama 4 | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | L + I |
 | `MistralForCausalLM` | Mistral | `mistralai/Mistral-7B-v0.1` | L |
+| `Mistral3ForConditionalGeneration` | Mistral3 | `mistralai/Mistral-Small-3.1-24B-Instruct-2503` | L + I |
 | `MixtralForCausalLM` | Mixtral | `mistralai/Mixtral-8x7B-v0.1` | L |
 | `MllamaForConditionalGeneration` | Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision` | L |
 | `NemotronForCausalLM` | Nemotron-3, Nemotron-4, Minitron | `nvidia/Minitron-8B-Base` | L |
 | `NemotronNASForCausalLM` | NemotronNAS | `nvidia/Llama-3_3-Nemotron-Super-49B-v1` | L |
+| `Phi4MMForCausalLM` | Phi-4-multimodal | `microsoft/Phi-4-multimodal-instruct` | L + I + A |
 | `Qwen2ForCausalLM` | QwQ, Qwen2 | `Qwen/Qwen2-7B-Instruct` | L |
 | `Qwen2ForProcessRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-PRM-7B` | L |
 | `Qwen2ForRewardModel` | Qwen2-based | `Qwen/Qwen2.5-Math-RM-72B` | L |
-| `Qwen2VLForConditionalGeneration` | Qwen2-VL | `Qwen/Qwen2-VL-7B-Instruct` | L + V |
-| `Qwen2_5_VLForConditionalGeneration` | Qwen2.5-VL | `Qwen/Qwen2.5-VL-7B-Instruct` | L + V |
+| `Qwen2VLForConditionalGeneration` | Qwen2-VL | `Qwen/Qwen2-VL-7B-Instruct` | L + I + V |
+| `Qwen2_5_VLForConditionalGeneration` | Qwen2.5-VL | `Qwen/Qwen2.5-VL-7B-Instruct` | L + I + V |
 | `Qwen3ForCausalLM` | Qwen3 | `Qwen/Qwen3-8B` | L |
 | `Qwen3MoeForCausalLM` | Qwen3MoE | `Qwen/Qwen3-30B-A3B` | L |

 Note:
-- L: Language only
-- L + V: Language and Vision multimodal support
-- Llama 3.2 accepts vision input, but our support currently limited to text only.
+- L: Language
+- I: Image
+- V: Video
+- A: Audio

 ## Models (TensorRT Backend)

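The checkpoints listed in the table above can be served through TensorRT-LLM's high-level `LLM` API, the entry point for the PyTorch backend. Below is a minimal sketch, assuming a recent `tensorrt_llm` release; the `google/gemma-3-1b-it` checkpoint is an arbitrary pick from the newly added rows, and the exact `SamplingParams` fields can vary between versions.

```python
# Minimal sketch: running one of the listed PyTorch-backend models with the
# tensorrt_llm LLM API. The checkpoint name is an illustrative choice from
# the support table; sampling settings are examples, not recommendations.
from tensorrt_llm import LLM, SamplingParams


def main():
    # Loads the Hugging Face checkpoint (downloading it if necessary).
    llm = LLM(model="google/gemma-3-1b-it")

    # Field names match recent releases; older versions may differ.
    params = SamplingParams(max_tokens=64, temperature=0.8)

    # generate() accepts a batch of prompts and returns one result per prompt.
    for output in llm.generate(["Summarize what TensorRT-LLM does."], params):
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```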