System Info
x86_64, Ubuntu, NVIDIA L40S GPU
Who can help?
No response
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
1. Clone Llama 3.1 8B Instruct (Meta-Llama-3.1-8B-Instruct).
2. Install TensorRT-LLM 0.15.0.
3. Run the commands below:
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Meta-Llama-3.1-8B-Instruct --output_dir ./tllm_8b_checkpoint_1gpu_fp8 --dtype float16 --use_fp8_rowwise --fp8_kv_cache
trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_fp8 --output_dir ./tmp/llama/8B/trt_engines/fp8/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 64 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.1-8b-lora-weights
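For comparison, a sketch of the same build with the two LoRA flags dropped (not re-run for this report); it should sidestep the assertion shown below, since the Fp8RowwiseAttention guard only fires when LoRA parameters are passed in:
# comparison sketch: identical trtllm-build invocation minus --lora_plugin / --lora_dir
trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_fp8 --output_dir ./tmp/llama/8B/trt_engines/fp8/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 64 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged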
Expected behavior
The engine should be built
actual behavior
The build fails with the following assertion error:
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/trtllm-build", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 602, in main
parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
engine = build_model(build_config,
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
return build(model, build_config)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1207, in build
model(**inputs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
output = self.forward(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 956, in forward
hidden_states = self.transformer.forward(**kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 283, in forward
hidden_states = self.layers.forward(
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 540, in forward
hidden_states = layer(
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
output = self.forward(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 146, in forward
attention_output = self.attention(
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
output = self.forward(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/quantization/layers.py", line 1937, in forward
assert lora_layer_params is None, f"lora is not supported on {self.__class__.__name__} now"
AssertionError: lora is not supported on Fp8RowwiseAttention now
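A possible workaround sketch, under the untested assumption that a per-tensor FP8 checkpoint produced by examples/quantization/quantize.py goes through the regular Attention path rather than Fp8RowwiseAttention, so the guard above would not trigger (the output paths below are hypothetical):
# hypothetical alternative: per-tensor FP8 via quantize.py instead of --use_fp8_rowwise, keeping the LoRA flags
python3 TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Meta-Llama-3.1-8B-Instruct --dtype float16 --qformat fp8 --kv_cache_dtype fp8 --output_dir ./tllm_8b_checkpoint_1gpu_fp8_ptq --calib_size 512
trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_fp8_ptq --output_dir ./tmp/llama/8B/trt_engines/fp8_ptq/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 64 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.1-8b-lora-weights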
additional notes
N/A