System Info
x86_64, Ubuntu, NVIDIA L40S GPU
Who can help?
No response
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
1. Clone Llama 3.1 8B Instruct (Meta-Llama-3.1-8B-Instruct).
2. Install TensorRT-LLM 0.15.0.
3. Run the commands below:
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py --model_dir ./Meta-Llama-3.1-8B-Instruct --output_dir ./tllm_8b_checkpoint_1gpu_fp8 --dtype float16 --use_fp8_rowwise --fp8_kv_cache
trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_fp8 --output_dir ./tmp/llama/8B/trt_engines/fp8/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 64 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.1-8b-lora-weights
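For comparison, a sketch of the same build with the two LoRA flags dropped (not re-run for this report); it should sidestep the assertion shown below, since the Fp8RowwiseAttention guard only fires when LoRA parameters are passed in:
# comparison sketch: identical trtllm-build invocation minus --lora_plugin / --lora_dir
trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_fp8 --output_dir ./tmp/llama/8B/trt_engines/fp8/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 64 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged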
Expected behavior
The engine should be built
actual behavior
The build fails with the following assertion error:
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/trtllm-build", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 602, in main
parallel_build(model_config, ckpt_dir, build_config, args.output_dir,
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 425, in parallel_build
passed = build_and_save(rank, rank % workers, ckpt_dir,
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 390, in build_and_save
engine = build_model(build_config,
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/commands/build.py", line 383, in build_model
return build(model, build_config)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/builder.py", line 1207, in build
model(**inputs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
output = self.forward(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 956, in forward
hidden_states = self.transformer.forward(**kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 283, in forward
hidden_states = self.layers.forward(
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 540, in forward
hidden_states = layer(
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
output = self.forward(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 146, in forward
attention_output = self.attention(
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/module.py", line 52, in __call__
output = self.forward(*args, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/tensorrt_llm/quantization/layers.py", line 1937, in forward
assert lora_layer_params is None, f"lora is not supported on {self.__class__.__name__} now"
AssertionError: lora is not supported on Fp8RowwiseAttention now
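A possible workaround sketch, under the untested assumption that a per-tensor FP8 checkpoint produced by examples/quantization/quantize.py goes through the regular Attention path rather than Fp8RowwiseAttention, so the guard above would not trigger (the output paths below are hypothetical):
# hypothetical alternative: per-tensor FP8 via quantize.py instead of --use_fp8_rowwise, keeping the LoRA flags
python3 TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Meta-Llama-3.1-8B-Instruct --dtype float16 --qformat fp8 --kv_cache_dtype fp8 --output_dir ./tllm_8b_checkpoint_1gpu_fp8_ptq --calib_size 512
trtllm-build --checkpoint_dir ./tllm_8b_checkpoint_1gpu_fp8_ptq --output_dir ./tmp/llama/8B/trt_engines/fp8_ptq/1-gpu --gpt_attention_plugin auto --gemm_plugin auto --max_num_tokens 128000 --max_batch_size 64 --logits_dtype=float32 --gather_generation_logits --kv_cache_type=paged --lora_plugin auto --lora_dir llama-3.1-8b-lora-weights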
additional notes
N/A