
Error in data types: using model with lora #2434

Open
@Alireza3242

Description

System Info

NVIDIA A100

Who can help?

@byshiue
@juney-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I use TensorRT-LLM 0.14.
I converted my base model to bfloat16 and my LoRA adapter to float16. When I send a dummy request to Triton, I get this error:

[TensorRT-LLM][ERROR] Encountered an error when fetching new request: [TensorRT-LLM][ERROR] Assertion failed: Expected lora weights to be the same data type as base model (/workspace/tensorrt_llm/cpp/tensorrt_llm/runtime/loraUtils.cpp:66)
1       0x7fd90181fa84 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7fd90182c351 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x727351) [0x7fd90182c351]
3       0x7fd903b8b4b8 tensorrt_llm::batch_manager::PeftCacheManager::addRequestPeft(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, bool) + 184
4       0x7fd903bad2c2 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::updatePeftCache(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> const&) + 82
5       0x7fd903bf8f86 tensorrt_llm::executor::Executor::Impl::fetchNewRequests[abi:cxx11](int, std::optional<float>, double&) + 2374
6       0x7fd903bfada8 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1144
7       0x7fd9d89cc253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7fd9d89cc253]
8       0x7fd9d875bac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fd9d875bac3]
9       0x7fd9d87eca04 clone + 68
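
The assertion is the dtype check in loraUtils.cpp: the lora_weights tensor must have exactly the same data type as the engine. If the adapter was already converted to float16, one way to line the dtypes up offline is to re-cast the saved numpy weights. A minimal sketch, assuming the model.lora_weights.npy file name the 0.14 client loads and the ml_dtypes package for a numpy-compatible bfloat16 (neither is official tooling):

import numpy as np
import ml_dtypes  # assumption: provides a numpy-registered bfloat16 dtype

path = "/app/data/lora/numpy/1/model.lora_weights.npy"
weights = np.load(path, allow_pickle=True)
print(weights.dtype)  # float16 here is what trips the bfloat16-engine check

# Re-save with the engine's dtype so the server-side assertion passes.
np.save(path, weights.astype(ml_dtypes.bfloat16))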

If I convert both the model and the LoRA adapter to bfloat16, I also get an error on the dummy request. I run this script to load the LoRA into the cache:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/v0.14.0/inflight_batcher_llm/client/inflight_batcher_llm_client.py

python3 inflight_batcher_llm_client.py --top-k 0 --top-p 0.5 --request-output-len 10 --text hello --tokenizer-dir /app/data/lora/torch/1 --lora-path /app/data/lora/numpy/1 --lora-task-id 1 --streaming 
=========
Using pad_id:  128001
Using end_id:  128001
Input sequence:  [128000, 15339]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/app/src/lora/inflight_batcher_llm_client.py", line 730, in <module>
    inputs = prepare_inputs(
  File "/app/src/lora/inflight_batcher_llm_client.py", line 151, in prepare_inputs
    prepare_tensor("lora_weights", lora_weights_data),
  File "/app/src/lora/inflight_batcher_llm_client.py", line 104, in prepare_tensor
    t = grpcclient.InferInput(name, input.shape,
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_infer_input.py", line 56, in __init__
    self._input.datatype = datatype
TypeError: bad argument type for built-in operation
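
This second failure looks like a client-side limitation rather than a server bug: numpy has no built-in bfloat16, so tritonclient's np_to_triton_dtype returns None for the adapter's dtype, and assigning None to the protobuf datatype field is exactly what produces "bad argument type for built-in operation". A small reconstruction of the failure (my illustration, not taken from the issue):

import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import np_to_triton_dtype

print(np_to_triton_dtype(np.float16))  # "FP16" -- a dtype the client understands
print(np_to_triton_dtype(np.void))     # None   -- any dtype it does not recognize

# InferInput forwards that None straight into a protobuf string field:
t = grpcclient.InferInput("lora_weights", [1, 8, 8], None)  # raises the TypeError above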

If I convert both the model and the LoRA adapter to float16, everything works.
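
Until bfloat16 LoRA works end to end, staying in float16 on both sides is the practical workaround. If only the adapter is in the wrong dtype, a cast at load time in the client avoids re-running the conversion; a hedged sketch against a float16 engine (the variable and file name mirror what the 0.14 client loads; adjust if yours differ):

import numpy as np

lora_weights_data = np.load("/app/data/lora/numpy/1/model.lora_weights.npy",
                            allow_pickle=True)
# Cast before prepare_tensor builds the InferInput; np_to_triton_dtype only
# knows the standard numpy dtypes, and the engine must also be float16.
lora_weights_data = np.ascontiguousarray(lora_weights_data.astype(np.float16))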

Expected behavior

LoRA inference should work when the base model and adapter are both converted to bfloat16.

actual behavior

bfloat16 does not work: mixed bfloat16/float16 fails the server-side dtype assertion, and matching bfloat16 fails in the client with the TypeError above. Only float16/float16 succeeds.

additional notes


Labels

bug (Something isn't working), triaged (Issue has been triaged by maintainers)
