
Error in data types: using model with lora #2434

Open
2 of 4 tasks
Alireza3242 opened this issue Nov 11, 2024 · 2 comments
Assignees
Labels
bug Something isn't working triaged Issue has been triaged by maintainers

Comments

@Alireza3242

System Info

A100

Who can help?

@byshiue
@juney-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I use TensorRT-LLM 0.14.
I converted my model to bfloat16 and my LoRA to float16. On a dummy request to Triton I get this error:

[TensorRT-LLM][ERROR] Encountered an error when fetching new request: [TensorRT-LLM][ERROR] Assertion failed: Expected lora weights to be the same data type as base model (/workspace/tensorrt_llm/cpp/tensorrt_llm/runtime/loraUtils.cpp:66)
1       0x7fd90181fa84 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7fd90182c351 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x727351) [0x7fd90182c351]
3       0x7fd903b8b4b8 tensorrt_llm::batch_manager::PeftCacheManager::addRequestPeft(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, bool) + 184
4       0x7fd903bad2c2 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::updatePeftCache(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> const&) + 82
5       0x7fd903bf8f86 tensorrt_llm::executor::Executor::Impl::fetchNewRequests[abi:cxx11](int, std::optional<float>, double&) + 2374
6       0x7fd903bfada8 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1144
7       0x7fd9d89cc253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7fd9d89cc253]
8       0x7fd9d875bac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fd9d875bac3]
9       0x7fd9d87eca04 clone + 68

If I convert both my model and my LoRA to bfloat16, I get an error on the dummy request. I run this script to save the LoRA in the cache:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/v0.14.0/inflight_batcher_llm/client/inflight_batcher_llm_client.py

python3 inflight_batcher_llm_client.py --top-k 0 --top-p 0.5 --request-output-len 10 --text hello --tokenizer-dir /app/data/lora/torch/1 --lora-path /app/data/lora/numpy/1 --lora-task-id 1 --streaming 
=========
Using pad_id:  128001
Using end_id:  128001
Input sequence:  [128000, 15339]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/app/src/lora/inflight_batcher_llm_client.py", line 730, in <module>
    inputs = prepare_inputs(
  File "/app/src/lora/inflight_batcher_llm_client.py", line 151, in prepare_inputs
    prepare_tensor("lora_weights", lora_weights_data),
  File "/app/src/lora/inflight_batcher_llm_client.py", line 104, in prepare_tensor
    t = grpcclient.InferInput(name, input.shape,
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_infer_input.py", line 56, in __init__
    self._input.datatype = datatype
TypeError: bad argument type for built-in operation
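A plausible cause of this client-side TypeError (an assumption on my part, not confirmed by the maintainers): the Triton client derives the wire datatype string from the numpy dtype of the LoRA weights, and numpy has no native bfloat16 dtype, so the lookup can produce None, and assigning None to the protobuf datatype field fails with exactly this message. A minimal check with plain numpy:

```python
import numpy as np

# numpy can construct float16 but has no native bfloat16 dtype;
# np.dtype("bfloat16") raises TypeError ("data type not understood").
def numpy_supports(dtype_name: str) -> bool:
    try:
        np.dtype(dtype_name)
        return True
    except TypeError:
        return False

print(numpy_supports("float16"))   # True
print(numpy_supports("bfloat16"))  # False
```

This would explain why float16 round-trips through the client cleanly while bfloat16 breaks before the request even reaches the server.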

If I convert both my model and my LoRA to float16, it works.

Expected behavior

bfloat16 should work

actual behavior

bfloat16 does not work

additional notes


@Alireza3242 Alireza3242 added the bug Something isn't working label Nov 11, 2024
@Alireza3242 Alireza3242 changed the title Error in data types: use model with lora Error in data types: using model with lora Nov 11, 2024
@VincentJing

Hi @Alireza3242, could you please share the steps for generating the TRT-LLM engine?

@hello-11 hello-11 added the triaged Issue has been triaged by maintainers label Nov 13, 2024
@Alireza3242
Author

Alireza3242 commented Dec 25, 2024

@VincentJing
I still have this problem in TensorRT-LLM 0.15.

I used these configs for convert, build, lora_convert, and the dummy request:

"convert": {
"model_dir": "/app/data/mistral_fa/model",
"output_dir": "/app/data/tllm_checkpoint",
"dtype": "bfloat16",
},
"build": {
"checkpoint_dir": "/app/data/tllm_checkpoint",
"output_dir": "/app/model_repository/tensorrt_llm/1",
"gemm_plugin": "auto",
"max_batch_size": "32",
"max_input_len": "2048",
"max_num_tokens": "4096",
"lora_plugin": "bfloat16",
"lora_dir": "/app/data/mistral_fa/lora/torch/1",
"max_lora_rank": "16",
"lora_target_modules": "attn_qkv attn_q attn_k attn_v attn_dense mlp_h_to_4h mlp_4h_to_h mlp_gate"
},
"convert_lora": [
{
"in-file": "/app/data/mistral_fa/lora/torch/1",
"storage-type": "bfloat16",
"out-dir": "/app/data/mistral_fa/lora/numpy/1",
}
],
"dummy_requests": [
{
"top-k": "0",
"top-p":"0.5",
"request-output-len": "10",
"text": "hello",
"tokenizer-dir": "/app/data/mistral_fa/lora/torch/1",
"lora-path": "/app/data/mistral_fa/lora/numpy/1",
"lora-task-id": "1",
"streaming": ""
}

I run dummy_requests with inflight_batcher_llm_client.py and convert_lora with hf_lora_convert.py.
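For reference, a hedged sketch of the commands the config above roughly corresponds to (script names and flags as in the TensorRT-LLM v0.15 examples; paths are taken from the config, and the exact flag spellings should be treated as assumptions, not authoritative):

```shell
# Convert the HF checkpoint to a TRT-LLM checkpoint in bfloat16.
python3 convert_checkpoint.py \
    --model_dir /app/data/mistral_fa/model \
    --output_dir /app/data/tllm_checkpoint \
    --dtype bfloat16

# Build the engine with the LoRA plugin also set to bfloat16.
trtllm-build \
    --checkpoint_dir /app/data/tllm_checkpoint \
    --output_dir /app/model_repository/tensorrt_llm/1 \
    --gemm_plugin auto \
    --max_batch_size 32 \
    --max_input_len 2048 \
    --max_num_tokens 4096 \
    --lora_plugin bfloat16 \
    --lora_dir /app/data/mistral_fa/lora/torch/1 \
    --max_lora_rank 16 \
    --lora_target_modules attn_qkv attn_q attn_k attn_v attn_dense mlp_h_to_4h mlp_4h_to_h mlp_gate

# Convert the LoRA weights to the numpy layout the backend expects.
python3 hf_lora_convert.py \
    --in-file /app/data/mistral_fa/lora/torch/1 \
    --out-dir /app/data/mistral_fa/lora/numpy/1 \
    --storage-type bfloat16
```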
