Description
System Info
GPU: A100
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I am using TensorRT-LLM 0.14.

I converted my base model to bfloat16 and my LoRA to float16. On a dummy request to Triton I get this error:
```
[TensorRT-LLM][ERROR] Encountered an error when fetching new request: [TensorRT-LLM][ERROR] Assertion failed: Expected lora weights to be the same data type as base model (/workspace/tensorrt_llm/cpp/tensorrt_llm/runtime/loraUtils.cpp:66)
1 0x7fd90181fa84 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2 0x7fd90182c351 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x727351) [0x7fd90182c351]
3 0x7fd903b8b4b8 tensorrt_llm::batch_manager::PeftCacheManager::addRequestPeft(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, bool) + 184
4 0x7fd903bad2c2 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::updatePeftCache(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> const&) + 82
5 0x7fd903bf8f86 tensorrt_llm::executor::Executor::Impl::fetchNewRequests[abi:cxx11](int, std::optional<float>, double&) + 2374
6 0x7fd903bfada8 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1144
7 0x7fd9d89cc253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7fd9d89cc253]
8 0x7fd9d875bac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fd9d875bac3]
9 0x7fd9d87eca04 clone + 68
```
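The assertion says the serialized LoRA weights must have the same dtype as the base engine. As a minimal sketch of checking and casting the exported weights before serving them (assuming the LoRA was exported as a NumPy `.npy` file; the path below is a placeholder, not my actual file):

```python
import numpy as np

# Placeholder path: point this at the exported LoRA weights file
# inside the --lora-path directory (e.g. model.lora_weights.npy).
weights_path = "model.lora_weights.npy"

# Write a dummy float32 tensor so this sketch is self-contained.
np.save(weights_path, np.zeros((1, 2, 8), dtype=np.float32))

weights = np.load(weights_path)
print("LoRA weights dtype:", weights.dtype)

# Cast to match the base model's dtype (float16 in the working setup).
if weights.dtype != np.float16:
    np.save(weights_path, weights.astype(np.float16))

print("after cast:", np.load(weights_path).dtype)
```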
If I convert both the model and the LoRA to bfloat16, I instead get an error from the client on the dummy request. I run this script to load the LoRA into the cache:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/v0.14.0/inflight_batcher_llm/client/inflight_batcher_llm_client.py
```shell
python3 inflight_batcher_llm_client.py --top-k 0 --top-p 0.5 --request-output-len 10 --text hello --tokenizer-dir /app/data/lora/torch/1 --lora-path /app/data/lora/numpy/1 --lora-task-id 1 --streaming
```
```
=========
Using pad_id: 128001
Using end_id: 128001
Input sequence: [128000, 15339]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/app/src/lora/inflight_batcher_llm_client.py", line 730, in <module>
    inputs = prepare_inputs(
  File "/app/src/lora/inflight_batcher_llm_client.py", line 151, in prepare_inputs
    prepare_tensor("lora_weights", lora_weights_data),
  File "/app/src/lora/inflight_batcher_llm_client.py", line 104, in prepare_tensor
    t = grpcclient.InferInput(name, input.shape,
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_infer_input.py", line 56, in __init__
    self._input.datatype = datatype
TypeError: bad argument type for built-in operation
```
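This second failure (`TypeError` inside `InferInput.__init__`) looks consistent with the client failing to map the LoRA tensor's dtype: NumPy has no native bfloat16, so the dtype-to-Triton-string lookup presumably yields `None`, and assigning `None` to the protobuf string field `datatype` raises. A minimal sketch of that failure mode (the mapping table here is a hypothetical stand-in for `tritonclient.utils.np_to_triton_dtype`, not the real implementation):

```python
import numpy as np

# NumPy itself ships no bfloat16 dtype (it only exists in extension
# packages such as ml_dtypes), so a NumPy-dtype-keyed lookup table
# cannot cover it.
print(hasattr(np, "bfloat16"))  # False

# Hypothetical stand-in for the client's dtype mapping.
TRITON_DTYPES = {
    np.dtype(np.float16): "FP16",
    np.dtype(np.float32): "FP32",
}

def to_triton_dtype(np_dtype):
    # Returns None for dtypes the table does not know; the real client
    # would then assign that None to the protobuf string field
    # `datatype`, which raises TypeError.
    return TRITON_DTYPES.get(np.dtype(np_dtype))

print(to_triton_dtype(np.float16))  # FP16
print(to_triton_dtype(np.int64))    # None -> would fail downstream
```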
If I convert both the model and the LoRA to float16, everything works.
Expected behavior
A bfloat16 base model with a bfloat16 LoRA should serve requests successfully.
actual behavior
bfloat16 does not work: a bfloat16 base model with a float16 LoRA fails the server-side dtype assertion, and converting both to bfloat16 fails in the client. Only the all-float16 combination works.