
Error in data types: using model with lora #2434

Open
2 of 4 tasks
Alireza3242 opened this issue Nov 11, 2024 · 2 comments
Assignees
Labels
bug Something isn't working triaged Issue has been triaged by maintainers

Comments

@Alireza3242

System Info

A100

Who can help?

@byshiue
@juney-nvidia

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I use TensorRT-LLM 0.14.
I converted my model to bfloat16 and my LoRA to float16. On a dummy request to Triton I get this error:

[TensorRT-LLM][ERROR] Encountered an error when fetching new request: [TensorRT-LLM][ERROR] Assertion failed: Expected lora weights to be the same data type as base model (/workspace/tensorrt_llm/cpp/tensorrt_llm/runtime/loraUtils.cpp:66)
1       0x7fd90181fa84 tensorrt_llm::common::throwRuntimeError(char const*, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 100
2       0x7fd90182c351 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x727351) [0x7fd90182c351]
3       0x7fd903b8b4b8 tensorrt_llm::batch_manager::PeftCacheManager::addRequestPeft(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest>, bool) + 184
4       0x7fd903bad2c2 tensorrt_llm::batch_manager::TrtGptModelInflightBatching::updatePeftCache(std::shared_ptr<tensorrt_llm::batch_manager::LlmRequest> const&) + 82
5       0x7fd903bf8f86 tensorrt_llm::executor::Executor::Impl::fetchNewRequests[abi:cxx11](int, std::optional<float>, double&) + 2374
6       0x7fd903bfada8 tensorrt_llm::executor::Executor::Impl::executionLoop() + 1144
7       0x7fd9d89cc253 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc253) [0x7fd9d89cc253]
8       0x7fd9d875bac3 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x94ac3) [0x7fd9d875bac3]
9       0x7fd9d87eca04 clone + 68

If I convert both my model and my LoRA to bfloat16, I get an error on the dummy request. I run this script to save the LoRA in the cache:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/v0.14.0/inflight_batcher_llm/client/inflight_batcher_llm_client.py

python3 inflight_batcher_llm_client.py --top-k 0 --top-p 0.5 --request-output-len 10 --text hello --tokenizer-dir /app/data/lora/torch/1 --lora-path /app/data/lora/numpy/1 --lora-task-id 1 --streaming 
=========
Using pad_id:  128001
Using end_id:  128001
Input sequence:  [128000, 15339]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/app/src/lora/inflight_batcher_llm_client.py", line 730, in <module>
    inputs = prepare_inputs(
  File "/app/src/lora/inflight_batcher_llm_client.py", line 151, in prepare_inputs
    prepare_tensor("lora_weights", lora_weights_data),
  File "/app/src/lora/inflight_batcher_llm_client.py", line 104, in prepare_tensor
    t = grpcclient.InferInput(name, input.shape,
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_infer_input.py", line 56, in __init__
    self._input.datatype = datatype
TypeError: bad argument type for built-in operation
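A plausible cause of this client-side TypeError (an assumption on my part, not confirmed by the maintainers): the Triton client derives the wire datatype string from the numpy dtype of the LoRA weights, and numpy has no native bfloat16 dtype, so the lookup can produce None, and assigning None to the protobuf datatype field fails with exactly this message. A minimal check with plain numpy:

```python
import numpy as np

# numpy can construct float16 but has no native bfloat16 dtype;
# np.dtype("bfloat16") raises TypeError ("data type not understood").
def numpy_supports(dtype_name: str) -> bool:
    try:
        np.dtype(dtype_name)
        return True
    except TypeError:
        return False

print(numpy_supports("float16"))   # True
print(numpy_supports("bfloat16"))  # False
```

This would explain why float16 round-trips through the client cleanly while bfloat16 breaks before the request even reaches the server.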

If I convert both my model and my LoRA to float16, it works.

Expected behavior

bfloat16 should work

actual behavior

bfloat16 does not work

additional notes


@Alireza3242 Alireza3242 added the bug Something isn't working label Nov 11, 2024
@Alireza3242 Alireza3242 changed the title Error in data types: use model with lora Error in data types: using model with lora Nov 11, 2024
@VincentJing

Hi @Alireza3242, could you please share the steps for generating the TRT-LLM engine?

@hello-11 hello-11 added the triaged Issue has been triaged by maintainers label Nov 13, 2024
@Alireza3242
Author

Alireza3242 commented Dec 25, 2024

@VincentJing
I still have this problem in TensorRT-LLM 0.15.

I used these configs for convert, build, lora_convert, and the dummy request:

"convert": {
"model_dir": "/app/data/mistral_fa/model",
"output_dir": "/app/data/tllm_checkpoint",
"dtype": "bfloat16",
},
"build": {
"checkpoint_dir": "/app/data/tllm_checkpoint",
"output_dir": "/app/model_repository/tensorrt_llm/1",
"gemm_plugin": "auto",
"max_batch_size": "32",
"max_input_len": "2048",
"max_num_tokens": "4096",
"lora_plugin": "bfloat16",
"lora_dir": "/app/data/mistral_fa/lora/torch/1",
"max_lora_rank": "16",
"lora_target_modules": "attn_qkv attn_q attn_k attn_v attn_dense mlp_h_to_4h mlp_4h_to_h mlp_gate"
},
"convert_lora": [
{
"in-file": "/app/data/mistral_fa/lora/torch/1",
"storage-type": "bfloat16",
"out-dir": "/app/data/mistral_fa/lora/numpy/1",
}
],
"dummy_requests": [
{
"top-k": "0",
"top-p":"0.5",
"request-output-len": "10",
"text": "hello",
"tokenizer-dir": "/app/data/mistral_fa/lora/torch/1",
"lora-path": "/app/data/mistral_fa/lora/numpy/1",
"lora-task-id": "1",
"streaming": ""
}

I run dummy_requests with inflight_batcher_llm_client.py and convert_lora with hf_lora_convert.py.
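For reference, a hedged sketch of the commands the config above roughly corresponds to (script names and flags as in the TensorRT-LLM v0.15 examples; paths are taken from the config, and the exact flag spellings should be treated as assumptions, not authoritative):

```shell
# Convert the HF checkpoint to a TRT-LLM checkpoint in bfloat16.
python3 convert_checkpoint.py \
    --model_dir /app/data/mistral_fa/model \
    --output_dir /app/data/tllm_checkpoint \
    --dtype bfloat16

# Build the engine with the LoRA plugin also set to bfloat16.
trtllm-build \
    --checkpoint_dir /app/data/tllm_checkpoint \
    --output_dir /app/model_repository/tensorrt_llm/1 \
    --gemm_plugin auto \
    --max_batch_size 32 \
    --max_input_len 2048 \
    --max_num_tokens 4096 \
    --lora_plugin bfloat16 \
    --lora_dir /app/data/mistral_fa/lora/torch/1 \
    --max_lora_rank 16 \
    --lora_target_modules attn_qkv attn_q attn_k attn_v attn_dense mlp_h_to_4h mlp_4h_to_h mlp_gate

# Convert the LoRA weights to the numpy layout the backend expects.
python3 hf_lora_convert.py \
    --in-file /app/data/mistral_fa/lora/torch/1 \
    --out-dir /app/data/mistral_fa/lora/numpy/1 \
    --storage-type bfloat16
```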
