python3 inflight_batcher_llm_client.py --top-k 0 --top-p 0.5 --request-output-len 10 --text hello --tokenizer-dir /app/data/lora/torch/1 --lora-path /app/data/lora/numpy/1 --lora-task-id 1 --streaming
=========
Using pad_id: 128001
Using end_id: 128001
Input sequence: [128000, 15339]
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.python-2023.14.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/app/src/lora/inflight_batcher_llm_client.py", line 730, in <module>
    inputs = prepare_inputs(
  File "/app/src/lora/inflight_batcher_llm_client.py", line 151, in prepare_inputs
    prepare_tensor("lora_weights", lora_weights_data),
  File "/app/src/lora/inflight_batcher_llm_client.py", line 104, in prepare_tensor
    t = grpcclient.InferInput(name, input.shape,
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_infer_input.py", line 56, in __init__
    self._input.datatype = datatype
TypeError: bad argument type for built-in operation
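One possible reading of the traceback, offered as an assumption and not a confirmed diagnosis: the failure happens when `datatype` is assigned inside `InferInput`, which would be consistent with the client deriving the Triton datatype from the array's dtype and getting `None` back for bfloat16. The sketch below only illustrates that failure mode. The lookup table and the names `to_triton_dtype` and `checked_triton_dtype` are hypothetical and are not part of tritonclient or the client script.

```python
from typing import Optional

# Hypothetical lookup from a dtype name to a Triton datatype string.
# The table is deliberately missing "bfloat16" to mimic a lookup that
# does not know the type; the real client may behave differently.
_TRITON_DTYPES = {
    "float16": "FP16",
    "float32": "FP32",
    "int32": "INT32",
    "int64": "INT64",
}


def to_triton_dtype(dtype_name: str) -> Optional[str]:
    """Return the Triton datatype string, or None if the dtype is unknown."""
    return _TRITON_DTYPES.get(dtype_name)


def checked_triton_dtype(dtype_name: str) -> str:
    """Same lookup, but fail early with a readable message instead of
    passing None on to the code that builds the request."""
    triton_dtype = to_triton_dtype(dtype_name)
    if triton_dtype is None:
        raise ValueError(f"no Triton datatype known for dtype {dtype_name!r}")
    return triton_dtype


if __name__ == "__main__":
    print(to_triton_dtype("float16"))   # known dtype
    print(to_triton_dtype("bfloat16"))  # unknown dtype, yields None
```

If this assumption holds, a check like `checked_triton_dtype` before building the input tensor would replace the opaque `TypeError` with a message that names the unsupported dtype.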
System Info
GPU: NVIDIA A100
Who can help?
@byshiue
@juney-nvidia
Reproduction
I use TensorRT-LLM 0.14.
I converted my model to bfloat16 and my LoRA to float16. A dummy request to Triton then returns an error.
If I convert both the model and the LoRA to bfloat16, the dummy request also fails. I run this client script to load the LoRA into the cache:
https://github.com/triton-inference-server/tensorrtllm_backend/blob/v0.14.0/inflight_batcher_llm/client/inflight_batcher_llm_client.py
If I convert both the model and the LoRA to float16, it works.
Expected behavior
LoRA inference should work with bfloat16.
Actual behavior
LoRA inference does not work with bfloat16.
Additional notes