Enable "trt_build_heuristics_enable" optimization for onnxruntime-TensorRT #241

Open
tobaiMS opened this issue Feb 23, 2024 · 2 comments
tobaiMS commented Feb 23, 2024

OnnxRuntime supports trt_build_heuristics_enable as a TensorRT optimization.
We observed that some inference requests take an extremely long time when user traffic changes. Without TensorRT optimization, we set the default onnxruntime backend with { key: "cudnn_conv_algo_search" value: { string_value: "1" } } to enable heuristic search. However, when moving to TensorRT this setting is ignored. ORT provides an alternative setting for TRT, "trt_build_heuristics_enable" (https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#configurations), which we would like to try with Triton, but it is not supported in the Triton model config.
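
For illustration, a sketch of what we are after in config.pbtxt. The first line is the setting we use today with the default backend; the second block is hypothetical, showing the requested key written in the accelerator-parameter style from the backend README:

# Current setting for the default (non-TRT) ONNX Runtime backend:
parameters { key: "cudnn_conv_algo_search" value: { string_value: "1" } }

# Hypothetical: what exposing the ORT TRT option could look like
# (this key is NOT currently accepted by the Triton model config):
optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "trt_build_heuristics_enable" value: "true" }
  } ]
}}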

@gedoensmax
Contributor

I would rather recommend enabling the timing cache; that will accelerate engine builds drastically. An engine cache will further help by not rebuilding the engine each time the same model is requested.

"when user traffic changes"
What exactly do you mean by that? Dynamic shapes or different models?
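
For reference, a minimal config.pbtxt sketch of enabling the engine cache via the TensorRT accelerator parameters (the cache path is just an example value):

optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    # Reuse previously built engines instead of rebuilding on every model load:
    parameters { key: "trt_engine_cache_enable" value: "true" }
    parameters { key: "trt_engine_cache_path" value: "/tmp/trt_cache" }
  } ]
}}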


tobaiMS commented Feb 26, 2024

Hi @gedoensmax, thanks for the reply. I currently already have the following enabled:
parameters { key: "trt_engine_cache_enable" value: "True" }
parameters { key: "trt_engine_cache_path" value: "/tmp" }
for the "trt_timing_cache_path" seems it's also not supported in the triton ORT TRT configuration, https://github.com/triton-inference-server/onnxruntime_backend?tab=readme-ov-file#onnx-runtime-with-tensorrt-optimization
