UNet results wrong with TensorRT 10.x when running on L40S GPU #4351
Labels
Investigating
Issue needs further investigation
Module:Performance
General performance issues
triaged
Issue has been triaged by maintainers
Description
I'm trying to convert a UNet (model size 1.9 GB, FP32) exported with opset=17 to TensorRT.
With TensorRT 8.6 the results are correct, but I see roughly a 15% performance drop.
With TensorRT 10.0.0/10.5/10.8 the results are NaN.
Was some high-level optimization (e.g. op fusion) introduced in TRT 10.x that could cause the NaN results?
How can I debug the inference procedure and fix my PyTorch model so that it runs correctly on TRT 10.x?
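What I was planning to try (not sure whether this is the right approach) is to compare per-layer outputs between ONNX Runtime and TensorRT with Polygraphy, and then bisect the model down to the smallest subgraph that still produces NaN. A rough sketch, with placeholder file names (unet.onnx etc.):

# Fold constants and sanity-check the exported ONNX model first
polygraphy surgeon sanitize unet.onnx --fold-constants -o unet_folded.onnx

# Compare TensorRT against ONNX Runtime, marking all layer outputs for comparison
# and flagging NaN/Inf in the outputs
polygraphy run unet_folded.onnx --trt --onnxrt \
    --trt-outputs mark all --onnx-outputs mark all \
    --validate --atol 1e-3 --rtol 1e-3

# Bisect to the smallest subgraph that still fails the check under TensorRT
polygraphy debug reduce unet_folded.onnx -o reduced.onnx --mode bisect \
    --check polygraphy run polygraphy_debug.onnx --trt --validate

If that narrows it down to a specific layer or fusion, I can share the reduced model here.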
Environment
TensorRT Version: 8.6, 10.0, 10.5, 10.8
NVIDIA GPU: L40S
NVIDIA Driver Version: 535.161.08
CUDA Version: 12.2
CUDNN Version: 8.4
Operating System: Ubuntu 20.04
Python Version (if applicable): 3.10
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (
polygraphy run <model.onnx> --onnxrt
):