
UNet results wrong with TensorRT 10.x when running on GPU L40s #4351

Open
Fans0014 opened this issue Feb 8, 2025 · 1 comment
Assignees
Labels
Investigating Issue needs further investigation Module:Performance General performance issues triaged Issue has been triaged by maintainers

Comments

Fans0014 commented Feb 8, 2025

Description

I'm trying to convert a UNet (model size 1.9 GB, fp32) exported with opset 17 to TensorRT.
With TensorRT 8.6 the results were correct, but performance dropped by 15%.
With TensorRT 10.0.0/10.5/10.8 the results were NaN.
Was some high-level optimization (e.g. op fusion) introduced in TensorRT 10.x that could cause the NaN results?
How can I debug the inference procedure to fix my PyTorch model so that it runs correctly on TensorRT 10.x?
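One common way to localize where NaNs first appear is to dump per-layer outputs (e.g. with polygraphy's `--trt-outputs mark all --save-outputs`), load the dumped tensors in execution order, and scan for the first non-finite one. The sketch below is my own illustration, not code from this issue; the dict of outputs and its layer names are hypothetical stand-ins for whatever your dump produces.

```python
import numpy as np

def first_nonfinite_tensor(outputs):
    """Return the name of the first tensor containing NaN or Inf,
    or None if every tensor is finite. `outputs` maps layer/tensor
    names (assumed to be in execution order) to numpy arrays."""
    for name, arr in outputs.items():
        if not np.isfinite(arr).all():
            return name
    return None

# Toy example: the second tensor overflows to inf; everything after it
# would typically propagate NaN/Inf downstream.
outputs = {
    "conv1": np.ones((2, 2), dtype=np.float32),
    "mul_overflow": np.array([np.inf], dtype=np.float32),
    "softmax_out": np.array([np.nan], dtype=np.float32),
}
print(first_nonfinite_tensor(outputs))  # -> mul_overflow
```

The first layer this flags is where to start comparing against the ONNXRuntime reference output for the same tensor.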

Environment

TensorRT Version: 8.6, 10.0, 10.5, 10.8

NVIDIA GPU: L40s

NVIDIA Driver Version: 535.161.08

CUDA Version: 12.2

CUDNN Version: 8.4

Operating System: Ubuntu 20.04

Python Version (if applicable): 3.10

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

@LeoZDong LeoZDong added Module:Performance General performance issues triaged Issue has been triaged by maintainers Investigating Issue needs further investigation labels Feb 10, 2025

Fans0014 commented Feb 19, 2025

I used polygraphy to debug the TensorRT model's layer precision, and I ran into another problem.
1. With the command
polygraphy run unet_fp32_op17_v5.onnx --load-inputs ./data/rel_input.json --trt --onnxrt --tactic-sources --rtol 1e-02 --atol 1e-02 > log_trt_out.txt
the results were:
trt-runner Stats: mean=-0.14245, std-dev=0.44876, var=0.20138, median=-0.20319, min=-1.5187 at (0, 0, 127, 16), max=1.2616 at (0, 1, 104, 0)
onnxrt-runner Stats: mean=-0.14654, std-dev=0.45621, var=0.20813, median=-0.20612, min=-1.5602 at (0, 0, 127, 16), max=1.1212 at (0, 1, 104, 0)

2. With the command
polygraphy run unet_fp32_op17_v5.onnx --load-inputs ./data/rel_input.json --trt --onnxrt --tactic-sources --rtol 1e-02 --atol 1e-02 --trt-outputs mark all --onnx-outputs mark all > log_trt_all.txt
the results were:
trt-runner Stats: mean=-0.14654, std-dev=0.45621, var=0.20813, median=-0.20612, min=-1.5602 at (0, 0, 127, 16), max=1.1212 at (0, 1, 104, 0)
onnxrt-runner Stats: mean=-0.14654, std-dev=0.45621, var=0.20813, median=-0.20612, min=-1.5602 at (0, 0, 127, 16), max=1.1212 at (0, 1, 104, 0)


Why do the results differ between runs of the same tool on the same model?
ONNX: 1.17
ONNX Runtime: 1.20
TensorRT: 10.7
Polygraphy: 0.49.14
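For reference, polygraphy's elementwise tolerance check is, roughly, an np.isclose-style test: a value passes if |out - ref| <= atol + rtol * |ref|. A minimal numpy sketch of that comparison (my own illustration, not polygraphy's code), fed with the min/max values from the two runs above:

```python
import numpy as np

def within_tolerance(out, ref, rtol=1e-2, atol=1e-2):
    """Elementwise pass/fail mirroring numpy.isclose, which the
    --rtol/--atol comparison is modeled on."""
    return bool(np.all(np.abs(out - ref) <= atol + rtol * np.abs(ref)))

# max/min reported by the onnxrt-runner (reference) vs. the first
# trt-runner run above.
ref = np.array([1.1212, -1.5602], dtype=np.float32)
out = np.array([1.2616, -1.5187], dtype=np.float32)
print(within_tolerance(out, ref))  # -> False: 0.1404 > 0.01 + 0.01*1.1212
```

Note that marking every tensor as an output (`mark all`) constrains which layer fusions TensorRT can apply, so the engine built in run 2 is genuinely a different (less fused) engine than the one in run 1; differing numerics between the two runs is therefore not by itself a bug.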
