
Different output with same input between dynamic and static shape model #4250

Open
smarttowel opened this issue Nov 15, 2024 · 4 comments

@smarttowel

Description

I have an ONNX model and fixed input data loaded from files. I run the model with trtexec using the following command:

trtexec with dynamic shape

/usr/src/tensorrt/bin/trtexec --onnx=~/model.onnx --loadInputs=descriptors_0:/home/pavel/test/descr0.dat,descriptors_1:/home/pavel/test/descr1.dat --minShapes=descriptors_0:1x256x1,descriptors_1:1x256x1 --optShapes=descriptors_0:1x256x1000,descriptors_1:1x256x1000 --maxShapes=descriptors_0:1x256x1024,descriptors_1:1x256x1024 --shapes=descriptors_0:1x256x100,descriptors_1:1x256x5 --exportOutput=/home/pavel/dynamic_debug.json --saveEngine=/home/pavel/trt.engine

The output in dynamic_debug.json is unexpected (many of the values are zero), so I checked the same ONNX model with a static shape using this command:

trtexec with static shape

/usr/src/tensorrt/bin/trtexec --onnx=~/model.onnx --loadInputs=descriptors_0:/home/pavel/test/descr0.dat,descriptors_1:/home/pavel/test/descr1.dat --shapes=descriptors_0:1x256x100,descriptors_1:1x256x5 --exportOutput=/home/pavel/static_debug.json --saveEngine=/home/pavel/trt.engine

In this case the results look good.

How can I get the same output from the dynamic-shape model?
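
For reference, the two exported JSON files can be compared directly to quantify the mismatch (the fraction of zero values and the maximum absolute difference). This is a minimal sketch, assuming trtexec's --exportOutput writes one record per output tensor with a "name" field and a flat "values" list, and using the file names from the commands above:

#!/usr/bin/env python3
# Compare two trtexec --exportOutput JSON files:
# report the zero-value count and the max absolute difference per output tensor.
import json
import sys

def load(path):
    # Assumed layout: [{"name": ..., "dimensions": ..., "values": [...]}, ...]
    with open(path) as f:
        return {entry["name"]: entry["values"] for entry in json.load(f)}

dynamic = load(sys.argv[1] if len(sys.argv) > 1 else "dynamic_debug.json")
static = load(sys.argv[2] if len(sys.argv) > 2 else "static_debug.json")

for name in sorted(set(dynamic) & set(static)):
    d, s = dynamic[name], static[name]
    if len(d) != len(s):
        print(f"{name}: size mismatch ({len(d)} vs {len(s)})")
        continue
    zeros = sum(1 for v in d if v == 0.0)
    abs_diff_max = max((abs(a - b) for a, b in zip(d, s)), default=0.0)
    print(f"{name}: zeros in dynamic = {zeros}/{len(d)}, abs_diff_max = {abs_diff_max:.6g}")

On a correctly working build, abs_diff_max should be small, on the order of the ~5e-4 reported below for TensorRT 8.5.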

Environment

TensorRT Version: 10.6.0.26

NVIDIA GPU: NVIDIA GeForce GTX 1650 Ti

NVIDIA Driver Version: 550.127.05

CUDA Version: 12.4

CUDNN Version: 9.5.1

Operating System: Kubuntu 24.04

Relevant Files

All files are here

@lix19937

@smarttowel Can you try another version of TensorRT, such as 8.6 or 8.5? I think TensorRT 10.6 may have a bug in a CUDA-X library.

BTW, I ran a test on TensorRT 8.5.10: the dynamic output is similar to the static output, with abs_diff_max = 0.00048.

You can run my script and upload dynamic_lix.json, dynamic_lix.log, static_lix.json, and static_lix.log.

#!/bin/bash

trtexec --onnx=./model.onnx \
--loadInputs=descriptors_0:./descr0.dat,descriptors_1:./descr1.dat \
--minShapes=descriptors_0:1x256x1,descriptors_1:1x256x1 \
--optShapes=descriptors_0:1x256x1000,descriptors_1:1x256x1000 \
--maxShapes=descriptors_0:1x256x1024,descriptors_1:1x256x1024 \
--shapes=descriptors_0:1x256x100,descriptors_1:1x256x5 \
--exportOutput=./dynamic.json \
--dumpProfile --dumpLayerInfo --separateProfileRun  \
--profilingVerbosity=detailed \
--exportLayerInfo=dynamic_lix.json \
--verbose 2>&1 |  tee dynamic_lix.log


trtexec --onnx=./model.onnx \
--loadInputs=descriptors_0:./descr0.dat,descriptors_1:./descr1.dat \
--shapes=descriptors_0:1x256x100,descriptors_1:1x256x5 \
--exportOutput=./static.json \
--dumpProfile --dumpLayerInfo --separateProfileRun  \
--profilingVerbosity=detailed \
--exportLayerInfo=static_lix.json \
--verbose 2>&1 | tee  static_lix.log

@poweiw poweiw added triaged Issue has been triaged by maintainers Accuracy Output mismatch between TensorRT and other frameworks labels Nov 18, 2024
@smarttowel
Author

@lix19937 Thank you for the reply!

The repo has been updated with new log files.
In addition, I can confirm there are no issues with TensorRT 8.6 on the same hardware, or with TensorRT 8.5 on an Orin NX (R35.3.1).

@kevinch-nv kevinch-nv self-assigned this Nov 22, 2024
@kevinch-nv kevinch-nv added the internal-bug-tracked Tracked internally, will be fixed in a future release. label Nov 22, 2024
@kevinch-nv
Collaborator

I was able to reproduce this locally with my Ampere card as well. It looks to be an internal library issue; we've filed a bug internally to track the fix.

@lix19937

Your logs don't contain any helpful info. It is likely a bug in nvinfer or a CUDA-X library.
