FP16 model of TensorRT 10.0 is incorrect when running on GPU T4 #4022

Open
yflv-yanxia opened this issue Jul 23, 2024 · 7 comments

@yflv-yanxia

Description

Using trtexec from TensorRT 10 to convert an ONNX model to a TensorRT engine, the results of the FP32 engine are correct, but the results of the FP16 engine are incorrect. I have set almost all layers to FP32 using
trtexec --precisionConstraints=obey --builderOptimizationLevel=5 --layerPrecisions="/Transpose":fp32,"/intro_/Conv":fp32,"/intro_down/Conv":fp32,.......,
but the results are still incorrect. Could you help me solve this problem?
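
For reference, the same constraints can also be expressed through the TensorRT Python API. Below is a minimal sketch, assuming the TensorRT 10 Python bindings; the layer names and file name are illustrative, and the optimization profile this dynamic-shape model would also need is omitted:

    # Sketch: build an FP16 engine while pinning selected layers to FP32.
    # Assumes TensorRT 10 Python bindings; layer/file names are illustrative.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(0)  # explicit batch (the default in TRT 10)
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    # Equivalent of --precisionConstraints=obey
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)

    fp32_layers = {"/Transpose", "/intro_/Conv", "/intro_down/Conv"}  # extend as needed
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.name in fp32_layers:
            layer.precision = trt.float32
            layer.set_output_type(0, trt.float32)

    # Note: a dynamic-shape model also needs an optimization profile (omitted here).
    engine_bytes = builder.build_serialized_network(network, config)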

Environment

TensorRT Version: TensorRT 10.0.1

NVIDIA GPU: Tesla T4

NVIDIA Driver Version: 450.36.06

CUDA Version: 11.0

CUDNN Version: 8.0.0

Operating System:

ONNX Opset: 17

Relevant Files

onnx Model link: https://drive.google.com/file/d/14zuubyXVVN-mOJ2b64jPc128dj4VRU_C/view?usp=sharing

Steps To Reproduce

trtexec --onnx=$pr_nolog_model_path --fp16 --device=0 \
    --minShapes=input:1x128x128x3 --optShapes=input:1x1920x1920x3 --maxShapes=input:1x3072x3072x3 \
    --saveEngine=ysDeblur_cc75_t4_fp16_small_dyn.trtmodel \
    --layerPrecisions="/Transpose":fp32,"/intro_/Conv":fp32,"/intro_down/Conv":fp32,(so many) \
    --precisionConstraints=obey --builderOptimizationLevel=5

@lix19937

lix19937 commented Jul 31, 2024

Try the following command:

 polygraphy run xxxx.onnx --trt --onnxrt --fp16 \
     --trt-outputs mark all \
     --onnx-outputs mark all
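
To localize the first diverging layer faster, comparison tolerances and fail-fast can be added; these flags exist in recent Polygraphy versions, but treat the exact invocation as a sketch:

 polygraphy run xxxx.onnx --trt --onnxrt --fp16 \
     --trt-outputs mark all \
     --onnx-outputs mark all \
     --atol 1e-3 --rtol 1e-3 \
     --fail-fast

With --fail-fast the comparison stops at the first mismatched tensor, which is usually the layer where FP16 starts to diverge.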

@yflv-yanxia
Author

log_netg.txt
Here are the results. @lix19937

@yflv-yanxia
Author

Hi, sorry to bother you, but is there any update on the solution? @lix19937

@lix19937

@yflv-yanxia Sorry for the late reply. From my build log:

[08/20/2024-20:57:50] [W] [TRT] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[08/20/2024-20:57:50] [W] [TRT] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[08/20/2024-20:57:50] [W] [TRT] Check verbose logs for the list of affected weights.
[08/20/2024-20:57:50] [W] [TRT] - 111 weights are affected by this issue: Detected subnormal FP16 values.
[08/20/2024-20:57:50] [V] [TRT]   List of affected weights: /decoders.0/decoders.0.0/conv1/Conv.weight, /decoders.0/decoders.0.0/conv2/Conv.weight, /decoders.0/decoders.0.0/conv3/Conv + decoders.0.0.beta + /decoders.0/decoders.0.0/Mul_1 + /decoders.0/decoders.0.0/Add.weight, /decoders.0/decoders.0.0/conv4/Conv.weight, /decoders.0/decoders.0.0/conv5/Conv + decoders.0.0.gamma + /decoders.0/decoders.0.0/Mul_2 + /decoders.0/decoders.0.0/Add_1.bias, /decoders.0/decoders.0.0/conv5/Conv + decoders.0.0.gamma + /decoders.0/decoders.0.0/Mul_2 + /decoders.0/decoders.0.0/Add_1.weight, /decoders.0/decoders.0.0/sca/sca.1/Conv.weight, /decoders.1/decoders.1.0/conv1/Conv.weight, /decoders.1/decoders.1.0/conv2/Conv.weight, /decoders.1/decoders.1.0/conv3/Conv + decoders.1.0.beta + /decoders.1/decoders.1.0/Mul_1 + /decoders.1/decoders.1.0/Add.weight, /decoders.1/decoders.1.0/conv4/Conv.weight, /decoders.1/decoders.1.0/conv5/Conv + decoders.1.0.gamma + /decoders.1/decoders.1.0/Mul_2 + /decoders.1/decoders.1.0/Add_1.weight, /decoders.1/decoders.1.0/sca/sca.1/Conv.weight, /decoders.2/decoders.2.0/conv2/Conv.weight, /decoders.2/decoders.2.0/conv3/Conv + decoders.2.0.beta + /decoders.2/decoders.2.0/Mul_1 + /decoders.2/decoders.2.0/Add.weight, /decoders.2/decoders.2.0/conv5/Conv + decoders.2.0.gamma + /decoders.2/decoders.2.0/Mul_2 + /decoders.2/decoders.2.0/Add_1.weight, /decoders.3/decoders.3.0/conv4/Conv.weight, /decoders.3/decoders.3.0/conv5/Conv + decoders.3.0.gamma + /decoders.3/decoders.3.0/Mul_2 + /decoders.3/decoders.3.0/Add_1.weight, /downs.0/Conv.weight, /downs.1/Conv.weight, /downs.2/Conv.weight, /downs.3/Conv.weight, /encoders.0/encoders.0.0/sca/sca.1/Conv.weight, /encoders.1/encoders.1.0/conv3/Conv + encoders.1.0.beta + /encoders.1/encoders.1.0/Mul_1 + /encoders.1/encoders.1.0/Add.weight, /encoders.2/encoders.2.0/conv1/Conv.weight, /encoders.2/encoders.2.0/conv2/Conv.bias, /encoders.2/encoders.2.0/conv3/Conv + encoders.2.0.beta + /encoders.2/encoders.2.0/Mul_1 + /encoders.2/encoders.2.0/Add.weight, /encoders.2/encoders.2.0/conv4/Conv.weight, /encoders.2/encoders.2.0/conv5/Conv + encoders.2.0.gamma + /encoders.2/encoders.2.0/Mul_2 + /encoders.2/encoders.2.0/Add_1.weight, /encoders.2/encoders.2.0/sca/sca.1/Conv.weight, /encoders.3/encoders.3.0/conv1/Conv.weight, /encoders.3/encoders.3.0/conv3/Conv + encoders.3.0.beta + /encoders.3/encoders.3.0/Mul_1 + /encoders.3/encoders.3.0/Add.weight, /encoders.3/encoders.3.0/conv4/Conv.weight, /encoders.3/encoders.3.0/conv5/Conv + encoders.3.0.gamma + /encoders.3/encoders.3.0/Mul_2 + /encoders.3/encoders.3.0/Add_1.bias, /encoders.3/encoders.3.0/conv5/Conv + encoders.3.0.gamma + /encoders.3/encoders.3.0/Mul_2 + /encoders.3/encoders.3.0/Add_1.weight, /encoders.3/encoders.3.0/sca/sca.1/Conv.weight, /encoders.3/encoders.3.1/conv1/Conv.weight, /encoders.3/encoders.3.1/conv2/Conv.weight, /encoders.3/encoders.3.1/conv3/Conv + encoders.3.1.beta + /encoders.3/encoders.3.1/Mul_1 + /encoders.3/encoders.3.1/Add.weight, /encoders.3/encoders.3.1/conv4/Conv.weight, /encoders.3/encoders.3.1/conv5/Conv + encoders.3.1.gamma + /encoders.3/encoders.3.1/Mul_2 + /encoders.3/encoders.3.1/Add_1.weight, /encoders.3/encoders.3.1/sca/sca.1/Conv.weight, /encoders.3/encoders.3.2/conv1/Conv.weight, /encoders.3/encoders.3.2/conv2/Conv.weight, /encoders.3/encoders.3.2/conv3/Conv + encoders.3.2.beta + /encoders.3/encoders.3.2/Mul_1 + /encoders.3/encoders.3.2/Add.weight, /encoders.3/encoders.3.2/conv4/Conv.weight, /encoders.3/encoders.3.2/conv5/Conv + encoders.3.2.gamma + 
/encoders.3/encoders.3.2/Mul_2 + /encoders.3/encoders.3.2/Add_1.weight, /encoders.3/encoders.3.2/sca/sca.1/Conv.weight, /encoders.3/encoders.3.3/conv1/Conv.bias, /encoders.3/encoders.3.3/conv1/Conv.weight, /encoders.3/encoders.3.3/conv2/Conv.weight, /encoders.3/encoders.3.3/conv3/Conv + encoders.3.3.beta + /encoders.3/encoders.3.3/Mul_1 + /encoders.3/encoders.3.3/Add.weight, /encoders.3/encoders.3.3/conv4/Conv.weight, /encoders.3/encoders.3.3/conv5/Conv + encoders.3.3.gamma + /encoders.3/encoders.3.3/Mul_2 + /encoders.3/encoders.3.3/Add_1.weight, /encoders.3/encoders.3.3/sca/sca.1/Conv.weight, /encoders.3/encoders.3.4/conv1/Conv.weight, /encoders.3/encoders.3.4/conv2/Conv.weight, /encoders.3/encoders.3.4/conv3/Conv + encoders.3.4.beta + /encoders.3/encoders.3.4/Mul_1 + /encoders.3/encoders.3.4/Add.weight, /encoders.3/encoders.3.4/conv4/Conv.weight, /encoders.3/encoders.3.4/conv5/Conv + encoders.3.4.gamma + /encoders.3/encoders.3.4/Mul_2 + /encoders.3/encoders.3.4/Add_1.weight, /encoders.3/encoders.3.4/sca/sca.1/Conv.weight, /encoders.3/encoders.3.5/conv1/Conv.weight, /encoders.3/encoders.3.5/conv2/Conv.weight, /encoders.3/encoders.3.5/conv3/Conv + encoders.3.5.beta + /encoders.3/encoders.3.5/Mul_1 + /encoders.3/encoders.3.5/Add.bias, /encoders.3/encoders.3.5/conv3/Conv + encoders.3.5.beta + /encoders.3/encoders.3.5/Mul_1 + /encoders.3/encoders.3.5/Add.weight, /encoders.3/encoders.3.5/conv4/Conv.bias, /encoders.3/encoders.3.5/conv4/Conv.weight, /encoders.3/encoders.3.5/conv5/Conv + encoders.3.5.gamma + /encoders.3/encoders.3.5/Mul_2 + /encoders.3/encoders.3.5/Add_1.weight, /encoders.3/encoders.3.5/sca/sca.1/Conv.weight, /encoders.3/encoders.3.6/conv1/Conv.weight, /encoders.3/encoders.3.6/conv2/Conv.weight, /encoders.3/encoders.3.6/conv3/Conv + encoders.3.6.beta + /encoders.3/encoders.3.6/Mul_1 + /encoders.3/encoders.3.6/Add.weight, /encoders.3/encoders.3.6/conv4/Conv.weight, /encoders.3/encoders.3.6/conv5/Conv + encoders.3.6.gamma + /encoders.3/encoders.3.6/Mul_2 + /encoders.3/encoders.3.6/Add_1.weight, /encoders.3/encoders.3.6/sca/sca.1/Conv.weight, /encoders.3/encoders.3.7/conv1/Conv.weight, /encoders.3/encoders.3.7/conv3/Conv + encoders.3.7.beta + /encoders.3/encoders.3.7/Mul_1 + /encoders.3/encoders.3.7/Add.weight, /encoders.3/encoders.3.7/conv4/Conv.weight, /encoders.3/encoders.3.7/conv5/Conv + encoders.3.7.gamma + /encoders.3/encoders.3.7/Mul_2 + /encoders.3/encoders.3.7/Add_1.weight, /encoders.3/encoders.3.7/sca/sca.1/Conv.weight, /encoders.3/encoders.3.8/conv1/Conv.weight, /encoders.3/encoders.3.8/conv3/Conv + encoders.3.8.beta + /encoders.3/encoders.3.8/Mul_1 + /encoders.3/encoders.3.8/Add.weight, /encoders.3/encoders.3.8/conv4/Conv.weight, /encoders.3/encoders.3.8/conv5/Conv + encoders.3.8.gamma + /encoders.3/encoders.3.8/Mul_2 + /encoders.3/encoders.3.8/Add_1.weight, /encoders.3/encoders.3.8/sca/sca.1/Conv.weight, /encoders.3/encoders.3.9/conv1/Conv.weight, /encoders.3/encoders.3.9/conv3/Conv + encoders.3.9.beta + /encoders.3/encoders.3.9/Mul_1 + /encoders.3/encoders.3.9/Add.bias, /encoders.3/encoders.3.9/conv3/Conv + encoders.3.9.beta + /encoders.3/encoders.3.9/Mul_1 + /encoders.3/encoders.3.9/Add.weight, /encoders.3/encoders.3.9/conv4/Conv.weight, /encoders.3/encoders.3.9/conv5/Conv + encoders.3.9.gamma + /encoders.3/encoders.3.9/Mul_2 + /encoders.3/encoders.3.9/Add_1.weight, /encoders.3/encoders.3.9/sca/sca.1/Conv.weight, /ending_/Conv.weight, /ending_up/ending_up.0/Conv.weight, /middle_blks/middle_blks.0/conv1/Conv.weight, /middle_blks/middle_blks.0/conv2/Conv.weight, 
/middle_blks/middle_blks.0/conv3/Conv + middle_blks.0.beta + /middle_blks/middle_blks.0/Mul_1 + /middle_blks/middle_blks.0/Add.bias, /middle_blks/middle_blks.0/conv3/Conv + middle_blks.0.beta + /middle_blks/middle_blks.0/Mul_1 + /middle_blks/middle_blks.0/Add.weight, /middle_blks/middle_blks.0/conv4/Conv.weight, /middle_blks/middle_blks.0/conv5/Conv + middle_blks.0.gamma + /middle_blks/middle_blks.0/Mul_2 + /middle_blks/middle_blks.0/Add_1.bias, /middle_blks/middle_blks.0/conv5/Conv + middle_blks.0.gamma + /middle_blks/middle_blks.0/Mul_2 + /middle_blks/middle_blks.0/Add_1.weight, /middle_blks/middle_blks.0/sca/sca.1/Conv.bias, /middle_blks/middle_blks.0/sca/sca.1/Conv.weight, /ups.0/ups.0.0/Conv.weight, /ups.1/ups.1.0/Conv.weight, /ups.2/ups.2.0/Conv.weight, /ups.3/ups.3.0/Conv.weight, encoders.3.5.norm1.weight + /encoders.3/encoders.3.5/norm1/Mul + encoders.3.5.norm1.bias + /encoders.3/encoders.3.5/norm1/Add_1.shift, encoders.3.8.norm1.weight + /encoders.3/encoders.3.8/norm1/Mul + encoders.3.8.norm1.bias + /encoders.3/encoders.3.8/norm1/Add_1.shift, middle_blks.0.norm1.weight + /middle_blks/middle_blks.0/norm1/Mul + middle_blks.0.norm1.bias + /middle_blks/middle_blks.0/norm1/Add_1.scale, middle_blks.0.norm1.weight + /middle_blks/middle_blks.0/norm1/Mul + middle_blks.0.norm1.bias + /middle_blks/middle_blks.0/norm1/Add_1.shift, middle_blks.0.norm2.weight + /middle_blks/middle_blks.0/norm2/Mul + middle_blks.0.norm2.bias + /middle_blks/middle_blks.0/norm2/Add_1.shift
[08/20/2024-20:57:50] [W] [TRT] - 4 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.

You can check whether each Conv in your model is followed by a BN op or not.
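
As a quick check for the warning above, here is a small sketch that scans the ONNX initializers for values that would become subnormal in FP16 (any nonzero magnitude below 2**-14 ≈ 6.1e-5); the script and file name are illustrative:

    # Sketch: count ONNX weights whose magnitude falls below the smallest
    # normal FP16 value. Assumes the onnx and numpy packages are installed.
    import numpy as np
    import onnx
    from onnx import numpy_helper

    FP16_MIN_NORMAL = 2.0 ** -14  # ~6.1e-5; smaller nonzero magnitudes go subnormal

    model = onnx.load("model.onnx")
    for init in model.graph.initializer:
        w = numpy_helper.to_array(init).astype(np.float32)
        nz = w[w != 0]
        if nz.size == 0:
            continue
        subnormal = np.abs(nz) < FP16_MIN_NORMAL
        if subnormal.any():
            print(f"{init.name}: {subnormal.sum()}/{nz.size} nonzero values "
                  f"below the FP16 normal range (min |w| = {np.abs(nz).min():.3e})")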

@YouSenRong

I also encountered this problem: I had set the layer precision and layer output type to kFLOAT for every layer that can be set under FP16 mode, but some inference results were still wrong (the outputs were all ones).
Finally, I found that it may result from the reformatting of the input. Under FP16 mode, the input is first reformatted to FP16, even though the following layer runs in kFLOAT precision and its output type is kFLOAT.
Here is part of the engine graph of our model:
[engine graph image]

I wonder, is there any way to disable the inserted reformat (to FP16) layer under FP16 mode? Thanks! @lix19937

@lix19937

If you can make your input data type FP16 in the preprocessing phase (cast the image data from FP32 to FP16),
then build with trtexec --inputIOFormats=fp16:chw,fp16:chw,fp16:chw,fp16:chw --outputIOFormats=fp16:chw; the postprocessing phase then receives FP16 output directly.
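
On the preprocessing side, the FP32-to-FP16 cast is a one-liner; a minimal sketch (the normalization and layout here are illustrative and must match your engine's input format):

    # Sketch: cast preprocessed image data to FP16 so an engine built with
    # --inputIOFormats=fp16:chw consumes it directly, with no reformat layer.
    import numpy as np

    def preprocess(img_u8: np.ndarray) -> np.ndarray:
        x = img_u8.astype(np.float32) / 255.0   # illustrative normalization
        x = np.transpose(x, (2, 0, 1))[None]    # HWC -> NCHW; match your engine
        return np.ascontiguousarray(x.astype(np.float16))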

@YouSenRong

@lix19937 Thanks for your reply.
Finally, I found that the problem was due to FP16 overflow in some of the inputs. After we fixed the overflow, the accuracy of the results was within expectations.
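
For anyone debugging a similar issue, here is a quick sketch of such an overflow check (FP16's largest finite value is 65504; the helper name is illustrative):

    # Sketch: flag input values that overflow FP16 (|x| > 65504 becomes inf).
    import numpy as np

    FP16_MAX = float(np.finfo(np.float16).max)  # 65504.0

    def check_fp16_overflow(x: np.ndarray, name: str = "input") -> None:
        over = np.abs(x) > FP16_MAX
        if over.any():
            print(f"{name}: {over.sum()} values overflow FP16 "
                  f"(max |x| = {np.abs(x).max():.3e}); consider rescaling.")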
