❓ [Question] Speed problem about TRTorch and Torch-TensorRT - Device Compatibility Check #854

Closed
yuezhuang1387 opened this issue Feb 8, 2022 · 8 comments

@yuezhuang1387

Question 1

I find that a TorchScript model optimized by TRTorch 0.2.0 is faster than the equivalent TensorRT model (both run through the Python API) for common architectures such as the ResNet and RepVGG series. Shouldn't the TensorRT model be the fastest? I want to know why this happens.

TorchScript model (optimized by TRTorch 0.2.0):

  • torch 1.7.1+cu110
  • trtorch 0.2.0
  • TensorRT 7.2
  • cuDNN 8.2
  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)

TensorRT model (.trt):

  • torch 1.7.1+cu110
  • tensorrt 8.2.0.6
  • cuDNN 8.2
  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)

Question 2

I also found that the inference speed of the optimized TorchScript model differs depending on which version of Torch-TensorRT (TRTorch) was used, even for the same model architecture.
For the same ResNet architecture, the TorchScript model optimized by TRTorch 0.2.0 (torch 1.7.1+cu110, TensorRT 7.2, cuDNN 8.2) is faster than the one optimized by Torch-TensorRT 1.0.0 (torch 1.10.1+cu113, TensorRT 8.0, cuDNN 8.2). Shouldn't the newer Torch-TensorRT 1.0.0 be faster? I'm very confused by this as well.

  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)
  • input shape: (1, 3, 224, 224)

Here are some of my test results:

[image: test results]
yuezhuang1387 added the question label on Feb 8, 2022
@ncomly-nvidia
Contributor

Between these two versions, a constant-time operation was added that checks the compatibility of the current device with the compiled model. This is likely the overhead you are experiencing.

We are investigating whether this can be mitigated for future inferences once the model is loaded.
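
For illustration, here is a minimal sketch of what such a per-inference device check might look like. The struct and function names are hypothetical and this is not the actual Torch-TensorRT runtime code; it only shows why the check adds a fixed cost to every call.

```cpp
// Hypothetical sketch of a per-inference device compatibility check.
// Not the actual Torch-TensorRT implementation.
#include <cuda_runtime_api.h>
#include <stdexcept>
#include <string>

struct CompiledDeviceInfo {
  int gpu_id;    // device the engine was compiled for
  int sm_major;  // compute capability recorded at compile time
  int sm_minor;
};

// Runs before every forward pass: query the active device and its
// properties, then compare against the device recorded in the program.
void check_device_compatibility(const CompiledDeviceInfo& compiled) {
  int current_device = -1;
  if (cudaGetDevice(&current_device) != cudaSuccess) {
    throw std::runtime_error("cudaGetDevice failed");
  }

  cudaDeviceProp prop{};
  // cudaGetDeviceProperties fills a large struct and is relatively slow,
  // so calling it on every inference shows up as constant overhead.
  if (cudaGetDeviceProperties(&prop, current_device) != cudaSuccess) {
    throw std::runtime_error("cudaGetDeviceProperties failed");
  }

  if (current_device != compiled.gpu_id ||
      prop.major != compiled.sm_major || prop.minor != compiled.sm_minor) {
    throw std::runtime_error(
        "Engine was compiled for device " + std::to_string(compiled.gpu_id) +
        ", but a different device is active");
  }
}
```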

@github-actions

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

ncomly-nvidia added the feature request label and removed the question and No Activity labels on Aug 22, 2022
ncomly-nvidia changed the title from "❓ [Question] Speed problem about TRTorch and Torch-TensorRT" to "❓ [Question] Speed problem about TRTorch and Torch-TensorRT - Device Compatibility Check" on Aug 22, 2022

@Christina-Young-NVIDIA
Collaborator

Christina-Young-NVIDIA commented Dec 20, 2022

[Removed]

@ncomly-nvidia
Contributor

ncomly-nvidia commented Jan 3, 2023

This device check cannot currently be mitigated safely. We are investigating options in TRT to reduce this overhead.

@narendasan
Collaborator

Explore using https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes to query where the input data lives and assume that the current device is correct?
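
A rough sketch of that idea, assuming the runtime can see the input tensor's raw data pointer (illustrative only, not code from this repository): cudaPointerGetAttributes reports which device owns an allocation, so the runtime could trust that device instead of querying device properties on every call.

```cpp
// Sketch: infer the active device from the location of the input data.
// Illustrative only; not the actual Torch-TensorRT runtime code.
#include <cuda_runtime_api.h>

// Returns the ordinal of the GPU that owns `data_ptr`, or -1 if the
// pointer is not device or managed memory (e.g. pageable host memory).
int device_of_pointer(const void* data_ptr) {
  cudaPointerAttributes attr{};
  if (cudaPointerGetAttributes(&attr, data_ptr) != cudaSuccess) {
    return -1;
  }
  if (attr.type == cudaMemoryTypeDevice || attr.type == cudaMemoryTypeManaged) {
    return attr.device;  // device the allocation is associated with
  }
  return -1;
}
```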

Christina-Young-NVIDIA closed this as not planned on Jun 12, 2023
@laikhtewari
Collaborator

laikhtewari commented Jun 20, 2023

https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/

This check was added, which likely caused the perf issue:

(cudaGetDevice(reinterpret_cast<int*>(&device)) == cudaSuccess),

This check invokes the constructor of the TensorRT wrapper object RTDevice:

RTDevice::RTDevice(int64_t gpu_id, nvinfer1::DeviceType device_type) {

And that constructor invokes cudaGetDeviceProperties, which is expensive; the article above may be used to mitigate the issue.
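
Following the linked Pro Tip, one possible mitigation is to query only the attributes that are actually needed with cudaDeviceGetAttribute, which returns a single integer instead of filling the whole cudaDeviceProp struct, and to cache the result per device so the lookup happens once rather than on every inference. A minimal sketch with hypothetical names (not the actual Torch-TensorRT code):

```cpp
// Sketch: cache per-device attributes so the expensive device-property
// query is not repeated on every inference.
// Hypothetical names; not the actual Torch-TensorRT implementation.
#include <cuda_runtime_api.h>
#include <mutex>
#include <unordered_map>

struct CachedDeviceInfo {
  int sm_major = 0;
  int sm_minor = 0;
};

const CachedDeviceInfo& get_device_info(int device_id) {
  static std::unordered_map<int, CachedDeviceInfo> cache;
  static std::mutex cache_mutex;

  std::lock_guard<std::mutex> lock(cache_mutex);
  auto it = cache.find(device_id);
  if (it == cache.end()) {
    CachedDeviceInfo info;
    // cudaDeviceGetAttribute is much cheaper than cudaGetDeviceProperties
    // because it queries a single attribute at a time.
    cudaDeviceGetAttribute(&info.sm_major, cudaDevAttrComputeCapabilityMajor, device_id);
    cudaDeviceGetAttribute(&info.sm_minor, cudaDevAttrComputeCapabilityMinor, device_id);
    it = cache.emplace(device_id, info).first;
  }
  return it->second;
}
```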

