❓ [Question] Speed problem about TRTorch and Torch-TensorRT - Device Compatibility Check #854

Closed
yuezhuang1387 opened this issue Feb 8, 2022 · 8 comments

@yuezhuang1387

Question 1

I find that a TorchScript model optimized by TRTorch 0.2.0 is faster than the equivalent TensorRT model (both run through the Python API) for common architectures such as the ResNet and RepVGG series. Shouldn't the TensorRT model be the fastest? I want to know why this happens.

TorchScript model (optimized by TRTorch 0.2.0):

  • torch 1.7.1+cu110
  • trtorch 0.2.0
  • TensorRT 7.2
  • cuDNN 8.2
  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)

TensorRT model (.trt):

  • torch 1.7.1+cu110
  • tensorrt 8.2.0.6
  • cuDNN 8.2
  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)

Question 2

I also found that the inference speed of the optimized TorchScript model differs depending on which version of Torch-TensorRT (TRTorch) was used, even for the same model architecture.
For the same ResNet architecture, the TorchScript model optimized by TRTorch 0.2.0 (torch 1.7.1+cu110, TensorRT 7.2, cuDNN 8.2) is faster than the one optimized by Torch-TensorRT 1.0.0 (torch 1.10.1+cu113, TensorRT 8.0, cuDNN 8.2). Shouldn't the newer Torch-TensorRT 1.0.0 be faster? I'm very confused by this as well.

  • GPU: Tesla T4
  • CentOS Linux release 7.6.1810 (Core)
  • input shape: (1, 3, 224, 224)

Here are some of my test results:

[image: test results]
yuezhuang1387 added the question label on Feb 8, 2022
@ncomly-nvidia
Contributor

Between these two versions, a constant-time operation was added that checks the compatibility of the current device with the compiled model. This is likely the overhead you are experiencing.

We are investigating whether this can be mitigated for future inferences once the model is loaded.
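
For illustration, here is a minimal sketch of what such a per-inference device check might look like. The struct and function names are hypothetical and this is not the actual Torch-TensorRT runtime code; it only shows why the check adds a fixed cost to every call.

```cpp
// Hypothetical sketch of a per-inference device compatibility check.
// Not the actual Torch-TensorRT implementation.
#include <cuda_runtime_api.h>
#include <stdexcept>
#include <string>

struct CompiledDeviceInfo {
  int gpu_id;    // device the engine was compiled for
  int sm_major;  // compute capability recorded at compile time
  int sm_minor;
};

// Runs before every forward pass: query the active device and its
// properties, then compare against the device recorded in the program.
void check_device_compatibility(const CompiledDeviceInfo& compiled) {
  int current_device = -1;
  if (cudaGetDevice(&current_device) != cudaSuccess) {
    throw std::runtime_error("cudaGetDevice failed");
  }

  cudaDeviceProp prop{};
  // cudaGetDeviceProperties fills a large struct and is relatively slow,
  // so calling it on every inference shows up as constant overhead.
  if (cudaGetDeviceProperties(&prop, current_device) != cudaSuccess) {
    throw std::runtime_error("cudaGetDeviceProperties failed");
  }

  if (current_device != compiled.gpu_id ||
      prop.major != compiled.sm_major || prop.minor != compiled.sm_minor) {
    throw std::runtime_error(
        "Engine was compiled for device " + std::to_string(compiled.gpu_id) +
        ", but a different device is active");
  }
}
```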

@github-actions

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

ncomly-nvidia added the feature request label and removed the question and No Activity labels on Aug 22, 2022
ncomly-nvidia changed the title from "❓ [Question] Speed problem about TRTorch and Torch-TensorRT" to "❓ [Question] Speed problem about TRTorch and Torch-TensorRT - Device Compatibility Check" on Aug 22, 2022

@Christina-Young-NVIDIA
Collaborator

Christina-Young-NVIDIA commented Dec 20, 2022

[Removed]

@ncomly-nvidia
Contributor

ncomly-nvidia commented Jan 3, 2023

This device check cannot currently be mitigated safely. We are investigating options in TRT to reduce this overhead.

@narendasan
Collaborator

Explore using https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes to query where the input data lives and assume that the current device is correct?
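
A rough sketch of that idea, assuming the runtime can see the input tensor's raw data pointer (illustrative only, not code from this repository): cudaPointerGetAttributes reports which device owns an allocation, so the runtime could trust that device instead of querying device properties on every call.

```cpp
// Sketch: infer the active device from the location of the input data.
// Illustrative only; not the actual Torch-TensorRT runtime code.
#include <cuda_runtime_api.h>

// Returns the ordinal of the GPU that owns `data_ptr`, or -1 if the
// pointer is not device or managed memory (e.g. pageable host memory).
int device_of_pointer(const void* data_ptr) {
  cudaPointerAttributes attr{};
  if (cudaPointerGetAttributes(&attr, data_ptr) != cudaSuccess) {
    return -1;
  }
  if (attr.type == cudaMemoryTypeDevice || attr.type == cudaMemoryTypeManaged) {
    return attr.device;  // device the allocation is associated with
  }
  return -1;
}
```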

Christina-Young-NVIDIA closed this as not planned on Jun 12, 2023
@laikhtewari
Collaborator

laikhtewari commented Jun 20, 2023

https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/

This check was added, which likely caused the perf issue:

(cudaGetDevice(reinterpret_cast<int*>(&device)) == cudaSuccess),

This check invokes the constructor of the TensorRT wrapper object RTDevice:

RTDevice::RTDevice(int64_t gpu_id, nvinfer1::DeviceType device_type) {

And that constructor invokes cudaGetDeviceProperties, which is expensive; the article above may be used to mitigate the issue.
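
Following the linked Pro Tip, one possible mitigation is to query only the attributes that are actually needed with cudaDeviceGetAttribute, which returns a single integer instead of filling the whole cudaDeviceProp struct, and to cache the result per device so the lookup happens once rather than on every inference. A minimal sketch with hypothetical names (not the actual Torch-TensorRT code):

```cpp
// Sketch: cache per-device attributes so the expensive device-property
// query is not repeated on every inference.
// Hypothetical names; not the actual Torch-TensorRT implementation.
#include <cuda_runtime_api.h>
#include <mutex>
#include <unordered_map>

struct CachedDeviceInfo {
  int sm_major = 0;
  int sm_minor = 0;
};

const CachedDeviceInfo& get_device_info(int device_id) {
  static std::unordered_map<int, CachedDeviceInfo> cache;
  static std::mutex cache_mutex;

  std::lock_guard<std::mutex> lock(cache_mutex);
  auto it = cache.find(device_id);
  if (it == cache.end()) {
    CachedDeviceInfo info;
    // cudaDeviceGetAttribute is much cheaper than cudaGetDeviceProperties
    // because it queries a single attribute at a time.
    cudaDeviceGetAttribute(&info.sm_major, cudaDevAttrComputeCapabilityMajor, device_id);
    cudaDeviceGetAttribute(&info.sm_minor, cudaDevAttrComputeCapabilityMinor, device_id);
    it = cache.emplace(device_id, info).first;
  }
  return it->second;
}
```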

