❓ [Question] Speed problem with TRTorch and Torch-TensorRT - Device Compatibility Check #854
Between these two versions, a constant-time operation was added to check the compatibility of the current device with the compiled model. This is likely the overhead you are experiencing. We are investigating whether it can be mitigated for subsequent inferences once the model is loaded.
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
This device check cannot currently be mitigated safely. We are investigating options in TensorRT to reduce this overhead.
Explore using https://docs.nvidia.com/cuda/cuda-runtime-api/structcudaPointerAttributes.html#structcudaPointerAttributes to query where the input data lives and assume the current device is correct?
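A minimal sketch of that idea, calling `cudaPointerGetAttributes` through ctypes (the struct layout below assumes the CUDA 11 definition of `cudaPointerAttributes`; the function name `device_of_pointer` is hypothetical, not part of Torch-TensorRT):

```python
import ctypes
import ctypes.util

# Mirrors struct cudaPointerAttributes as defined in CUDA 11+:
#   enum cudaMemoryType type; int device; void *devicePointer; void *hostPointer;
class CudaPointerAttributes(ctypes.Structure):
    _fields_ = [
        ("type", ctypes.c_int),
        ("device", ctypes.c_int),
        ("devicePointer", ctypes.c_void_p),
        ("hostPointer", ctypes.c_void_p),
    ]

def device_of_pointer(ptr):
    """Return the CUDA device ordinal that owns `ptr`, or None if the CUDA
    runtime is unavailable or the query fails."""
    libname = ctypes.util.find_library("cudart")
    if libname is None:
        return None  # no CUDA runtime on this machine
    cudart = ctypes.CDLL(libname)
    attrs = CudaPointerAttributes()
    # cudaPointerGetAttributes returns cudaSuccess (0) on success.
    status = cudart.cudaPointerGetAttributes(ctypes.byref(attrs),
                                             ctypes.c_void_p(ptr))
    if status != 0:
        return None
    return attrs.device
```

The appeal is that a pointer-attribute query is a lightweight lookup, unlike a full device-properties fetch.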
See https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/. The check that likely caused the perf issue was added here: TensorRT/core/runtime/runtime.cpp, line 81 @ bf4474d.
This check invokes a constructor for a TensorRT wrapper object, RTDevice::RTDevice (TensorRT/core/runtime/RTDevice.cpp, line 16 @ bf4474d).
And that constructor invokes cudaGetDeviceProperties, which is expensive; the approach in the article above (querying individual attributes with cudaDeviceGetAttribute) may mitigate the issue.
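To illustrate the cheaper path the blog post describes, here is a hedged ctypes sketch that fetches only the compute capability via two `cudaDeviceGetAttribute` calls instead of a full `cudaGetDeviceProperties` call (the enum values 75/76 are taken from `cuda_runtime_api.h`; `compute_capability` is a hypothetical helper name):

```python
import ctypes
import ctypes.util

# Attribute enums from cuda_runtime_api.h:
CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MAJOR = 75  # cudaDevAttrComputeCapabilityMajor
CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MINOR = 76  # cudaDevAttrComputeCapabilityMinor

def compute_capability(device=0):
    """Query compute capability with two cudaDeviceGetAttribute calls, which
    the NVIDIA blog post reports to be far cheaper than cudaGetDeviceProperties.
    Returns (major, minor) or None if no CUDA runtime/device is available."""
    libname = ctypes.util.find_library("cudart")
    if libname is None:
        return None
    cudart = ctypes.CDLL(libname)
    major, minor = ctypes.c_int(), ctypes.c_int()
    if cudart.cudaDeviceGetAttribute(
            ctypes.byref(major), CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MAJOR, device) != 0:
        return None
    if cudart.cudaDeviceGetAttribute(
            ctypes.byref(minor), CUDA_DEV_ATTR_COMPUTE_CAPABILITY_MINOR, device) != 0:
        return None
    return (major.value, minor.value)
```

If the device check only needs a couple of fields, querying those attributes directly (or caching them once per device) would avoid paying the full properties-fetch cost on every inference.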
Question 1
I find that a TorchScript model optimized by TRTorch 0.2.0 is faster than the corresponding TensorRT model (all models built through the Python API), e.g. for common ResNet and RepVGG models. Shouldn't the TensorRT model be the fastest? I want to know why this happens.
- CentOS Linux release 7.6.1810 (Core)
Question 2
I found that the inference speed of the optimized TorchScript model differs across versions of Torch-TensorRT (TRTorch) for the same model structure.
For the same ResNet models, the TorchScript model optimized by TRTorch 0.2.0 (torch 1.7.1-cu110, TensorRT 7.2, cuDNN 8.2) is faster than the one optimized by Torch-TensorRT 1.0.0 (torch 1.10.1-cu113, TensorRT 8.0, cuDNN 8.2). Shouldn't the latest Torch-TensorRT 1.0.0 be faster? I'm also very confused.
Here are some of my test results.
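When comparing latencies across runtimes and versions, a consistent measurement harness matters (warmup runs trigger JIT/engine optimization, and GPU work must be synchronized before stopping the clock). A minimal sketch, with hypothetical names, is:

```python
import time
import statistics

def measure_latency_ms(run_once, warmup=20, iters=100, synchronize=None):
    """Time a model call fairly: warm up first (TorchScript/TensorRT engines
    often optimize during the first few calls), then synchronize the GPU
    around each measurement if a synchronize callable is supplied
    (e.g. torch.cuda.synchronize). Returns the median latency in ms."""
    for _ in range(warmup):
        run_once()
    if synchronize:
        synchronize()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        if synchronize:
            synchronize()  # wait for async GPU work before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Without warmup and synchronization, a benchmark can attribute one-time setup cost (such as the device-compatibility check discussed above) to steady-state inference.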