We chose containers for all benchmarking because they promise to bundle all required dependencies, so the user never has to worry about managing versions.
For TensorRT to function, you need compatible versions of CUDA, cuDNN, and the driver, as stated in this matrix.
The official NVIDIA TensorRT container we use comes packaged with the right versions of CUDA and cuDNN, great!
But since the driver is a kernel-mode component, it cannot ship inside the container; it must already be installed on the host system.
So far, every system we had tested this feature on happened to have the correct drivers, except for the T4 system Jermey used and the T4 system I found on GCP. Once I updated the driver version, everything worked as expected.
Ideally, the TRT container would report this error instead of simply crashing. The fix on our end should be to read the installed driver version and emit a proper error telling the user to update the driver. I will add this to the issue.
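A minimal sketch of that pre-flight check, assuming we can shell out to `nvidia-smi` from the benchmarking harness; the minimum version tuple below is a placeholder, not the real requirement, which should be taken from the TensorRT support matrix:

```python
import re
import subprocess

# Hypothetical minimum; substitute the value from the TensorRT support matrix.
MIN_DRIVER_VERSION = (450, 80, 2)


def parse_version(s: str) -> tuple:
    """Turn a driver version string like '450.80.02' into a comparable tuple."""
    return tuple(int(p) for p in re.findall(r"\d+", s))


def check_driver(min_version: tuple = MIN_DRIVER_VERSION) -> None:
    """Query the host's NVIDIA driver and fail with an actionable message."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        raise RuntimeError(
            "Could not run nvidia-smi; is the NVIDIA driver installed on the host?"
        )
    installed = out.splitlines()[0].strip()
    if parse_version(installed) < min_version:
        raise RuntimeError(
            f"NVIDIA driver {installed} is too old for this TensorRT container; "
            f"please update to at least {'.'.join(map(str, min_version))}."
        )
```

Running this check before launching the container would turn the opaque crash into a clear "update your driver" message.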
Trying to run any GPU benchmark at the head of main on GCP or Azure yields an error like this:
GPU benchmarking is known to work correctly at commit e250ac7.