Useful sample code for building and running TensorRT models from ONNX
- RTX3060 (notebook)
- WSL
- Ubuntu 22.04.5 LTS
- CUDA 12.8
conda deactivate
conda env remove -n trte -y
conda create -n trte python=3.11 --yes
conda activate trte
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu129
pip install cuda-python==12.9.2
pip install tensorrt-cu12
pip install onnx
pip install opencv-python
pip install timm
pip install matplotlib
pip install -U "nvidia-modelopt[all]"
# Check installation
python -c "import modelopt; print(modelopt.__version__)"
python -c "import modelopt.torch.quantization.extensions as ext; ext.precompile()"
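To complement the two checks above, here is a minimal sketch that reports which of the packages from the install commands can be found in the active environment. The mapping from pip names to import names (`cuda-python` to `cuda`, `opencv-python` to `cv2`, `tensorrt-cu12` to `tensorrt`, `nvidia-modelopt` to `modelopt`) and the function name `check_environment` are assumptions made for this example, not part of the repository.

```python
from importlib.util import find_spec

# pip distribution name -> top-level import name.
# The pip names come from the install commands above; the import names
# are assumptions based on how these packages are usually imported.
PACKAGES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "torchaudio": "torchaudio",
    "cuda-python": "cuda",
    "tensorrt-cu12": "tensorrt",
    "onnx": "onnx",
    "opencv-python": "cv2",
    "timm": "timm",
    "matplotlib": "matplotlib",
    "nvidia-modelopt": "modelopt",
}


def check_environment(packages=PACKAGES):
    """Return {pip name: True/False} by locating each module without importing it."""
    return {
        pip_name: find_spec(module) is not None
        for pip_name, module in packages.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_environment().items():
        print(f"{pip_name:20s} {'ok' if found else 'MISSING'}")
```

Because `find_spec` only locates a module, this does not prove that the CUDA libraries load correctly; the `python -c` commands above remain the real check for `modelopt`.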
- Generating a TensorRT model from ONNX
  - 1.1 TensorRT C++ API
  - 1.2 TensorRT Python API
  - 1.3 Polygraphy
- Dynamic shapes for TensorRT
  - 2.1 Dynamic batch
  - 2.2 Dynamic input size
- Custom plugin
  - 3.1 Adding a pre-processing layer with CUDA
- Modifying an ONNX graph with ONNX GraphSurgeon
  - 4.1 Extracting the feature map of the last Conv for Grad-CAM
  - 4.2 Generating a TensorRT model with a custom plugin and ONNX
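As a rough illustration of items 1.2 and 2.1, the sketch below parses an ONNX file with the TensorRT Python API and optionally attaches an optimization profile for a dynamic batch dimension. It is not the code from the samples themselves. The helper names `batch_profile` and `build_engine`, the 1 GiB workspace limit, and the choice to apply the profile to the first network input are assumptions for this example. It also assumes TensorRT 10, where networks are always explicit-batch, and an ONNX model exported with a dynamic batch axis when `dynamic_batch` is used.

```python
def batch_profile(chw, min_batch=1, opt_batch=1, max_batch=8):
    """Return (min, opt, max) NCHW shapes for a dynamic batch dimension."""
    if not 1 <= min_batch <= opt_batch <= max_batch:
        raise ValueError("need 1 <= min_batch <= opt_batch <= max_batch")
    return tuple((b, *chw) for b in (min_batch, opt_batch, max_batch))


def build_engine(onnx_path, engine_path, fp16=True, dynamic_batch=None):
    """Parse an ONNX file and write a serialized TensorRT engine to disk.

    dynamic_batch: optional (min, opt, max) shapes from batch_profile(),
    applied to the first network input (an assumption of this sketch).
    """
    # Imported lazily so batch_profile() can be used without TensorRT installed.
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # TensorRT 10 networks are explicit-batch by default; TensorRT 8 would
    # need the EXPLICIT_BATCH creation flag here instead of 0.
    network = builder.create_network(0)
    parser = trt.OnnxParser(network, logger)
    if not parser.parse_from_file(onnx_path):
        errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
        raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)  # 1 GiB
    if fp16:
        config.set_flag(trt.BuilderFlag.FP16)

    if dynamic_batch is not None:
        # One profile covering the min/opt/max shapes of the first input.
        profile = builder.create_optimization_profile()
        profile.set_shape(network.get_input(0).name, *dynamic_batch)
        config.add_optimization_profile(profile)

    serialized = builder.build_serialized_network(network, config)
    if serialized is None:
        raise RuntimeError("engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized)
```

A call might look like `build_engine("model.onnx", "model.engine", dynamic_batch=batch_profile((3, 224, 224), 1, 4, 8))`, where both file names are placeholders.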
- Base model train & convert
  - 1.1 Train base model (resnet18)
  - 1.2 Base TensorRT (fp16)
- Quantization
  - 2.1 Explicit Quantization (PTQ)
  - 2.2 Explicit Quantization (QAT)
  - 2.3 Explicit Quantization (ONNX PTQ)
  - 2.4 Implicit Quantization (TensorRT PTQ)
- Sparsity
  - 3.1 Sparsity (2:4 sparsity)
- Pruning
  - 4.1 Pruning
- NAS
  - 5.1 NAS (work in progress...)
- Multiple Optimizations
  - 6.1 (Pruning + Sparsity)
  - 6.2 (Pruning + Sparsity + Quantization (QAT))
Framework | PyTorch | TensorRT | TensorRT | TensorRT | TensorRT | TensorRT | TensorRT | TensorRT |
---|---|---|---|---|---|---|---|---|
Optimization technique | - | - | trt ptq (Implicit) | onnx ptq (Explicit) | tmo ptq (Explicit) | tmo qat (Explicit) | tmo sparsity | tmo pruning (flops 80%) |
Precision | fp16 | fp16 | int8 | int8 | int8 | int8 | fp16 | fp16 |
Top-1 Acc [%] | 84.58 | 84.54 | 84.34 | 84.5 | 84.2 | 84.42 | 83.28 | 82.76 |
Top-5 Acc [%] | 97.2 | 97.2 | 97.1 | 97 | 97.06 | 97.1 | 96.72 | 96.42 |
FPS [Frame/sec] | 406.27 | 1463.45 | 1915.04 | 1897.46 | 1542.34 | 1572.81 | 1483.85 | 1573.2 |
Avg Latency [ms] | 2.46 | 0.68 | 0.52 | 0.53 | 0.65 | 0.64 | 0.67 | 0.64 |
GPU Mem [MB] | 286 | 138 | 124 | 124 | 124 | 138 | 138 | 130 |
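To make the table easier to compare, the sketch below computes each configuration's throughput relative to the PyTorch fp16 baseline, using the FPS and latency values copied from the table. The column labels used as dictionary keys and the function names are made up for this example. The latency row agrees with 1000 / FPS to two decimals, which would be consistent with one frame per inference, although the table itself does not state the batch size.

```python
# (FPS, average latency in ms) copied from the table above, in column order.
RESULTS = {
    "PyTorch fp16": (406.27, 2.46),
    "TensorRT fp16": (1463.45, 0.68),
    "trt ptq int8": (1915.04, 0.52),
    "onnx ptq int8": (1897.46, 0.53),
    "tmo ptq int8": (1542.34, 0.65),
    "tmo qat int8": (1572.81, 0.64),
    "tmo sparsity fp16": (1483.85, 0.67),
    "tmo pruning fp16": (1573.2, 0.64),
}


def latency_from_fps(fps):
    """Per-frame latency in milliseconds implied by a frames-per-second value."""
    return 1000.0 / fps


def speedup_over(baseline, results=RESULTS):
    """Return {name: FPS ratio relative to the baseline column}."""
    base_fps = results[baseline][0]
    return {name: fps / base_fps for name, (fps, _) in results.items()}


if __name__ == "__main__":
    for name, ratio in speedup_over("PyTorch fp16").items():
        fps, latency = RESULTS[name]
        print(f"{name:20s} {ratio:5.2f}x  "
              f"reported {latency:.2f} ms, implied {latency_from_fps(fps):.2f} ms")
```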
- Super Resolution
  - 1.1 Real-ESRGAN
- Object Detection
  - 2.1 yolo11
- Instance Segmentation
- Semantic Segmentation
  - 4.1 U-2-Net (Sky Segmentation)
  - 4.2 BEN2 (Background Erase Network)
  - 4.3 MODNet
  - 4.4 ormbg (Open Remove Background Model)
- Depth Estimation
  - 5.1 Depth Pro