These scripts help with conversion and execution of Google EfficientNet V1 and EfficientNet V2 models with NVIDIA TensorRT.
August 2023:
- Update ONNX version support to 1.14.0
- Removed support for Python versions < 3.8.
Note: The sample is not compatible with Python-3.12 because tensorflow-addons does not support Python-3.12.
For best results, we recommend running these scripts on an environment with TensorRT >= 8.0.1 and TensorFlow 2.12.0.
Install TensorRT as per the TensorRT Install Guide. Make sure the Python bindings for TensorRT are also installed correctly. They are provided by the python3-libnvinfer and python3-libnvinfer-dev packages included in your TensorRT download.
Install all other packages listed in requirements.txt:
pip3 install -r requirements.txt
You will also need the latest onnx_graphsurgeon Python module. If it was not already installed with TensorRT, you can install it manually by running:
pip3 install onnx-graphsurgeon --index-url https://pypi.ngc.nvidia.com
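Before going further, it may help to confirm that the environment matches the requirements above. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the module list is an assumption derived from the packages mentioned in this README, not something shipped with the sample.

```python
import importlib.util
import sys

# Module names assumed from the packages mentioned in this README.
REQUIRED_MODULES = ("tensorrt", "onnx", "onnx_graphsurgeon", "tensorflow")


def python_version_supported(version=sys.version_info):
    """Return True for Python 3.8 through 3.11, the range stated above."""
    return (3, 8) <= tuple(version[:2]) < (3, 12)


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_supported():
        print("Unsupported Python version:", sys.version.split()[0])
    for name in missing_modules():
        print("Missing module:", name)
```

The check only locates the modules and does not import them, so it stays fast even when TensorFlow is installed.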
The workflow to convert an EfficientNet model is basically TensorFlow → ONNX → TensorRT, and so parts of this process require TensorFlow to be installed. If you are performing this conversion to run inference on the edge, such as for NVIDIA Jetson devices, it might be easier to do the ONNX conversion on a PC first.
The starting point of conversion is a TensorFlow saved model. This can be exported from your own trained models, or you can download a pre-trained model. This conversion script is compatible with two types of models:
- EfficientNet V1 models trained with the TensorFlow TPU Models framework.
- EfficientNet V2 models trained with the AutoML framework.
You can download one of the pre-trained saved models from the EfficientNet TFHub, such as:
wget https://storage.googleapis.com/tfhub-modules/tensorflow/efficientnet/b0/classification/1.tar.gz
The contents of this package, when extracted, will hold a saved model ready for conversion.
Alternatively, if you are training your own model, or if you need to re-export the saved model manually, you will need the training checkpoint (or a pre-trained "ckpt" from the EfficientNet Repository such as this).
To export a saved model from the checkpoint, clone and install the TensorFlow TPU Models repository, and run:
cd /path/to/tpu/models/official/efficientnet
python3 export_model.py \
--ckpt_dir /path/to/efficientnet-b0 \
--image_size 224 \
--model_name efficientnet-b0 \
--output_tflite /dev/null \
--noquantize \
--output_saved_model_dir /path/to/saved_model
Adapt --image_size and --model_name according to the checkpoint model being used. The --ckpt_dir argument points to the directory holding the checkpoint as described above. The TF saved model will be exported to the path given by --output_saved_model_dir.
At the time of writing, no EfficientNet V2 saved models are available on TFHub, so you will need to export one from a pre-trained checkpoint or from your own trained model.
To do so, you will need your training checkpoint (or a pre-trained "ckpt" from the EfficientNet V2 Repository such as this):
wget https://storage.googleapis.com/cloud-tpu-checkpoints/efficientnet/v2/efficientnetv2-s.tgz
To export a saved model from here, clone and install the AutoML repository, and run:
cd /path/to/automl/efficientnetv2
python3 infer.py \
--mode tf2bm \
--model_name efficientnetv2-s \
--model_dir ../../efficientnetv2-s/ \
--export_dir ../../efficientnetv2-s/saved_model
Adapt --model_name to match the checkpoint being used. The --model_dir argument should point to the downloaded or trained checkpoint as described above. The exported saved model will then be available in the directory given by the --export_dir argument.
To generate an ONNX model file, find the saved model as described above, select a batch size and input size, and run:
python3 create_onnx.py \
--saved_model /path/to/saved_model \
--onnx /path/to/model.onnx \
--batch_size 1 \
--input_size 384
You may need to adapt the --input_size argument to explicitly define the exact input image dimensions to use in the graph. Consult the model definitions in the corresponding training framework to find the expected input size for the model you are working with.
The command above creates the file model.onnx, which is ready for conversion to TensorRT.
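As a rough aid for choosing --input_size, the lookup below lists the evaluation resolutions commonly published for the reference EfficientNet V1 models, plus the 384 value used for efficientnetv2-s in the command above. Treat the table and the helper name as assumptions for illustration; the training configuration of your own model remains the authoritative source.

```python
# Evaluation resolutions as commonly published for the reference models.
# These are assumptions: verify them against your own training configuration.
INPUT_SIZES = {
    "efficientnet-b0": 224,
    "efficientnet-b1": 240,
    "efficientnet-b2": 260,
    "efficientnet-b3": 300,
    "efficientnet-b4": 380,
    "efficientnet-b5": 456,
    "efficientnet-b6": 528,
    "efficientnet-b7": 600,
    "efficientnetv2-s": 384,  # value used in the example command above
}


def input_size_for(model_name):
    """Return the assumed input size for a model, or raise if it is unknown."""
    try:
        return INPUT_SIZES[model_name]
    except KeyError:
        raise ValueError("No known input size for " + repr(model_name)) from None
```

Raising on unknown names, rather than falling back to a default, avoids silently building a graph with the wrong input dimensions.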
Optionally, you may wish to visualize the resulting ONNX graph with a tool such as Netron.
It is possible to build the TensorRT engine directly with trtexec using the ONNX graph generated in the previous step. However, the script build_engine.py is provided for convenience, as it has been tailored to EfficientNet engine building and calibration. Run python3 build_engine.py --help for details on available settings.
To build the TensorRT engine file with FP16 precision, run:
python3 build_engine.py \
--onnx /path/to/model.onnx \
--engine /path/to/engine.trt \
--precision fp16
The file engine.trt will be created, which can now be used for inference with TensorRT.
For best results, make sure no other processes are using the GPU during engine build, as it may affect the optimal tactic selection process.
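The two conversion steps described so far (ONNX export, then FP16 engine build) could also be chained from Python. The sketch below only assembles the command lines already shown in this README. The function names are hypothetical, and both scripts are assumed to be in the current working directory.

```python
import subprocess


def onnx_command(saved_model, onnx_path, batch_size=1, input_size=384):
    """Assemble the create_onnx.py command line shown in this README."""
    return [
        "python3", "create_onnx.py",
        "--saved_model", str(saved_model),
        "--onnx", str(onnx_path),
        "--batch_size", str(batch_size),
        "--input_size", str(input_size),
    ]


def engine_command(onnx_path, engine_path, precision="fp16"):
    """Assemble the build_engine.py command line shown in this README."""
    return [
        "python3", "build_engine.py",
        "--onnx", str(onnx_path),
        "--engine", str(engine_path),
        "--precision", precision,
    ]


def run_pipeline(saved_model, onnx_path, engine_path, runner=subprocess.run):
    """Run both steps in order, stopping at the first failing command."""
    commands = (
        onnx_command(saved_model, onnx_path),
        engine_command(onnx_path, engine_path),
    )
    for command in commands:
        runner(command, check=True)
```

Passing the commands as argument lists avoids shell quoting problems with paths that contain spaces. The `runner` parameter exists only so the sequence can be exercised without launching the real scripts.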
To build and calibrate an engine for INT8 precision, run:
python3 build_engine.py \
--onnx /path/to/model.onnx \
--engine /path/to/engine.trt \
--precision int8 \
--calib_input /path/to/calibration/images \
--calib_cache /path/to/calibration.cache \
--calib_preprocessor V2
Here, --calib_input points to a directory with several thousand images. For example, this could be a subset of the training or validation datasets that were used for the model. It is important that this data represents the runtime data distribution well, so using more calibration images generally improves INT8 accuracy. For ImageNet networks, we have found that 25,000 images give a good result.
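One way to assemble such a calibration set is to draw a reproducible random subset from an existing dataset directory. This is a minimal sketch with hypothetical helper names. The accepted file extensions are an assumption, and copying or linking the chosen files into the --calib_input directory is left to the reader.

```python
import random
from pathlib import Path

# File extensions treated as images; an assumption for this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(directory):
    """Return a sorted list of image file paths found under a directory."""
    return sorted(
        str(path)
        for path in Path(directory).rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


def pick_calibration_subset(image_paths, count, seed=0):
    """Pick `count` distinct paths at random, reproducibly for a given seed."""
    paths = sorted(image_paths)
    if count >= len(paths):
        return paths
    return sorted(random.Random(seed).sample(paths, count))
```

A fixed seed keeps the subset stable across runs, which is useful if calibration results need to be compared between engine builds.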
The --calib_cache argument controls where the calibration cache file is written. This is useful to keep a cached copy of the calibration results: the next time you build the engine for the same network, if this file exists, the build skips the calibration step and uses the cached values instead.
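The cache behavior described above can be illustrated with a small standalone class. TensorRT's Python INT8 calibrator interface defines read_calibration_cache and write_calibration_cache hooks. The class below mimics only those two hooks without depending on TensorRT, so it is a sketch of the idea and not the code used by build_engine.py. The class name is hypothetical.

```python
import os


class CalibrationCache:
    """Hypothetical stand-in for the cache hooks of an INT8 calibrator."""

    def __init__(self, cache_file):
        self.cache_file = cache_file

    def read_calibration_cache(self):
        """Return cached bytes if the file exists; None means calibrate again."""
        if self.cache_file and os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as handle:
                return handle.read()
        return None

    def write_calibration_cache(self, cache):
        """Store calibration results so that a later build can reuse them."""
        if not self.cache_file:
            return
        with open(self.cache_file, "wb") as handle:
            handle.write(cache)
```

Deleting the cache file forces a fresh calibration, which is worth doing whenever the model or the calibration images change.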
Finally, the --calib_preprocessor option sets the preprocessing algorithm to apply to calibration images. Please refer to the Input Preprocessing section below for more details.
Run python3 build_engine.py --help for additional build options.
Optionally, you can obtain execution timing information for the built engine by using the trtexec utility:
trtexec \
--loadEngine=/path/to/engine.trt \
--useCudaGraph --noDataTransfers \
--iterations=100 --avgRuns=100
If it's not already in your $PATH, the trtexec binary is usually found in /usr/src/tensorrt/bin/trtexec, depending on your TensorRT installation method.
An inference benchmark will run, with GPU Compute latency times printed out to the console. Depending on the version of TensorRT, you should see something similar to:
GPU Compute Time: min = 1.79895 ms, max = 1.9209 ms, mean = 1.80589 ms, median = 1.80493 ms, percentile(99%) = 1.81396 ms
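If you want to track these numbers across builds, the summary line can be parsed from a saved log. The sketch below assumes the line has exactly the layout shown above, which may differ between TensorRT versions. The function names are hypothetical.

```python
import re

# Matches the summary line layout shown above; other versions may differ.
_GPU_TIME = re.compile(
    r"GPU Compute Time: min = ([\d.]+) ms, max = ([\d.]+) ms, "
    r"mean = ([\d.]+) ms, median = ([\d.]+) ms"
)


def parse_gpu_compute_time(log_text):
    """Return min/max/mean/median latencies in milliseconds, or None."""
    match = _GPU_TIME.search(log_text)
    if match is None:
        return None
    keys = ("min", "max", "mean", "median")
    return dict(zip(keys, (float(value) for value in match.groups())))


def images_per_second(mean_ms, batch_size=1):
    """Convert a mean batch latency in milliseconds to images per second."""
    return batch_size * 1000.0 / mean_ms
```

For the sample line above, the mean latency of 1.80589 ms at batch size 1 corresponds to roughly 554 images per second.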
For optimal performance, inference should be done in a C++ application that takes advantage of CUDA Graphs to launch the inference request. Alternatively, the TensorRT engine built with this process can also be executed through either Triton Inference Server or DeepStream SDK.
However, for convenience, a Python inference script is provided here for quick testing of the built TensorRT engine.
An important concept for computer vision models is the preprocessing applied to an image before feeding it to the classifier network. The various EfficientNet models supported by this converter use different preprocessing algorithms.
We have implemented three different preprocessor algorithms, as defined in image_batcher.py. They are:
Preprocessing | Resizing | Normalization | Mean Subtract |
---|---|---|---|
V2 | Bilinear Resize | [-1 to +1] Range | No |
V1 | Bicubic Resize + PadCrop | [0 to +1] Range | No |
V1MS | Bicubic Resize + PadCrop | [0 to +1] Range | Yes |
V2: This is the preprocessor to be used with all EfficientNet V2 models. EfficientNet V2 does not require mean subtraction, so it is never performed for these models.
V1: This is the default preprocessor to be used with most EfficientNet V1 models. EfficientNet V1 normally expects mean subtraction to be applied. However, some TensorFlow saved models, such as those downloaded from TFHub, already perform this operation within the graph itself, so it is not required to do it during preprocessing.
V1MS: Depending on the saved model exporter, some EfficientNet V1 models may not have the integrated mean subtraction. This is often the case with models exported from the pre-trained checkpoints. For those cases, this preprocessor will apply mean subtraction during preprocessing.
These are the supported values for the --preprocessor and --calib_preprocessor arguments used throughout these scripts. Note that choosing an incorrect preprocessor for a model will considerably reduce its accuracy. Please take a moment to choose the correct preprocessor before performing inference or validation of a model.
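To make the difference between the three normalization schemes concrete, the sketch below applies each one to a single RGB pixel with values in the 0 to 255 range. It covers normalization only, not resizing or cropping. The exact constants (the 128 offset for V2 and the ImageNet mean and standard deviation for V1MS) are assumptions chosen for illustration; image_batcher.py remains the reference for what the scripts actually do.

```python
# ImageNet per-channel statistics in 0-255 units. Assumed values for this
# illustration; check image_batcher.py for the constants actually used.
IMAGENET_MEAN = (123.68, 116.28, 103.53)
IMAGENET_STD = (58.395, 57.120, 57.375)


def normalize_pixel(rgb, preprocessor):
    """Normalize one (R, G, B) pixel according to the chosen preprocessor."""
    if preprocessor == "V2":
        # Map 0..255 to approximately -1..+1, without mean subtraction.
        return tuple((channel - 128.0) / 128.0 for channel in rgb)
    if preprocessor == "V1":
        # Map 0..255 to 0..+1; mean subtraction is left to the graph itself.
        return tuple(channel / 255.0 for channel in rgb)
    if preprocessor == "V1MS":
        # Subtract the per-channel mean and scale by the standard deviation.
        return tuple(
            (channel - mean) / std
            for channel, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
        )
    raise ValueError("Unknown preprocessor: " + repr(preprocessor))
```

Feeding the same pixel through the three branches gives clearly different values, which is why a mismatched preprocessor degrades accuracy so noticeably.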
To classify a set of images with TensorRT, run:
python3 infer.py \
--engine /path/to/engine.trt \
--input /path/to/images \
--preprocessor V2
The input path can be either a single image file or a directory of jpg/png/bmp images. The classification results are printed to the console, one image per line, as:
<image path> <predicted class id> <confidence score>
You can also redirect these results to a file, and optionally set a separator character (such as for CSV file creation):
python3 infer.py \
--engine /path/to/engine.trt \
--input /path/to/ILSVRC2012_img_val \
--preprocessor V2 \
--separator ',' > results.csv
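A results file written this way could then be loaded for further analysis. This is a minimal sketch assuming three fields per line (image path, integer class id, floating-point confidence score) separated by the chosen character. The function name is hypothetical.

```python
import csv


def parse_results(lines, separator=","):
    """Parse result lines into (image_path, class_id, score) tuples."""
    results = []
    for row in csv.reader(lines, delimiter=separator):
        if len(row) != 3:
            continue  # Skip blank or malformed lines.
        path, class_id, score = row
        results.append((path, int(class_id), float(score)))
    return results
```

To read the file produced by the command above, open it with `open("results.csv", newline="")` and pass the file object to `parse_results`.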
To validate the accuracy of the TensorRT inference results against ground truth labels, run:
python3 eval_gt.py \
--engine /path/to/engine.trt \
--annotations /path/to/val.txt \
--input /path/to/images \
--preprocessor V2
The annotations file is expected to have one line per image, where the first column is the image filename, and the second column is the ground truth class label. For example:
ILSVRC2012_val_00000001.JPEG 65
ILSVRC2012_val_00000002.JPEG 970
ILSVRC2012_val_00000003.JPEG 230
ILSVRC2012_val_00000004.JPEG 809
[...]
NOTE: The ImageNet pre-trained models follow the label mapping introduced by Caffe, which indexes labels according to their synset number. The validation file for this format can be downloaded from Caffe's ILSVRC2012 auxiliary package at:
http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz
You can use the val.txt file bundled in this package for ImageNet evaluation purposes.
Upon a successful run of EfficientNet V2-S on the ILSVRC2012_img_val ImageNet dataset, for example, you should see something like:
Top-1 Accuracy: 83.710%
Top-5 Accuracy: 96.615%
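For reference, a Top-1 figure of this kind can be recomputed from an annotations file in the format described above and a list of predicted class ids. The sketch below is not the logic of eval_gt.py. The function names are hypothetical, and predictions are matched to labels by image filename.

```python
import os


def load_annotations(lines):
    """Map image filename to ground truth label from 'filename label' lines."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # Skip blank or malformed lines.
        filename, label = parts
        labels[filename] = int(label)
    return labels


def top1_accuracy(predictions, labels):
    """Fraction of (image_path, class_id) predictions that match the labels."""
    correct = 0
    total = 0
    for path, class_id in predictions:
        name = os.path.basename(path)
        if name not in labels:
            continue  # Images without a label are ignored.
        total += 1
        correct += int(class_id == labels[name])
    return correct / total if total else 0.0
```

Top-5 accuracy cannot be derived this way from a single predicted class per image, since it needs the five highest-scoring classes for each input.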
Another method to validate the engine is to compare the TensorRT inference results with what TensorFlow produces, to make sure both frameworks give similar results. For this, run:
python3 compare_tf.py \
--engine /path/to/engine.trt \
--saved_model /path/to/saved_model \
--input /path/to/images \
--preprocessor V2
This can be performed on any set of images; no ground truth is required. The script executes both the TensorFlow saved model and the TensorRT engine simultaneously on the given input images. It then computes the class prediction similarity and RMSE in confidence scores between both outputs.
Upon a successful run, you should see something like:
Matching Top-1 class predictions for 4999 out of 5000 images: 99.98%
RMSE between TensorFlow and TensorRT confidence scores: 0.006
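The two reported metrics can be stated precisely in a few lines. The sketch below computes the Top-1 match rate and the RMSE from plain lists of class ids and confidence scores. It illustrates the definitions only and is not taken from compare_tf.py. The function names are hypothetical.

```python
import math


def top1_match_rate(tf_classes, trt_classes):
    """Fraction of images where both frameworks predict the same class."""
    if not tf_classes or len(tf_classes) != len(trt_classes):
        raise ValueError("Both inputs must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(tf_classes, trt_classes))
    return matches / len(tf_classes)


def rmse(tf_scores, trt_scores):
    """Root mean squared error between two lists of confidence scores."""
    if not tf_scores or len(tf_scores) != len(trt_scores):
        raise ValueError("Both inputs must be non-empty and of equal length")
    squared = [(a - b) ** 2 for a, b in zip(tf_scores, trt_scores)]
    return math.sqrt(sum(squared) / len(squared))
```

With 4999 matching predictions out of 5000, `top1_match_rate` returns 0.9998, which corresponds to the 99.98% in the sample output above.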