TensorRT OSS Release Changelog

10.6.0 GA - 2024-11-05

Key Feature and Updates:

Demo Changes
- demoBERT: The use of fcPlugin in demoBERT has been removed.
- demoBERT: All TensorRT plugins now used in demoBERT (CustomEmbLayerNormDynamic, CustomSkipLayerNormDynamic, and CustomQKVToContextDynamic) now have versions that inherit from IPluginV3 interface classes. The user can opt-in to use these V3 plugins by specifying --use-v3-plugins to the builder scripts.
  - Opting-in to use V3 plugins does not affect performance, I/O, or plugin attributes.
  - There is a known issue in the V3 (version 4) of CustomQKVToContextDynamic plugin from TensorRT 10.6.0, causing an internal assertion error if either the batch or sequence dimensions differ at runtime from the ones used to serialize the engine. See the “known issues” section of the TensorRT-10.6.0 release notes.
  - For smoother migration, the default behavior is still using the deprecated IPluginV2DynamicExt-derived plugins, when the flag: --use-v3-plugins isn't specified in the builder scripts. The flag --use-deprecated-plugins was added as an explicit way to enforce the default behavior, and is mutually exclusive with --use-v3-plugins.
- demoDiffusion
  - Introduced BF16 and FP8 support for the Flux.1-dev pipeline.
  - Expanded FP8 support on Ada platforms.
  - Enabled LoRA adapter compatibility for SDv1.5, SDv2.1, and SDXL pipelines using Diffusers version 0.30.3.
Sample Changes
- Added the Python sample quickly_deployable_plugins, which demonstrates quickly deployable Python-based plugin definitions (QDPs) in TensorRT. QDPs are a simple and intuitive decorator-based approach to defining TensorRT plugins, requiring drastically less code.
Plugin Changes
- The fcPlugin has been deprecated. Its functionality has been superseded by the IMatrixMultiplyLayer that is natively provided by TensorRT.
- Migrated IPluginV2-descendent version 1 of CustomEmbLayerNormDynamic, to version 6, which implements IPluginV3.
  - The newer versions preserve the attributes and I/O of the corresponding older plugin version.
  - The older plugin versions are deprecated and will be removed in a future release.
Parser Changes
- Updated ONNX submodule version to 1.17.0.
- Fixed issue where conditional layers were incorrectly being added.
- Updated local function metadata to contain more information.
- Added support for parsing nodes with Quickly Deployable Plugins.
- Fixed handling of optional outputs.
Tool Updates
- ONNX-Graphsurgeon updated to version 0.5.3
- Polygraphy updated to 0.49.14.

10.5.0 GA - 2024-09-30

Key Features and Updates:

Demo changes
- Added Flux.1-dev pipeline
Sample changes
- None
Plugin changes
- Migrated IPluginV2-descendent versions of bertQKVToContextPlugin (1, 2, 3) to newer versions (4, 5, 6 respectively) which implement IPluginV3.
- Note:
  - The newer versions preserve the attributes and I/O of the corresponding older plugin version
  - The older plugin versions are deprecated and will be removed in a future release
Quickstart guide
- None
Parser changes
- Added support for real-valued STFT operations
- Improved error handling in IParser

Known issues:

Demos:
- TensorRT engine might not be build successfully when using --fp8 flag on H100 GPUs.

10.4.0 GA - 2024-09-18

Key Features and Updates:

Demo changes
- Added Stable Cascade pipeline.
- Enabled INT8 and FP8 quantization for Stable Diffusion v1.5, v2.0 and v2.1 pipelines.
- Enabled FP8 quantization for Stable Diffusion XL pipeline.
Sample changes
- Add a new python sample aliased_io_plugin which demonstrates how in-place updates to plugin inputs can be achieved through I/O aliasing.
Plugin changes
- Migrated IPluginV2-descendent versions (a) of the following plugins to newer versions (b) which implement IPluginV3 (a->b):
  - scatterElementsPlugin (1->2)
  - skipLayerNormPlugin (1->5, 2->6, 3->7, 4->8)
  - embLayerNormPlugin (2->4, 3->5)
  - bertQKVToContextPlugin (1->4, 2->5, 3->6)
- Note
  - The newer versions preserve the corresponding attributes and I/O of the corresponding older plugin version.
  - The older plugin versions are deprecated and will be removed in a future release.
Quickstart guide
- Updated deploy_to_triton guide and removed legacy APIs.
- Removed legacy TF-TRT code as the project is no longer supported.
- Removed quantization_tutorial as pytorch_quantization has been deprecated. Check out https://github.com/NVIDIA/TensorRT-Model-Optimizer for the latest quantization support. Check Stable Diffusion XL (Base/Turbo) and Stable Diffusion 1.5 Quantization with Model Optimizer for integration with TensorRT.
Parser changes
- Added support for tensor axes for Pad operations.
- Added support for BlackmanWindow, HammingWindow, and HannWindow operations.
- Improved error handling in IParserRefitter.
- Fixed kernel shape inference in multi-input convolutions.
Updated tooling
- polygraphy-extension-trtexec v0.0.9

10.3.0 GA - 2024-08-02

Key Features and Updates:

Demo changes
- Added Stable Video Diffusion(SVD) pipeline.
Plugin changes
- Deprecated Version 1 of ScatterElements plugin. It is superseded by Version 2, which implements the IPluginV3 interface.
Quickstart guide
- Updated the SemanticSegmentation guide with latest APIs.
Parser changes
- Added support for tensor axes inputs for Slice node.
- Updated ScatterElements importer to use Version 2 of ScatterElements plugin, which implements the IPluginV3 interface.
Updated tooling
- Polygraphy v0.49.13

10.2.0 GA - 2024-07-09

Key Features and Updates:

Demo changes
- Added Stable Diffusion 3 demo.
Plugin changes
- Version 3 of the InstanceNormalization plugin (InstanceNormalization_TRT) has been added. This version is based on the IPluginV3 interface and is used by the TensorRT ONNX parser when native InstanceNormalization is disabled.
Tooling changes
- Pytorch Quantization development has transitioned to TensorRT Model Optimizer. All developers are encouraged to use TensorRT Model Optimizer to benefit from the latest advancements on quantization and compression.
Build containers
- Updated default cuda versions to 12.5.0.

10.1.0 GA - 2024-06-17

Key Features and Updates:

Parser changes
- Added supportsModelV2 API
- Added support for DeformConv operation
- Added support for PluginV3 TensorRT Plugins
- Marked all IParser and IParserRefitter APIs as noexcept
Plugin changes
- Added version 2 of ROIAlign_TRT plugin, which implements the IPluginV3 plugin interface. When importing an ONNX model with the RoiAlign op, this new version of the plugin will be inserted to the TRT network.
Samples changes
- Added a new sample non_zero_plugin, which is a Python version of the C++ sample sampleNonZeroPlugin.
Updated tooling
- Polygraphy v0.49.12
- ONNX-GraphSurgeon v0.5.3

10.0.1 GA - 2024-04-24

Key Features and Updates:

Parser changes
- Added support for building with protobuf-lite.
- Fixed issue when parsing and refitting models with nested BatchNormalization nodes.
- Added support for empty inputs in custom plugin nodes.
Demo changes
- The following demos have been removed: Jasper, Tacotron2, HuggingFace Diffusers notebook
Updated tooling
- Polygraphy v0.49.10
- ONNX-GraphSurgeon v0.5.2
Build Containers
- Updated default cuda versions to 12.4.0.
- Added Rocky Linux 8 and Rocky Linux 9 build containers

10.0.0 EA - 2024-03-27

Key Features and Updates:

Samples changes
- Added a sample showcasing weight-stripped engines.
- Added a sample demonstrating the use of custom tactics with IPluginV3.
- Added a sample to showcase plugins with data-dependent output shapes, using IPluginV3.
Parser changes
- Added a new class IParserRefitter that can be used to refit a TensorRT engine with the weights of an ONNX model.
- kNATIVE_INSTANCENORM is now set to ON by default.
- Added support for IPluginV3 interfaces from TensorRT.
- Added support for INT4 quantization.
- Added support for the reduction attribute in ScatterElements.
- Added support for wrap padding mode in Pad
Plugin changes
- A new plugin has been added in compliance with ONNX ScatterElements.
- The TensorRT plugin library no longer has a load-time link dependency on cuBLAS or cuDNN libraries.
- All plugins which relied on cuBLAS/cuDNN handles passed through IPluginV2Ext::attachToContext() have moved to use cuBLAS/cuDNN resources initialized by the plugin library itself. This works by dynamically loading the required cuBLAS/cuDNN library. Additionally, plugins which independently initialized their cuBLAS/cuDNN resources have also moved to dynamically loading the required library. If the respective library is not discoverable through the library path(s), these plugins will not work.
- bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
- reorgPlugin: Added a version 2 which implements IPluginV2DynamicExt.
- disentangledAttentionPlugin: Fixed a kernel bug.
Demo changes
- HuggingFace demos have been removed. For all users using TensorRT to accelerate Large Language Model inference, please use TensorRT-LLM.
Updated tooling
- Polygraphy v0.49.9
- ONNX-GraphSurgeon v0.5.1
- TensorRT Engine Explorer v0.1.8
Build Containers
- RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.

9.3.0 GA - 2024-02-09

Key Features and Updates:

Demo changes
- Faster Text-to-image using SDXL & INT8 quantization using AMMO
Updated tooling
- Polygraphy v0.49.7

9.2.0 GA - 2023-11-27

Key Features and Updates:

trtexec enhancement: Added --weightless flag to mark the engine as weightless.
Parser changes
- Added support for Hardmax operator.
- Changes to a few operator importers to ensure that TensorRT preserves the precision of operations when using strongly typed mode.
Plugin changes
- Explicit INT8 support added to bertQKVToContextPlugin.
- Various bug fixes.
Updated HuggingFace demo to use transformers v4.31.0 and PyTorch v2.1.0.

9.1.0 GA - 2023-10-18

Key Features and Updates:

Update the trt_python_plugin sample.
- Python plugins API reference is part of the offical TRT Python API.
Added samples demonstrating the usage of the progress monitor API.
- Check sampleProgressMonitor for the C++ sample.
- Check simple_progress_monitor for the Python sample.
Remove dependencies related to python<3.8 in python samples as we no longer support python<3.8 for python samples.
Demo changes
- Added LAMBADA dataset accuracy checks in the HuggingFace demo.
- Enabled structured sparsity and FP8 quantized batch matrix multiplication(BMM)s in attention in the NeMo demo.
- Replaced deprecated APIs in the BERT demo.
Updated tooling
- Polygraphy v0.49.1

9.0.1 GA - 2023-09-07

Key Features and Updates:

TensorRT plugin autorhing in Python is now supported
- See the trt_python_plugin sample for reference.
Updated default CUDA version to 12.2
Support for BLIP models, Seq2Seq and Vision2Seq abstractions in HuggingFace demo.
demoDiffusion refactoring and SDXL enhancements
Additional validation asserts for NV Plugins
Updated tooling
- TensorRT Engine Explorer v0.1.7: graph rendering for TensorRT 9.0 kgen kernels
- ONNX-GraphSurgeon v0.3.29
- PyTorch quantization toolkit v2.2.0

9.0.0 EA - 2023-08-06

Key Features and Updates:

Added the NeMo demo to demonstrate the performance benefit of using E4M3 FP8 data type with the GPT models trained with the NVIDIA NeMo Toolkit and TransformerEngine.
Demo Diffusion updates
- Added SDXL 1.0 txt2img pipeline
- Added ControlNet pipeline
- Huggingface demo updates
  - Added Flan-T5, OPT, BLOOM, BLOOMZ, GPT-Neo, GPT-NeoX, Cerebras-GPT support with accuracy check
  - Refactored code and extracted common utils into Seq2Seq class
  - Optimized shape-changing overhead and achieved a >30% e2e performance gain
  - Added stable KV-cache, beam search and fp16 support for all models
  - Added dynamic batch size TRT inference
  - Added uneven-length multi-batch inference with attention_mask support
  - Added chat command – interactive CLI
  - Upgraded PyTorch and HuggingFace version to support Hopper GPU
  - Updated notebooks with much simplified demo API.
Added two new TensorRT samples: sampleProgressMonitor (C++) and simple_progress_reporter (Python) that are examples for using Progress Monitor during engine build.
The following plugins were deprecated:
- BatchedNMS_TRT
- BatchedNMSDynamic_TRT
- BatchTilePlugin_TRT
- Clip_TRT
- CoordConvAC
- CropAndResize
- EfficientNMS_ONNX_TRT
- CustomGeluPluginDynamic
- LReLU_TRT
- NMSDynamic_TRT
- NMS_TRT
- Normalize_TRT
- Proposal
- SingleStepLSTMPlugin
- SpecialSlice_TRT
- Split
Ubuntu 18.04 has reached end of life and is no longer supported by TensorRT starting with 9.0, and the corresponding Dockerfile(s) have been removed.
Support for aarch64 builds will not be available in this release, and the corresponding Dockerfiles have been removed.

8.6.1 GA - 2023-05-02

TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.

Updates since TensorRT 8.6.0 EA release.
Please refer to the TensorRT 8.6.1.6 GA release notes for more information.

Key Features and Updates:

Added a new flag --use-cuda-graph to demoDiffusion to improve performance.
Optimized GPT2 and T5 HuggingFace demos to use fp16 I/O tensors for fp16 networks.

8.6.0 EA - 2023-03-10

TensorRT OSS release corresponding to TensorRT 8.6.0.12 EA release.

Updates since TensorRT 8.5.3 GA release.
Please refer to the TensorRT 8.6.0.12 EA release notes for more information.

Key Features and Updates:

demoDiffusion acceleration is now supported out of the box in TensorRT without requiring plugins.
- The following plugins have been removed accordingly: GroupNorm, LayerNorm, MultiHeadCrossAttention, MultiHeadFlashAttention, SeqLen2Spatial, and SplitGeLU.
Added a new sample called onnx_custom_plugin.

8.5.3 GA - 2023-01-30

TensorRT OSS release corresponding to TensorRT 8.5.3.1 GA release.

Updates since TensorRT 8.5.2 GA release.
Please refer to the TensorRT 8.5.3 GA release notes for more information.

Key Features and Updates:

Added the following HuggingFace demos: GPT-J-6B, GPT2-XL, and GPT2-Medium
Added nvinfer1::plugin namespace
Optimized KV Cache performance for T5

8.5.2 GA - 2022-12-12

TensorRT OSS release corresponding to TensorRT 8.5.2.2 GA release.

Updates since TensorRT 8.5.1 GA release.
Please refer to the TensorRT 8.5.2 GA release notes for more information.

Key Features and Updates:

Plugin enhancements
- Added LayerNormPlugin, SplitGeLUPlugin, GroupNormPlugin, and SeqLen2SpatialPlugin to support stable diffusion demo.
KV-cache and beam search to GPT2 and T5 demos

22.12 - 2022-12-06

Added

Stable Diffusion demo using TensorRT Plugins
KV-cache and beam search to GPT2 and T5 demos
Perplexity calculation to all HF demos

Changed

Updated trex to v0.1.5
Increased default workspace size in demoBERT to build BS=128 fp32 engines
Use avg_iter=8 and timing cache to make demoBERT perf more stable

Removed

None

8.5.1 GA - 2022-11-01

TensorRT OSS release corresponding to TensorRT 8.5.1.7 GA release.

Updates since TensorRT 8.4.1 GA release.
Please refer to the TensorRT 8.5.1 GA release notes for more information.

Key Features and Updates:

Samples enhancements
- Added sampleNamedDimensions which works with named dimensions.
- Updated sampleINT8API and introductory_parser_samples to use ONNX models over Caffe/UFF
- Removed UFF/Caffe samples including sampleMNIST, end_to_end_tensorflow_mnist, sampleINT8, sampleMNISTAPI, sampleUffMNIST, sampleUffPluginV2Ext, engine_refit_mnist, int8_caffe_mnist, uff_custom_plugin, sampleFasterRCNN, sampleUffFasterRCNN, sampleGoogleNet, sampleSSD, sampleUffSSD, sampleUffMaskRCNN and uff_ssd.
Plugin enhancements
- Added GridAnchorRectPlugin to support rectangular feature maps in gridAnchorPlugin.
- Added ROIAlignPlugin to support the ONNX operator RoiAlign. The ONNX parser will automatically route ROIAlign ops through the plugin.
- Added Hopper support for the BERTQKVToContextPlugin plugin.
- Exposed the use_int8_scale_max attribute in the BERTQKVToContextPlugin plugin to allow users to disable the by-default usage of INT8 scale factors to optimize softmax MAX reduction in versions 2 and 3 of the plugin.
ONNX-TensorRT changes
- Added support for operator Reciprocal.
Build containers
- Updated default cuda versions to 11.8.0.
Tooling enhancements
- Updated onnx-graphsurgeon to v0.3.25.
- Updated Polygraphy to v0.43.1.
- Updated polygraphy-extension-trtexec to v0.0.8.
- Updated Tensorflow Quantization Toolkit to v0.2.0.

22.08 - 2022-08-16

Updated TensorRT version to 8.4.2 - see the TensorRT 8.4.2 release notes for more information

Changed

Updated default protobuf version to 3.20.x
Updated ONNX-TensorRT submodule version to 22.08 tag
Updated sampleIOFormats and sampleAlgorithmSelector to use ONNX models over Caffe

Fixes

Fixed missing serialization member in custom ClipPlugin plugin used in uff_custom_plugin sample
Fixed various Python import issues

Added

Added new DeBERTA demo
Added version 2 for disentangledAttentionPlugin to support DeBERTA v2

Removed

None

22.07 - 2022-07-21

Added

polygraphy-trtexec-plugin tool for Polygraphy
Multi-profile support for demoBERT
KV cache support for HF BART demo

Changed

Updated ONNX-GS to v0.3.20

Removed

None

8.4.1 GA - 2022-06-14

TensorRT OSS release corresponding to TensorRT 8.4.1.5 GA release.

Updates since TensorRT 8.2.1 GA release.
Please refer to the TensorRT 8.4.1 GA release notes for more information.

Key Features and Updates:

Samples enhancements
- Added Detectron2 Mask R-CNN R50-FPN python sample
- Added a quickstart guide for NVidia Triton deployment workflow.
- Added onnx export script for sampleOnnxMnistCoordConvAC
- Removed sampleNMT.
- Removed usage of deprecated TensorRT APIs in samples.
EfficientDet sample
- Added support for EfficientDet Lite and AdvProp models.
- Added dynamic batch support.
- Added mixed precision engine builder.
HuggingFace transformer demo
- Added BART model.
- Performance speedup of GPT-2 greedy search using GPU implementation.
- Fixed GPT2 onnx export failure due to 2G file size limitation.
- Extended Megatron LayerNorm plugins to support larger hidden sizes.
- Added performance benchmarking mode.
- Enable tf32 format by default.
demoBERT enhancements
- Add --duration flag to perf benchmarking script.
- Fixed import of nvinfer_plugins library in demoBERT on Windows.
Torch-QAT toolkit
- quant_bert.py module removed. It is now upstreamed to HuggingFace QDQBERT.
- Use axis0 as default for deconv.
- #1939 - Fixed path in classification_flow example.
Plugin enhancements
- Added Disentangled attention plugin, DisentangledAttention_TRT, to support DeBERTa model.
- Added Multiscale deformable attention plugin, MultiscaleDeformableAttnPlugin_TRT, to support DDETR model.
- Added new plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin.
- Refactored EfficientNMS plugin to support TF-TRT and implicit batch mode.
- fp16 support for pillarScatterPlugin.
Build containers
- Updated default cuda versions to 11.6.2.
- CentOS Linux 8 has reached End-of-Life on Dec 31, 2021. The corresponding container has been removed from TensorRT-OSS.
- Install devtoolset-8 for updated g++ versions in CentOS7 container.
Tooling enhancements
- Added Tensorflow Quantization Toolkit v0.1.0 for Quantization-Aware-Training of Tensorflow 2.x Keras models.
- Added TensorRT Engine Explorer v0.1.2 for inspecting TensorRT engine plans and associated inference profiling data.
- Updated Polygraphy to v0.38.0.
- Updated onnx-graphsurgeon to v0.3.19.
trtexec enhancements
- Added --layerPrecisions and --layerOutputTypes flags for specifying layer-wise precision and output type constraints.
- Added --memPoolSize flag to specify the size of workspace as well as the DLA memory pools via a unified interface. Correspondingly the --workspace flag has been deprecated.
- "End-To-End Host Latency" metric has been removed. Use the “Host Latency” metric instead. For more information, refer to Benchmarking Network section in the TensorRT Developer Guide.
- Use enqueueV2() instead of enqueue() when engine has explicit batch dimensions.

22.06 - 2022-06-08

Added

None

Changed

Disentangled attention (DMHA) plugin refactored
ONNX parser updated to 8.2GA

Removed

None

22.05 - 2022-05-13

Added

Disentangled attention plugin for DeBERTa
DMHA (multiscaleDeformableAttnPlugin) plugin for DDETR
Performance benchmarking mode to HuggingFace demo

Changed

Updated base TensorRT version to 8.2.5.1
Updated onnx-graphsurgeon v0.3.19 CHANGELOG
fp16 support for pillarScatterPlugin
#1939 - Fixed path in quantization classification_flow
Fixed GPT2 onnx export failure due to 2G limitation
Use axis0 as default for deconv in pytorch-quantization toolkit
Updated onnx export script for CoordConvAC sample
Install devtoolset-8 for updated g++ version in CentOS7 container

Removed

Usage of deprecated TensorRT APIs in samples removed
quant_bert.py module removed from pytorch-quantization

22.04 - 2022-04-13

Added

TensorRT Engine Explorer v0.1.0 README
Detectron 2 Mask R-CNN R50-FPN python sample
Model export script for sampleOnnxMnistCoordConvAC

Changed

Updated base TensorRT version to 8.2.4.2
Updated copyright headers with SPDX identifiers
Updated onnx-graphsurgeon v0.3.17 CHANGELOG
PyramidROIAlign plugin refactor and bug fixes
Fixed MultilevelCropAndResize crashes on Windows
#1583 - sublicense ieee/half.h under Apache2
Updated demo/BERT performance tables for rel-8.2
#1774 Fix python hangs at IndexErrors when TF is imported after TensorRT
Various bugfixes in demos - BERT, Tacotron2 and HuggingFace GPT/T5 notebooks
Cleaned up sample READMEs

Removed

sampleNMT removed from samples

22.03 - 2022-03-23

Added

EfficientDet sample enhancements
- Added support for EfficientDet Lite and AdvProp models.
- Added dynamic batch support.
- Added mixed precision engine builder.

Changed

Better decoupling of HuggingFace demo tests

22.02 - 2022-02-04

Added

New plugins: decodeBbox3DPlugin, pillarScatterPlugin, and voxelGeneratorPlugin

Changed

Extend Megatron LayerNorm plugins to support larger hidden sizes
Refactored EfficientNMS plugin for TFTRT and added implicit batch mode support
Update base TensorRT version to 8.2.3.0
GPT-2 greedy search speedup - now runs on GPU
Updates to TensorRT developer tools
- Polygraphy v0.35.1
- onnx-graphsurgeon v0.3.15
Updated ONNX parser to v8.2.3.0
Minor updates and bugfixes
- Samples: TFOD, GPT-2, demo/BERT
- Plugins: proposalPlugin, geluPlugin, bertQKVToContextPlugin, batchedNMS

Removed

Unused source file(s) in demo/BERT

8.2.1 GA - 2021-11-24

TensorRT OSS release corresponding to TensorRT 8.2.1.8 GA release.

Updates since TensorRT 8.2.0 EA release.
Please refer to the TensorRT 8.2.1 GA release notes for more information.
ONNX parser v8.2.1
- Removed duplicate constant layer checks that caused some performance regressions
- Fixed expand dynamic shape calculations
- Added parser-side checks for Scatter layer support
Sample updates
- Added Tensorflow Object Detection API converter samples, including Single Shot Detector, Faster R-CNN and Mask R-CNN models
- Multiple enhancements in HuggingFace transformer demos
  - Added multi-batch support
  - Fixed resultant performance regression in batchsize=1
  - Fixed T5 large/T5-3B accuracy issues
  - Added notebooks for T5 and GPT-2
  - Added CPU benchmarking option
- Deprecated kSTRICT_TYPES (strict type constraints). Equivalent behaviour now achieved by setting PREFER_PRECISION_CONSTRAINTS, DIRECT_IO, and REJECT_EMPTY_ALGORITHMS
- Removed sampleMovieLens
- Renamed sampleReformatFreeIO to sampleIOFormats
- Add idleTime option for samples to control qps
- Specify default value for precisionConstraints
- Fixed reporting of TensorRT build version in trtexec
- Fixed combineDescriptions typo in trtexec/tracer.py
- Fixed usages of of kDIRECT_IO
Plugin updates
- EfficientNMS plugin support extended to TF-TRT, and for clang builds.
- Sanitize header definitions for BERT fused MHA plugin
- Separate C++ and cu files in splitPlugin to avoid PTX generation (required for CUDA enhanced compatibility support)
- Enable C++14 build for plugins
ONNX tooling updates
- onnx-graphsurgeon upgraded to v0.3.14
- Polygraphy upgraded to v0.33.2
- pytorch-quantization toolkit upgraded to v2.1.2
Build and container fixes
- Add SM86 target to default GPU_ARCHS for platforms with cuda-11.1+
- Remove deprecated SM_35 and add SM_60 to default GPU_ARCHS
- Skip CUB builds for cuda 11.0+ #1455
- Fixed cuda-10.2 container build failures in Ubuntu 20.04
- Add native ARM server build container
- Install devtoolset-8 for updated g++ version in CentOS7
- Added a note on supporting c++14 builds for CentOS7
- Fixed docker build for large UIDs #1373
- Updated README instructions for Jetpack builds
demo enhancements
- Updated Tacotron2 instructions and add CPU benchmarking
- Fixed issues in demoBERT python notebook
Documentation updates
- Updated Python documentation for add_reduce, add_top_k, and ISoftMaxLayer
- Renamed default GitHub branch to main and updated hyperlinks

8.2.0 EA - 2021-10-05

Added

Demo applications showcasing TensorRT inference of HuggingFace Transformers.
- Support is currently extended to GPT-2 and T5 models.
Added support for the following ONNX operators:
- Einsum
- IsNan
- GatherND
- Scatter
- ScatterElements
- ScatterND
- Sign
- Round
Added support for building TensorRT Python API on Windows.

Updated

Notable API updates in TensorRT 8.2.0.6 EA release. See TensorRT Developer Guide for details.
- Added three new APIs, IExecutionContext: getEnqueueEmitsProfile(), setEnqueueEmitsProfile(), and reportToProfiler() which can be used to collect layer profiling info when the inference is launched as a CUDA graph.
- Eliminated the global logger; each Runtime, Builder or Refitter now has its own logger.
- Added new operators: IAssertionLayer, IConditionLayer, IEinsumLayer, IIfConditionalBoundaryLayer, IIfConditionalOutputLayer, IIfConditionalInputLayer, and IScatterLayer.
- Added new IGatherLayer modes: kELEMENT and kND
- Added new ISliceLayer modes: kFILL, kCLAMP, and kREFLECT
- Added new IUnaryLayer operators: kSIGN and kROUND
- Added new runtime class IEngineInspector that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
- ProfilingVerbosity enums have been updated to show their functionality more explicitly.
Updated TensorRT OSS container defaults to cuda 11.4
CMake to target C++14 builds.
Updated following ONNX operators:
- Gather and GatherElements implementations to natively support negative indices
- Pad layer to support ND padding, along with edge and reflect padding mode support
- If layer with general performance improvements.

Removed

Removed sampleMLP.
Several flags of trtexec have been deprecated:
- --explicitBatch flag has been deprecated and has no effect. When the input model is in UFF or in Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
- --explicitPrecision flag has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
- --nvtxMode=[verbose|default|none] has been deprecated in favor of --profilingVerbosity=[detailed|layer_names_only|none] to show its functionality more explicitly.

21.10 - 2021-10-05

Added

Benchmark script for demoBERT-Megatron
Dynamic Input Shape support for EfficientNMS plugin
Support empty dimensions in ONNX
INT32 and dynamic clips through elementwise in ONNX parser

Changed

Bump TensorRT version to 8.0.3.4
Use static shape for only single batch single sequence input in demo/BERT
Revert to using native FC layer in demo/BERT and FCPlugin only on older GPUs.
Update demo/Tacotron2 for TensorRT 8.0
Updates to TensorRT developer tools
- Polygraphy v0.33.0
  - Added various examples, a CLI User Guide and how-to guides.
  - Added experimental support for DLA.
  - Added a data to-input tool that can combine inputs/outputs created by --save-inputs/--save-outputs.
  - Added a PluginRefRunner which provides CPU reference implementations for TensorRT plugins
  - Made several performance improvements in the Polygraphy CUDA wrapper.
  - Removed the to-json tool which was used to convert Pickled data generated by Polygraphy 0.26.1 and older to JSON.
- Bugfixes and documentation updates in pytorch-quantization toolkit.
Bumped up package versions: tensorflow-gpu 2.5.1, pillow 8.3.2
ONNX parser enhancements and bugfixes
- Update ONNX submodule to v1.8.0
- Update convDeconvMultiInput function to properly handle deconvs
- Update RNN documentation
- Update QDQ axis assertion
- Fix bidirectional activation alpha and beta values
- Fix opset10 Resize
- Fix shape tensor unsqueeze
- Mark BOOL tiles as unsupported
- Remove unnecessary shape tensor checks

Removed

N/A

21.09 - 2021-09-22

Added

Add ONNX2TRT_VERSION overwrite in CMake.

Changed

Updates to TensorRT developer tools
- ONNX-GraphSurgeon v0.3.12
- pytorch-quantization toolkit v2.1.1
Fix assertion in EfficientNMSPlugin

Removed

N/A

21.08 - 2021-08-05

Added

Add demoBERT and demoBERT-MT (sparsity) benchmark data for TensorRT 8.
Added example python notebooks
- BERT - Q&A with TensorRT
- EfficientNet - Object Detection with TensorRT

Changed

Updated samples and plugins directory structure
Updates to TensorRT developer tools
- Polygraphy v0.31.1
- ONNX-GraphSurgeon v0.3.11
- pytorch-quantization toolkit v2.1.1
README fix to update build command for native aarch64 builds.

Removed

N/A

21.07 - 2021-07-21

Identical to the TensorRT-OSS 8.0.1 Release.

8.0.1 - 2021-07-02

Added

Added support for the following ONNX operators: Celu, CumSum, EyeLike, GatherElements, GlobalLpPool, GreaterOrEqual, LessOrEqual, LpNormalization, LpPool, ReverseSequence, and SoftmaxCrossEntropyLoss details.
Rehauled Resize ONNX operator, now fully supporting the following modes:
- Coordinate Transformation modes: half_pixel, pytorch_half_pixel, tf_half_pixel_for_nn, asymmetric, and align_corners.
- Modes: nearest, linear.
- Nearest Modes: floor, ceil, round_prefer_floor, round_prefer_ceil.
Added support for multi-input ONNX ConvTranpose operator.
Added support for 3D spatial dimensions in ONNX InstanceNormalization.
Added support for generic 2D padding in ONNX.
ONNX QuantizeLinear and DequantizeLinear operators leverage IQuantizeLayer and IDequantizeLayer.
- Added support for tensor scales.
- Added support for per-axis quantization.
Added EfficientNMS_TRT, EfficientNMS_ONNX_TRT plugins and experimental support for ONNX NonMaxSuppression operator.
Added ScatterND plugin.
Added TensorRT QuickStart Guide.
Added new samples: engine_refit_onnx_bidaf builds an engine from ONNX BiDAF model and refits engine with new weights, efficientdet and efficientnet samples for demonstrating Object Detection using TensorRT.
Added support for Ubuntu20.04 and RedHat/CentOS 8.3.
Added Python 3.9 support.

Changed

Update Polygraphy to v0.30.3.
Update ONNX-GraphSurgeon to v0.3.10.
Update Pytorch Quantization toolkit to v2.1.0.
Notable TensorRT API updates
- TensorRT now declares API’s with the noexcept keyword. All TensorRT classes that an application inherits from (such as IPluginV2) must guarantee that methods called by TensorRT do not throw uncaught exceptions, or the behavior is undefined.
- Destructors for classes with destroy() methods were previously protected. They are now public, enabling use of smart pointers for these classes. The destroy() methods are deprecated.
Moved RefitMap API from ONNX parser to core TensorRT.
Various bugfixes for plugins, samples and ONNX parser.
Port demoBERT to tensorflow2 and update UFF samples to leverage nvidia-tensorflow1 container.

Removed

IPlugin and IPluginFactory interfaces were deprecated in TensorRT 6.0 and have been removed in TensorRT 8.0. We recommend that you write new plugins or refactor existing ones to target the IPluginV2DynamicExt and IPluginV2IOExt interfaces. For more information, refer to Migrating Plugins From TensorRT 6.x Or 7.x To TensorRT 8.x.x.
- For plugins based on IPluginV2DynamicExt and IPluginV2IOExt, certain methods with legacy function signatures (derived from IPluginV2 and IPluginV2Ext base classes) which were deprecated and marked for removal in TensorRT 8.0 will no longer be available.
Removed samplePlugin since it showcased IPluginExt interface, which is no longer supported in TensorRT 8.0.
Removed sampleMovieLens and sampleMovieLensMPS.
Removed Dockerfile for Ubuntu 16.04. TensorRT 8.0 debians for Ubuntu 16.04 require python 3.5 while minimum required python version for TensorRT OSS is 3.6.
Removed support for PowerPC builds, consistent with TensorRT GA releases.

Notes

We had deprecated the Caffe Parser and UFF Parser in TensorRT 7.0. They are still tested and functional in TensorRT 8.0, however, we plan to remove the support in a future release. Ensure you migrate your workflow to use tf2onnx, keras2onnx or TensorFlow-TensorRT (TF-TRT).
Refer to TensorRT 8.0.1 GA Release Notes for additional details

21.06 - 2021-06-23

Added

Add switch for batch-agnostic mode in NMS plugin
Add missing model.py in uff_custom_plugin sample

Changed

Update to Polygraphy v0.29.2
Update to ONNX-GraphSurgeon v0.3.9
Fix numerical errors for float type in NMS/batchedNMS plugins
Update demoBERT input dimensions to match Triton requirement #1051
Optimize TLT MaskRCNN plugins:
- enable fp16 precision in multilevelCropAndResizePlugin and multilevelProposeROIPlugin
- Algorithms optimization for NMS kernels and ROIAlign kernel
- Fix invalid cuda config issue when bs is larger than 32
- Fix issues found on Jetson NANO

Removed

Removed fcplugin from demoBERT to improve latency

21.05 - 2021-05-20

Added

Extended support for ONNX operator InstanceNormalization to 5D tensors
Support negative indices in ONNX Gather operator
Add support for importing ONNX double-typed weights as float
ONNX-GraphSurgeon (v0.3.7) support for models with externally stored weights

Changed

Update ONNX-TensorRT to 21.05
Relicense ONNX-TensorRT under Apache2
demoBERT builder fixes for multi-batch
Speedup demoBERT build using global timing cache and disable cuDNN tactics
Standardize python package versions across OSS samples
Bugfixes in multilevelProposeROI and bertQKV plugin
Fix memleaks in samples logger

21.04 - 2021-04-12

Added

SM86 kernels for BERT MHA plugin
Added opset13 support for SoftMax, LogSoftmax, Squeeze, and Unsqueeze.
Added support for the EyeLike and GatherElements operators.

Changed

Updated TensorRT version to v7.2.3.4.
Update to ONNX-TensorRT 21.03
ONNX-GraphSurgeon (v0.3.4) - updates fold_constants to correctly exit early.
Set default CUDA_INSTALL_DIR #798
Plugin bugfixes, qkv kernels for sm86
Fixed GroupNorm CMakeFile for cu sources #1083
Permit groupadd with non-unique GID in build containers #1091
Avoid reinterpret_cast #146
Clang-format plugins and samples
Avoid arithmetic on void pointer in multilevelProposeROIPlugin.cpp #1028
Update BERT plugin documentation.

Removed

Removes extra terminate call in InstanceNorm

21.03 - 2021-03-09

Added

Optimized FP16 NMS/batchedNMS plugins with n-bit radix sort and based on IPluginV2DynamicExt
ProposalDynamic and CropAndResizeDynamic plugins based on IPluginV2DynamicExt

Changed

ONNX-TensorRT v21.03 update
ONNX-GraphSurgeon v0.3.3 update
Bugfix for scaledSoftmax kernel

Removed

N/A

21.02 - 2021-02-01

Added

TensorRT Python API bindings
TensorRT Python samples
FP16 support to batchedNMSPlugin #1002
Configurable input size for TLT MaskRCNN Plugin #986

Changed

TensorRT version updated to 7.2.2.3
ONNX-TensorRT v21.02 update
Polygraphy v0.21.1 update
PyTorch-Quantization Toolkit v2.1.0 update
- Documentation update, ONNX opset 13 support, ResNet example
ONNX-GraphSurgeon v0.28 update
demoBERT builder updated to work with Tensorflow2 (in compatibility mode)
Refactor Dockerfiles for OSS container

Removed

N/A

20.12 - 2020-12-18

Added

Add configurable input size for TLT MaskRCNN Plugin

Changed

Update symbol export map for plugins
Correctly use channel dimension when creating Prelu node
Fix Jetson cross compilation CMakefile

Removed

N/A

20.11 - 2020-11-20

Added

API documentation for ONNX-GraphSurgeon

Changed

Support for SM86 in demoBERT
Updated NGC checkpoint URLs for demoBERT and Tacotron2.

Removed

N/A

20.10 - 2020-10-22

Added

Polygraphy v0.20.13 - Deep Learning Inference Prototyping and Debugging Toolkit
PyTorch-Quantization Toolkit v2.0.0
Updated BERT plugins for variable sequence length inputs
Optimized kernels for sequence lengths of 64 and 96 added
Added Tacotron2 + Waveglow TTS demo #677
Re-enable GridAnchorRect_TRT plugin with rectangular feature maps #679
Update batchedNMS plugin to IPluginV2DynamicExt interface #738
Support 3D inputs in InstanceNormalization plugin #745
Added this CHANGELOG.md

Changed

ONNX GraphSurgeon - v0.2.7 with bugfixes, new examples.
demo/BERT bugfixes for Jetson Xavier
Updated build Dockerfile to cuda-11.1
Updated ClangFormat style specification according to TensorRT coding guidelines

Removed

N/A

7.2.1 - 2020-10-20

Added

Polygraphy v0.20.13 - Deep Learning Inference Prototyping and Debugging Toolkit
PyTorch-Quantization Toolkit v2.0.0
Updated BERT plugins for variable sequence length inputs
- Optimized kernels for sequence lengths of 64 and 96 added
Added Tacotron2 + Waveglow TTS demo #677
Re-enable GridAnchorRect_TRT plugin with rectangular feature maps #679
Update batchedNMS plugin to IPluginV2DynamicExt interface #738
Support 3D inputs in InstanceNormalization plugin #745
Added this CHANGELOG.md

Changed

ONNX GraphSurgeon - v0.2.7 with bugfixes, new examples.
demo/BERT bugfixes for Jetson Xavier
Updated build Dockerfile to cuda-11.1
Updated ClangFormat style specification according to TensorRT coding guidelines

Removed

N/A

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

TensorRT OSS Release Changelog

10.6.0 GA - 2024-11-05

10.5.0 GA - 2024-09-30

10.4.0 GA - 2024-09-18

10.3.0 GA - 2024-08-02

10.2.0 GA - 2024-07-09

10.1.0 GA - 2024-06-17

10.0.1 GA - 2024-04-24

10.0.0 EA - 2024-03-27

9.3.0 GA - 2024-02-09

9.2.0 GA - 2023-11-27

9.1.0 GA - 2023-10-18

9.0.1 GA - 2023-09-07

9.0.0 EA - 2023-08-06

8.6.1 GA - 2023-05-02

8.6.0 EA - 2023-03-10

8.5.3 GA - 2023-01-30

8.5.2 GA - 2022-12-12

22.12 - 2022-12-06

Added

Changed

Removed

8.5.1 GA - 2022-11-01

22.08 - 2022-08-16

Changed

Fixes

Added

Removed

22.07 - 2022-07-21

Added

Changed

Removed

8.4.1 GA - 2022-06-14

22.06 - 2022-06-08

Added

Changed

Removed

22.05 - 2022-05-13

Added

Changed

Removed

22.04 - 2022-04-13

Added

Changed

Removed

22.03 - 2022-03-23

Added

Changed

22.02 - 2022-02-04

Added

Changed

Removed

8.2.1 GA - 2021-11-24

8.2.0 EA - 2021-10-05

Added

Updated

Removed

21.10 - 2021-10-05

Added

Changed

Removed

21.09 - 2021-09-22

Added

Changed

Removed

21.08 - 2021-08-05

Added

Changed

Removed

21.07 - 2021-07-21

8.0.1 - 2021-07-02

Added

Changed

Removed