
Commit

Merge pull request #4342 from yuanyao-nv/dev-10.8-staging
TensorRT 10.8-GA OSS Release
yuanyao-nv authored Feb 1, 2025
2 parents 97ff244 + 9443fc4 commit 64e56ab
Showing 266 changed files with 1,147,683 additions and 1,381 deletions.
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,29 @@
# TensorRT OSS Release Changelog

## 10.8.0 GA - 2025-1-31
Key Features and Updates:

- Demo changes
  - demoDiffusion
    - Added [Image-to-Image](demo/Diffusion#generate-an-image-guided-by-an-initial-image-and-a-text-prompt-using-flux) support for the FLUX.1-dev and FLUX.1-schnell pipelines.
    - Added [ControlNet](demo/Diffusion#generate-an-image-guided-by-a-text-prompt-and-a-control-image-using-flux-controlnet) support for the [FLUX.1-Canny-dev](https://huggingface.co/black-forest-labs/FLUX.1-Canny-dev) and [FLUX.1-Depth-dev](https://huggingface.co/black-forest-labs/FLUX.1-Depth-dev) pipelines. Native FP8 quantization is also supported for these pipelines.
    - Added support for an ONNX-export-only mode. See [--onnx-export-only](demo/Diffusion#use-separate-directories-for-individual-onnx-models).
    - Added FP16, BF16, FP8, and FP4 support for all FLUX pipelines.
- Plugin changes
  - Added SM 100 and SM 120 support to `bertQKVToContextPlugin`. This enables demo/BERT on Blackwell GPUs.
- Sample changes
  - Added a new `sampleEditableTimingCache` to demonstrate how to build an engine with the desired tactics by modifying the timing cache (a minimal timing-cache sketch appears after this list).
  - Deleted the `sampleAlgorithmSelector` sample.
  - Fixed `sampleOnnxMNIST` by setting the correct INT8 dynamic range.
- Parser changes
  - Added support for `FLOAT4E2M1` types for quantized networks.
  - Added support for dynamic axes and improved performance of `CumSum` operations.
  - Fixed the import of local functions when their input tensor names aliased one from an outer scope.
  - Added support for `Pow` ops with integer-typed exponent values.
- Fixed issues
  - Fixed segmentation of boolean constant nodes - [4224](https://github.com/NVIDIA/TensorRT/issues/4224).
  - Fixed an accuracy issue when multiple optimization profiles were defined - [4250](https://github.com/NVIDIA/TensorRT/issues/4250) (see the profile sketch after this list).
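
For context on what `sampleEditableTimingCache` builds on, below is a minimal C++ sketch of the pre-existing `ITimingCache` builder workflow: load a serialized cache, attach it to the builder config, build, and re-serialize it. The entry-editing calls that the new sample actually demonstrates are not shown here, and the function name `buildWithTimingCache` plus the `timing.cache` filename are illustrative.

```cpp
#include <fstream>
#include <memory>
#include <vector>

#include "NvInfer.h"

// Build a serialized engine while reusing (and updating) a timing cache so that
// repeated builds skip tactic timing and select consistent tactics.
nvinfer1::IHostMemory* buildWithTimingCache(nvinfer1::IBuilder& builder,
                                            nvinfer1::INetworkDefinition& network,
                                            std::vector<char> const& cacheBlob)
{
    std::unique_ptr<nvinfer1::IBuilderConfig> config{builder.createBuilderConfig()};

    // An empty blob creates a fresh cache; otherwise the serialized one is loaded.
    std::unique_ptr<nvinfer1::ITimingCache> cache{
        config->createTimingCache(cacheBlob.data(), cacheBlob.size())};
    config->setTimingCache(*cache, /*ignoreMismatch=*/false);

    nvinfer1::IHostMemory* engine = builder.buildSerializedNetwork(network, *config);

    // Persist the (possibly updated) cache for the next build.
    std::unique_ptr<nvinfer1::IHostMemory> blob{cache->serialize()};
    std::ofstream out{"timing.cache", std::ios::binary};
    out.write(static_cast<char const*>(blob->data()), blob->size());
    return engine;
}
```

Likewise, for the multiple-optimization-profile fix in [4250](https://github.com/NVIDIA/TensorRT/issues/4250), this sketch shows what "multiple profiles" means at the API level; the input name `"input"` and the shape ranges are hypothetical.

```cpp
#include "NvInfer.h"

// Attach two batch-size profiles for a dynamic NCHW input to a builder config.
void addTwoProfiles(nvinfer1::IBuilder& builder, nvinfer1::IBuilderConfig& config)
{
    using nvinfer1::OptProfileSelector;
    auto addProfile = [&](int32_t minN, int32_t optN, int32_t maxN) {
        nvinfer1::IOptimizationProfile* profile = builder.createOptimizationProfile();
        profile->setDimensions("input", OptProfileSelector::kMIN, nvinfer1::Dims4{minN, 3, 224, 224});
        profile->setDimensions("input", OptProfileSelector::kOPT, nvinfer1::Dims4{optN, 3, 224, 224});
        profile->setDimensions("input", OptProfileSelector::kMAX, nvinfer1::Dims4{maxN, 3, 224, 224});
        config.addOptimizationProfile(profile);
    };
    addProfile(1, 4, 8);    // profile 0: small batches
    addProfile(16, 32, 64); // profile 1: large batches
    // Each execution context then selects one profile at runtime, e.g.
    // context->setOptimizationProfileAsync(1, stream);
}
```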

## 10.7.0 GA - 2024-12-4
Key Features and Updates:

56 changes: 28 additions & 28 deletions README.md
@@ -26,13 +26,13 @@ You can skip the **Build** section to enjoy TensorRT with Python.
To build the TensorRT-OSS components, you will first need the following software packages.

**TensorRT GA build**
-* TensorRT v10.7.0.23
+* TensorRT v10.8.0.43
* Available from direct download links listed below

**System Packages**
* [CUDA](https://developer.nvidia.com/cuda-toolkit)
* Recommended versions:
-* cuda-12.6.0 + cuDNN-8.9
+* cuda-12.8.0 + cuDNN-8.9
* cuda-11.8.0 + cuDNN-8.9
* [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
* [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
@@ -73,25 +73,25 @@ To build the TensorRT-OSS components, you will first need the following software
If using the TensorRT OSS build container, TensorRT libraries are preinstalled under `/usr/lib/x86_64-linux-gnu` and you may skip this step.

Otherwise, download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com) using the direct links below:
-- [TensorRT 10.7.0.23 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/tars/TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-11.8.tar.gz)
-- [TensorRT 10.7.0.23 for CUDA 12.6, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/tars/TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-12.6.tar.gz)
-- [TensorRT 10.7.0.23 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-11.8.zip)
-- [TensorRT 10.7.0.23 for CUDA 12.6, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.7.0/zip/TensorRT-10.7.0.23.Windows.win10.cuda-12.6.zip)
+- [TensorRT 10.8.0.43 for CUDA 11.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.8.0/tars/TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-11.8.tar.gz)
+- [TensorRT 10.8.0.43 for CUDA 12.8, Linux x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.8.0/tars/TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8.tar.gz)
+- [TensorRT 10.8.0.43 for CUDA 11.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.8.0/zip/TensorRT-10.8.0.43.Windows.win10.cuda-11.8.zip)
+- [TensorRT 10.8.0.43 for CUDA 12.8, Windows x86_64](https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/10.8.0/zip/TensorRT-10.8.0.43.Windows.win10.cuda-12.8.zip)


-**Example: Ubuntu 20.04 on x86-64 with cuda-12.6**
+**Example: Ubuntu 20.04 on x86-64 with cuda-12.8**

```bash
cd ~/Downloads
-tar -xvzf TensorRT-10.7.0.23.Linux.x86_64-gnu.cuda-12.6.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-10.7.0.23
+tar -xvzf TensorRT-10.8.0.43.Linux.x86_64-gnu.cuda-12.8.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-10.8.0.43
```

-**Example: Windows on x86-64 with cuda-12.6**
+**Example: Windows on x86-64 with cuda-12.8**

```powershell
-Expand-Archive -Path TensorRT-10.7.0.23.Windows.win10.cuda-12.6.zip
-$env:TRT_LIBPATH="$pwd\TensorRT-10.7.0.23\lib"
+Expand-Archive -Path TensorRT-10.8.0.43.Windows.win10.cuda-12.8.zip
+$env:TRT_LIBPATH="$pwd\TensorRT-10.8.0.43\lib"
```

## Setting Up The Build Environment
@@ -101,27 +101,27 @@ For Linux platforms, we recommend that you generate a docker container for build
1. #### Generate the TensorRT-OSS build container.
The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build scripts. The build containers are configured for building TensorRT OSS out-of-the-box.

-	**Example: Ubuntu 20.04 on x86-64 with cuda-12.6 (default)**
+	**Example: Ubuntu 20.04 on x86-64 with cuda-12.8 (default)**
```bash
-	./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.6
+	./docker/build.sh --file docker/ubuntu-20.04.Dockerfile --tag tensorrt-ubuntu20.04-cuda12.8
```
-	**Example: Rockylinux8 on x86-64 with cuda-12.6**
+	**Example: Rockylinux8 on x86-64 with cuda-12.8**
```bash
-	./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.6
+	./docker/build.sh --file docker/rockylinux8.Dockerfile --tag tensorrt-rockylinux8-cuda12.8
```
-	**Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.6 (JetPack SDK)**
+	**Example: Ubuntu 22.04 cross-compile for Jetson (aarch64) with cuda-12.8 (JetPack SDK)**
```bash
-	./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.6
+	./docker/build.sh --file docker/ubuntu-cross-aarch64.Dockerfile --tag tensorrt-jetpack-cuda12.8
```
-	**Example: Ubuntu 22.04 on aarch64 with cuda-12.6**
+	**Example: Ubuntu 22.04 on aarch64 with cuda-12.8**
```bash
-	./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.6
+	./docker/build.sh --file docker/ubuntu-22.04-aarch64.Dockerfile --tag tensorrt-aarch64-ubuntu22.04-cuda12.8
```

2. #### Launch the TensorRT-OSS build container.
**Example: Ubuntu 20.04 build container**
```bash
-	./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.6 --gpus all
+	./docker/launch.sh --tag tensorrt-ubuntu20.04-cuda12.8 --gpus all
```
> NOTE:
<br> 1. Use the `--tag` corresponding to the build container generated in Step 1.
@@ -132,38 +132,38 @@ For Linux platforms, we recommend that you generate a docker container for build
## Building TensorRT-OSS
* Generate Makefiles and build.

-	**Example: Linux (x86-64) build with default cuda-12.6**
+	**Example: Linux (x86-64) build with default cuda-12.8**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out
make -j$(nproc)
```
-	**Example: Linux (aarch64) build with default cuda-12.6**
+	**Example: Linux (aarch64) build with default cuda-12.8**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64-native.toolchain
make -j$(nproc)
```
-	**Example: Native build on Jetson (aarch64) with cuda-12.6**
+	**Example: Native build on Jetson (aarch64) with cuda-12.8**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
-	cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.6
+	cmake .. -DTRT_LIB_DIR=$TRT_LIBPATH -DTRT_OUT_DIR=`pwd`/out -DTRT_PLATFORM_ID=aarch64 -DCUDA_VERSION=12.8
CC=/usr/bin/gcc make -j$(nproc)
```
> NOTE: C compiler must be explicitly specified via CC= for native aarch64 builds of protobuf.

-	**Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.6 (JetPack)**
+	**Example: Ubuntu 22.04 Cross-Compile for Jetson (aarch64) with cuda-12.8 (JetPack)**
```bash
cd $TRT_OSSPATH
mkdir -p build && cd build
-	cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.6 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.6/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
+	cmake .. -DCMAKE_TOOLCHAIN_FILE=$TRT_OSSPATH/cmake/toolchains/cmake_aarch64.toolchain -DCUDA_VERSION=12.8 -DCUDNN_LIB=/pdk_files/cudnn/usr/lib/aarch64-linux-gnu/libcudnn.so -DCUBLAS_LIB=/usr/local/cuda-12.8/targets/aarch64-linux/lib/stubs/libcublas.so -DCUBLASLT_LIB=/usr/local/cuda-12.8/targets/aarch64-linux/lib/stubs/libcublasLt.so -DTRT_LIB_DIR=/pdk_files/tensorrt/lib
make -j$(nproc)
```

-	**Example: Native builds on Windows (x86) with cuda-12.6**
+	**Example: Native builds on Windows (x86) with cuda-12.8**
```powershell
cd $TRT_OSSPATH
mkdir -p build
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-10.7.0.23
+10.8.0.43
12 changes: 9 additions & 3 deletions cmake/modules/find_library_create_target.cmake
@@ -18,10 +18,16 @@
macro(find_library_create_target target_name lib libtype hints)
  message(STATUS "========================= Importing and creating target ${target_name} ==========================")
  message(STATUS "Looking for library ${lib}")
-  if (CMAKE_BUILD_TYPE STREQUAL "Debug")
-    find_library(${lib}_LIB_PATH ${lib}${TRT_DEBUG_POSTFIX} HINTS ${hints} NO_DEFAULT_PATH)
+  if(CMAKE_BUILD_TYPE STREQUAL "Debug")
+    find_library(
+      ${lib}_LIB_PATH ${lib}${TRT_DEBUG_POSTFIX}
+      HINTS ${hints}
+      NO_DEFAULT_PATH)
  endif()
-  find_library(${lib}_LIB_PATH ${lib} HINTS ${hints} NO_DEFAULT_PATH)
+  find_library(
+    ${lib}_LIB_PATH ${lib}
+    HINTS ${hints}
+    NO_DEFAULT_PATH)
  find_library(${lib}_LIB_PATH ${lib})
  message(STATUS "Library that was found ${${lib}_LIB_PATH}")
  add_library(${target_name} ${libtype} IMPORTED)
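Because `find_library` caches its result in `${lib}_LIB_PATH` and skips the search once that variable is set, these calls form an ordered fallback: the debug-postfixed name first (Debug builds only), then the hinted path with `NO_DEFAULT_PATH`, and finally the default system search paths.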
10 changes: 6 additions & 4 deletions cmake/modules/set_ifndef.cmake
@@ -14,8 +14,10 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
-function (set_ifndef variable value)
-  if(NOT DEFINED ${variable})
-    set(${variable} ${value} PARENT_SCOPE)
-  endif()
+function(set_ifndef variable value)
+  if(NOT DEFINED ${variable})
+    set(${variable}
+        ${value}
+        PARENT_SCOPE)
+  endif()
endfunction()
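The reformatting is behavior-preserving: `set_ifndef` still assigns `${value}` only when `${variable}` is not already defined, so caller-provided values (for example `-D` options on the cmake command line) keep precedence over these defaults.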
