From 2d2a7d22bd87cdd9d76e9e34775f6d0b21bc0470 Mon Sep 17 00:00:00 2001
From: biswaroop1547
Date: Sun, 3 Sep 2023 16:39:36 +0530
Subject: [PATCH] add: tensorrt limitations

---
 model-formats.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/model-formats.md b/model-formats.md
index bf2bb72..edfd436 100644
--- a/model-formats.md
+++ b/model-formats.md
@@ -295,6 +295,14 @@ Nvidia also kept few [tooling](https://docs.nvidia.com/deeplearning/tensorrt/#to
 
 ### Limitations
 
+Currently, every model checkpoint one creates needs to be exported to ONNX first and then compiled to a TensorRT engine, so to use [LoRA](https://github.com/microsoft/LoRA) the adapter weights have to be merged into the model at compile time. More issues are discussed in [this Reddit post](https://www.reddit.com/r/StableDiffusion/comments/141qvw4/tensorrt_may_be_2x_faster_but_it_has_a_lot_of/).
+
+
+INT4 and INT16 quantization are currently not supported by TensorRT.
+
+
+Many [ONNX operators](https://github.com/onnx/onnx/blob/main/docs/Operators.md) are [not yet supported](https://github.com/onnx/onnx-tensorrt/blob/main/docs/operators.md) by TensorRT, and some of the supported ones come with restrictions.
+
 ### License
 
 It's freely available under [Apache License 2.0](https://github.com/NVIDIA/TensorRT/blob/main/LICENSE).
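
To make the first limitation concrete, below is a minimal sketch of the per-checkpoint compile step using the TensorRT Python API; `model.onnx` and `model.plan` are placeholder file names, and exact API details can vary across TensorRT versions. Any LoRA weights must already be merged into the ONNX graph before this step, since the built engine freezes the weights.

```python
# Minimal sketch: compile an ONNX export into a serialized TensorRT engine.
# "model.onnx" and "model.plan" are placeholder paths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# Unsupported or restricted ONNX operators surface as parser errors here.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # reduced precision; INT8 additionally needs calibration

# The serialized engine bakes the (already merged) weights in, so a new
# checkpoint or a different LoRA means repeating this export-and-build cycle.
engine = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(engine)
```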