
Cuda Runtime (out of memory) failure of TensorRT 10.3.0 when running trtexec on GPU RTX4060/jetson/etc #4258

Open
zargooshifar opened this issue Nov 22, 2024 · 1 comment

zargooshifar commented Nov 22, 2024

Description

I'm trying to convert a YOLOv8-seg model to a TensorRT engine, using DeepStream-Yolo-Seg to export the model to ONNX.
After running trtexec on the exported ONNX file, I get these errors:

[11/22/2024-13:40:08] [I] Finished parsing network model. Parse time: 0.0704936
[11/22/2024-13:40:08] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/22/2024-13:40:47] [E] Error[1]: [defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)
[11/22/2024-13:40:47] [W] [TRT] Requested amount of GPU memory (15485030400 bytes) could not be allocated. There may not be enough free memory for allocation to succeed.
[11/22/2024-13:40:47] [E] Error[9]: Error Code: 9: Skipping tactic 0x0000000000000000 due to exception [tunable_graph.cpp:create:117] autotuning: User allocator error allocating 15485030400-byte buffer
[11/22/2024-13:40:47] [E] Error[10]: IBuilder::buildSerializedNetwork: Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[/1/Constant_36_output_0.../1/Slice_4]}.)
[11/22/2024-13:40:47] [E] Engine could not be created from network
[11/22/2024-13:40:47] [E] Building engine failed
[11/22/2024-13:40:47] [E] Failed to create engine from model or file.
[11/22/2024-13:40:47] [E] Engine set up failed

With TensorRT 10.0.0.6-1+cuda11.8 the engine can be created, but with anything newer it fails.

Environment

TensorRT Version: 10.3.0
NVIDIA GPU: RTX4060
NVIDIA Driver Version: 565.57.01
CUDA Version: 12.6
CUDNN Version: 9.5.1.17-1

Operating System: Ubuntu 22.04.5 LTS
Python Version (if applicable): 3.10.12-1~22.04.7
PyTorch Version (if applicable): 2.5.1

Steps To Reproduce

 python export_yoloV8_seg.py --weights yolov8s-seg.pt
/usr/src/tensorrt/bin/trtexec --onnx=yolov8s-seg.onnx
lix19937 commented

This may be a bug. You can try modifying maxWorkspaceSize or memoryPoolLimit, but neither a larger nor a smaller value may solve the problem.
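For reference, a minimal sketch of how the builder's workspace pool can be capped from the trtexec command line in TensorRT 10 (the `--memPoolSize` flag replaces the deprecated `--workspace` option; the 2048 MiB value below is an arbitrary example, not a recommended setting):

```shell
# Cap the builder's workspace memory pool at 2048 MiB.
# The value is an example only -- adjust it to the free GPU memory on your device.
/usr/src/tensorrt/bin/trtexec \
    --onnx=yolov8s-seg.onnx \
    --memPoolSize=workspace:2048
```

Note that if the failing tactic genuinely requires ~15 GB (as the log above reports), lowering the pool limit will only make TensorRT skip that tactic; whether a fallback implementation exists for the `ForeignNode` subgraph depends on the model and TensorRT version.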
