retinanet run harness fails 'executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3:' #1866

Open
stbailey001 opened this issue Oct 2, 2024 · 4 comments


@stbailey001

I am trying to run RetinaNet in the Offline scenario in a container with one NVIDIA GPU:
cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev --model=retinanet --implementation=nvidia --framework=tensorrt --category=datacenter --scenario=Offline --execution_mode=test --device=cuda --gpu_name=l4 --docker_cache=no --quiet --test_query_count=500

The harness run fails with:
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)

Full error:
CMD: make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline --test_mode=PerformanceOnly --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'

INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/benchmark-program/run-ubuntu.sh from tmp-run.sh

make run_harness RUN_ARGS=' --benchmarks=retinanet --scenarios=offline --test_mode=PerformanceOnly --offline_expected_qps=1 --user_conf_path=/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf --mlperf_conf_path=/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf --gpu_batch_size=2 --no_audit_verify ' 2>&1 ; echo $? > exitstatus | tee '/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1/console.out'
[2024-10-02 15:10:05,960 main.py:229 INFO] Detected system ID: KnownSystem.e1ef67ab5fc2
[2024-10-02 15:10:06,139 harness.py:249 INFO] The harness will load 2 plugins: ['build/plugins/NMSOptPlugin/libnmsoptplugin.so', 'build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so']
[2024-10-02 15:10:06,139 generate_conf_files.py:107 INFO] Generated measurements/ entries for e1ef67ab5fc2_TRT/retinanet/Offline
[2024-10-02 15:10:06,140 __init__.py:46 INFO] Running command: ./build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf" --gpu_engines="./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms
[2024-10-02 15:10:06,140 __init__.py:53 INFO] Overriding Environment
benchmark : Benchmark.Retinanet
buffer_manager_thread_count : 0
data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/data
gpu_batch_size : 2
input_dtype : int8
input_format : linear
log_dir : /root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/build/logs/2024.10.02-15.10.04
map_path : data_maps/open-images-v6-mlperf/val_map.txt
mlperf_conf_path : /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
offline_expected_qps : 1.0
precision : int8
preprocessed_data_dir : /root/CM/repos/local/cache/b92e7a28ac454f52/preprocessed_data
scenario : Scenario.Offline
system : SystemConfiguration(host_cpu_conf=CPUConfiguration(layout={CPU(name='AMD EPYC 9J14 96-Core Processor', architecture=<CPUArchitecture.x86_64: AliasedName(name='x86_64', aliases=(), patterns=())>, core_count=1, threads_per_core=1): 64}), host_mem_conf=MemoryConfiguration(host_memory_capacity=Memory(quantity=292.215448, byte_suffix=<ByteSuffix.GB: (1000, 3)>, _num_bytes=292215448000), comparison_tolerance=0.05), accelerator_conf=AcceleratorConfiguration(layout=defaultdict(<class 'int'>, {GPU(name='NVIDIA L4', accelerator_type=<AcceleratorType.Discrete: AliasedName(name='Discrete', aliases=(), patterns=())>, vram=Memory(quantity=22.494140625, byte_suffix=<ByteSuffix.GiB: (1024, 3)>, _num_bytes=24152899584), max_power_limit=72.0, pci_id='0x27B810DE', compute_sm=89): 1})), numa_conf=None, system_id='e1ef67ab5fc2')
tensor_path : build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear
test_mode : PerformanceOnly
use_graphs : False
user_conf_path : /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
system_id : e1ef67ab5fc2
config_name : e1ef67ab5fc2_retinanet_Offline
workload_setting : WorkloadSetting(HarnessType.LWIS, AccuracyTarget.k_99, PowerSetting.MaxP)
optimization_level : plugin-enabled
num_profiles : 1
config_ver : lwis_k_99_MaxP
accuracy_level : 99%
inference_server : lwis
skip_file_checks : False
power_limit : None
cpu_freq : None
&&&& RUNNING Default_Harness # ./build/bin/harness_default
[I] mlperf.conf path: /root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf
[I] user.conf path: /root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf
Creating QSL.
Finished Creating QSL.
Setting up SUT.
[I] [TRT] Loaded engine size: 73 MiB
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +10, now: CPU 126, GPU 473 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 128, GPU 483 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +68, now: CPU 0, GPU 68 (MiB)
[I] Device:0.GPU: [0] ./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan has been successfully loaded.
[E] [TRT] 3: [runtime.cpp::~Runtime::401] Error Code 3: API Usage Error (Parameter check failed at: runtime/rt/runtime.cpp::~Runtime::401, condition: mEngineCounter.use_count() == 1 Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 55, GPU 485 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 55, GPU 493 (MiB)
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +1528, now: CPU 0, GPU 1596 (MiB)
[I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 56, GPU 2029 (MiB)
[I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 56, GPU 2039 (MiB)
[I] [TRT] Could not set default profile 0 for execution context. Profile index must be set explicitly.
[I] [TRT] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +1, GPU +1528, now: CPU 1, GPU 3124 (MiB)
[E] [TRT] 3: [executionContext.cpp::setOptimizationProfileInternal::1328] Error Code 3: Internal Error (Profile 0 has been chosen by another IExecutionContext. Use another profileIndex or destroy the IExecutionContext that use this profile.)
F1002 15:10:07.041591 180493 lwis.cpp:245] Check failed: context->setOptimizationProfile(profileIdx) == true (0 vs. 1)
*** Check failure stack trace: ***
@ 0x7fe94f4401c3 google::LogMessage::Fail()
@ 0x7fe94f44525b google::LogMessage::SendToLog()
@ 0x7fe94f43febf google::LogMessage::Flush()
@ 0x7fe94f4406ef google::LogMessageFatal::~LogMessageFatal()
@ 0x55918ac33adc lwis::Device::Setup()
@ 0x55918ac35cab lwis::Server::Setup()
@ 0x55918ab91a00 doInference()
@ 0x55918ab8f2b0 main
@ 0x7fe93d00e083 __libc_start_main
@ 0x55918ab8f83e _start
Aborted (core dumped)
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 231, in <module>
main(main_args, DETECTED_SYSTEM)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 144, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/main.py", line 202, in dispatch_action
handler.run()
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 193, in handle_failure
raise RuntimeError("Run harness failed!")
RuntimeError: Run harness failed!
Traceback (most recent call last):
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/actionhandler/run_harness.py", line 161, in handle
result_data = self.harness.run_harness(flag_dict=self.harness_flag_dict, skip_generate_measurements=True)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/common/harness.py", line 352, in run_harness
output = run_command(self.construct_terminal_command(argstr), get_output=True, custom_env=self.env_vars)
File "/root/CM/repos/local/cache/9550c8ab90084238/repo/closed/NVIDIA/code/common/__init__.py", line 67, in run_command
raise subprocess.CalledProcessError(ret, cmd)
subprocess.CalledProcessError: Command './build/bin/harness_default --plugins="build/plugins/NMSOptPlugin/libnmsoptplugin.so,build/plugins/retinanetConcatPlugin/libretinanetconcatplugin.so" --logfile_outdir="/root/CM/repos/local/cache/aba6a14ff6834703/test_results/e1ef67ab5fc2-nvidia_original-gpu-tensorrt-vdefault-default_config/retinanet/offline/performance/run_1" --logfile_prefix="mlperf_log_" --performance_sample_count=64 --test_mode="PerformanceOnly" --gpu_batch_size=2 --map_path="data_maps/open-images-v6-mlperf/val_map.txt" --mlperf_conf_path="/root/CM/repos/local/cache/29bd0ac3d7ee432a/inference/mlperf.conf" --tensor_path="build/preprocessed_data/open-images-v6-mlperf/validation/Retinanet/int8_linear" --use_graphs=false --user_conf_path="/root/CM/repos/mlcommons@cm4mlops/script/generate-mlperf-inference-user-conf/tmp/3f6123665f4141bdbc1d204bede47ce2.conf" --gpu_engines="./build/engines/e1ef67ab5fc2/retinanet/Offline/retinanet-Offline-gpu-b2-int8.lwis_k_99_MaxP.plan" --max_dlas=0 --scenario Offline --model retinanet --response_postprocess openimageeffnms' returned non-zero exit status 134.
make: *** [Makefile:45: run_harness] Error 1
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/benchmark-program/customize.py
INFO:root: * cm run script "save mlperf inference state"
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/save-mlperf-inference-implementation-state/customize.py
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/app-mlperf-inference/customize.py
INFO:root:* cm run script "get mlperf sut description"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: * cm run script "detect cpu"
INFO:root: * cm run script "detect os"
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-os/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-os/customize.py
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/detect-cpu/run.sh from tmp-run.sh
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/detect-cpu/customize.py
INFO:root: * cm run script "get python3"
INFO:root: ! load /root/CM/repos/local/cache/7ead820172a540e6/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get compiler"
INFO:root: ! load /root/CM/repos/local/cache/30d4c7085bc24d5c/cm-cached-state.json
INFO:root: * cm run script "get cuda-devices _with-pycuda"
INFO:root: * cm run script "get cuda _toolkit"
INFO:root: ! load /root/CM/repos/local/cache/137abe42c97c44f6/cm-cached-state.json
INFO:root:ENV[CM_CUDA_PATH_LIB_CUDNN_EXISTS]: no
INFO:root:ENV[CM_CUDA_VERSION]: 12.2
INFO:root:ENV[CM_CUDA_VERSION_STRING]: cu122
INFO:root:ENV[CM_NVCC_BIN_WITH_PATH]: /usr/local/cuda/bin/nvcc
INFO:root:ENV[CUDA_HOME]: /usr/local/cuda
INFO:root: * cm run script "get python3"
INFO:root: ! load /root/CM/repos/local/cache/7ead820172a540e6/cm-cached-state.json
INFO:root:Path to Python: /usr/bin/python3
INFO:root:Python version: 3.8.10
INFO:root: * cm run script "get generic-python-lib _package.pycuda"
INFO:root: ! load /root/CM/repos/local/cache/457a72dc0cd941fc/cm-cached-state.json
INFO:root: ! cd /root/CM/repos/local/cache/722613bcce9a4b2f
INFO:root: ! call /root/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/detect.sh from tmp-run.sh
GPU 0:
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/get-cuda-devices/customize.py
INFO:root: * cm run script "get generic-python-lib _package.dmiparser"
INFO:root: ! load /root/CM/repos/local/cache/525f77d4ad5a4f72/cm-cached-state.json
INFO:root: * cm run script "get cache dir _name.mlperf-inference-sut-descriptions"
INFO:root: ! load /root/CM/repos/local/cache/3d93e38d01d7494d/cm-cached-state.json
Generating SUT description file for e1ef67ab5fc2-tensorrt
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/get-mlperf-inference-sut-description/customize.py
INFO:root: ! call "postprocess" from /root/CM/repos/mlcommons@cm4mlops/script/run-mlperf-inference-app/customize.py
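
With a console log this long, it can help to pull out only the TensorRT error lines before reading anything else. The following is a minimal sketch, assuming the `[E] [TRT] <code>: [<location>] <message>` layout seen in the log above; the function name `trt_errors` is made up for illustration and is not part of the harness or of CM.

```python
import re

# Pattern inferred from the error lines in this log; other TensorRT
# versions may format their messages differently.
TRT_ERROR = re.compile(
    r"^\[E\] \[TRT\] (?P<code>\d+): \[(?P<where>[^\]]+)\] (?P<message>.*)$"
)


def trt_errors(lines):
    """Yield (code, location, message) for each TensorRT error line."""
    for line in lines:
        match = TRT_ERROR.match(line.strip())
        if match:
            yield match.group("code"), match.group("where"), match.group("message")
```

It could be used as `list(trt_errors(open("console.out")))`, where `console.out` is the file that the `tee` command in the log writes to.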

@arjunsuresh
Contributor

Hi @stbailey001, does a retry with --docker_cache=no help here? This is an L4 GPU, right?

@stbailey001
Author

stbailey001 commented Oct 4, 2024

I reran with --docker_cache=no and it still fails with the core dump. Yes, this is an L4.

@Oseltamivir
Contributor

@stbailey001 Hi, I am not sure whether you are still facing this issue.

TensorRT uses optimization profiles to handle dynamic shapes. Each profile defines the minimum, optimal, and maximum dimensions for input tensors.
When multiple optimization profiles are created with the same shape, TRT internally recognizes them as duplicates because they don't provide any additional shape flexibility.

Execution Contexts:
Each IExecutionContext is tied to a specific optimization profile.
TRT enforces that each optimization profile can only be associated with one IExecutionContext at a time.
If multiple profiles are added with identical shapes and associated with different execution contexts, TRT identifies them as the same profile (since their shapes are identical). This leads to a conflict where multiple contexts are trying to use the same profile.
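
To make the one-profile-per-context rule concrete, here is a toy model in plain Python. It is only a sketch: `ToyEngine`, `ToyContext`, and `ProfileInUseError` are hypothetical stand-ins and not the TensorRT API, and the "take profile 0 if it is free" default is an assumption based on the `Could not set default profile 0` message in the log.

```python
class ProfileInUseError(RuntimeError):
    """Raised when a profile is already held by a different context."""


class ToyEngine:
    """Tracks which optimization profile index is held by which context."""

    def __init__(self, num_profiles):
        self.num_profiles = num_profiles
        self._owner = {}  # profile index -> context currently holding it

    def create_context(self):
        ctx = ToyContext(self)
        # Assumed default: a new context takes profile 0 if it is free,
        # otherwise it starts without any profile selected.
        if 0 not in self._owner:
            ctx.set_profile(0)
        return ctx


class ToyContext:
    def __init__(self, engine):
        self.engine = engine
        self.profile = None

    def set_profile(self, index):
        if not 0 <= index < self.engine.num_profiles:
            raise IndexError(f"profile {index} does not exist")
        owner = self.engine._owner.get(index)
        if owner is not None and owner is not self:
            raise ProfileInUseError(f"profile {index} is held by another context")
        # Release any profile this context already holds before taking the new one.
        if self.profile is not None:
            del self.engine._owner[self.profile]
        self.engine._owner[index] = self
        self.profile = index

    def destroy(self):
        # Destroying a context frees its profile for other contexts.
        if self.profile is not None:
            del self.engine._owner[self.profile]
            self.profile = None
```

With `ToyEngine(num_profiles=1)`, matching the `num_profiles : 1` line in the log, the first `create_context()` takes profile 0, the second starts with no profile, and calling `set_profile(0)` on the second raises `ProfileInUseError` until the first context is destroyed. That is the same sequence of messages the harness log shows.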

There's a report here from TRT referencing this error.

NVIDIA's implementation of the class EngineRunner creates this profile. It is likely some edge case in their code, but I think it might also be caused by zombie or orphaned processes or a misbuilt .plan file.

If you are still facing this error, try deleting the cache and rebuilding.
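
If the stale artifact is the engine plan, one cautious way to clear it is to list the `.plan` files first and delete them only after checking the list. This is a sketch under assumptions: the `build/engines/<system_id>/<benchmark>/<scenario>/` layout is taken from the `--gpu_engines` path in the log, and the helper names `find_engine_plans` and `remove_plans` are made up here, not part of CM or the NVIDIA harness.

```python
from pathlib import Path


def find_engine_plans(engines_root, benchmark=None):
    """Return sorted .plan files under engines_root, optionally for one benchmark."""
    root = Path(engines_root)
    plans = sorted(p for p in root.rglob("*.plan") if p.is_file())
    if benchmark is not None:
        # Keep only plans that sit under a directory named after the benchmark.
        plans = [p for p in plans if benchmark in p.relative_to(root).parts]
    return plans


def remove_plans(plans, dry_run=True):
    """Delete the given plan files; with dry_run=True nothing is deleted."""
    targets = []
    for plan in plans:
        if not dry_run:
            plan.unlink()
        targets.append(plan)
    return targets
```

For example, `remove_plans(find_engine_plans("build/engines", benchmark="retinanet"))` only reports the files it would delete; passing `dry_run=False` removes them so that the next run has to rebuild the engine.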

Hope this helps

@nv-ananjappa
Contributor

@stbailey001 L4 is not one of the officially supported GPUs in the recent v4.0/v4.1 submissions from NVIDIA. I don't see why RetinaNet couldn't run on L4, so you might be able to debug and fix this bug if you're interested.
