TensorRT-LLM in-process benchmarking support #35

matthewkotila · 2024-08-09T23:15:52Z

All commits in this PR have already been reviewed. This is the epic branch PR, which merges that branch into main.

Draft until final testing has completed.

* Add tensorrtllm_engine option to service-kind and update testing (#700) (#762)
  * Add tensorrtllm_engine option to service-kind and update testing
  * Add output format check for tensorrtllm_engine

  Co-authored-by: Elias Bermudez <[email protected]>
* Support input payload generation for tensorrtllm engine (#767)
* Add functionality for async requests and output retrieval with Triton C API (#25)
* Support 1-d array data in profile exporter (#28)
  * support array of data in profile exporter
  * add some tests
  * run formatting
  * fix pre-commit
  * remove duplicate argparser arguments
  * Fix Triton C API mode missing infer requested output datatype bug

  ---------

  Co-authored-by: Matthew Kotila <[email protected]>
* Support profile data parsing for tensorrtllm engine service kind (#33)
  * support parsing tensorrtllm engine profile response
  * add test
  * refactor the test
  * update types and names
  * fix pre-commit
  * run PA with triton c api
  * more clean up on the tests
  * fix codeql
  * address feedback
* Add functionality to continue benchmarking in Triton C API mode if server logging support is disabled (#34)

---------

Co-authored-by: Hyunjae Woo <[email protected]>
Co-authored-by: Elias Bermudez <[email protected]>

nv-hwoo and others added 6 commits August 9, 2024 10:42

Add tensorrtllm_engine option to service-kind and update testing (#700) (#762)

c679860

* Add tensorrtllm_engine option to service-kind and update testing
* Add output format check for tensorrtllm_engine

Co-authored-by: Elias Bermudez <[email protected]>

Support input payload generation for tensorrtllm engine (#767)

d601fb4

Add functionality for async requests and output retrieval with Triton C API (#25)

5fe3bb9

Support profile data parsing for tensorrtllm engine service kind (#33)

e6b9cc1

* support parsing tensorrtllm engine profile response
* add test
* refactor the test
* update types and names
* fix pre-commit
* run PA with triton c api
* more clean up on the tests
* fix codeql
* address feedback

Add functionality to continue benchmarking in Triton C API mode if server logging support is disabled (#34)

f13cb0c

matthewkotila requested a review from nv-hwoo August 9, 2024 23:15

matthewkotila temporarily deployed to GITLAB August 9, 2024 23:15 — with GitHub Actions Inactive

matthewkotila temporarily deployed to GITLAB August 9, 2024 23:16 — with GitHub Actions Inactive

matthewkotila marked this pull request as ready for review August 9, 2024 23:29

nv-hwoo approved these changes Aug 9, 2024

matthewkotila merged commit e1455e0 into main Aug 9, 2024
7 checks passed

matthewkotila deleted the tensorrtllm-engine branch August 9, 2024 23:42

rmccorm4 added a commit to triton-inference-server/server that referenced this pull request Jan 16, 2025

Remove negative test that asserts triton_c_api doesn't support async mode, because async support for triton_c_api was added in triton-inference-server/perf_analyzer#35

e3d785b

rmccorm4 mentioned this pull request Jan 16, 2025

test: Stabilize L0_perf_analyzer_capi test consistency triton-inference-server/server#7946

Merged
