Support profile data parsing for tensorrtllm engine service kind #33
nv-hwoo commented on Aug 8, 2024
- Support parsing profile export JSON from the tensorrtllm engine through the C API
- Add tests
- Small refactor of the tests
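A minimal sketch of what the first bullet could look like, assuming the engine returns token IDs directly so the parser can count them without running a tokenizer. The JSON layout, the `output_ids` and `text_output` keys, and the `count_output_tokens` helper are all hypothetical and chosen for illustration; they are not taken from this PR's code.

```python
import json
from typing import Callable, List, Optional

# Hypothetical profile export. Every key below is an assumption made for
# illustration, not the confirmed export schema.
SAMPLE_EXPORT = json.dumps({
    "experiments": [
        {
            "requests": [
                {
                    "response_outputs": [
                        {"output_ids": [101, 102]},
                        {"output_ids": [103]},
                    ],
                },
            ],
        }
    ],
})


def count_output_tokens(
    export_text: str,
    tokenize: Optional[Callable[[str], List[str]]] = None,
) -> List[int]:
    """Return the number of output tokens for each request in the export.

    If a response already carries token IDs (assumed key: "output_ids"),
    count them directly. Otherwise fall back to tokenizing the text
    (assumed key: "text_output") with the supplied callable.
    """
    data = json.loads(export_text)
    counts = []
    for experiment in data["experiments"]:
        for request in experiment["requests"]:
            total = 0
            for response in request["response_outputs"]:
                if "output_ids" in response:
                    # Engine-style response: token IDs are already present.
                    total += len(response["output_ids"])
                elif tokenize is not None:
                    # Text-style response: tokenization is required.
                    total += len(tokenize(response.get("text_output", "")))
                else:
                    raise ValueError("text response but no tokenizer given")
            counts.append(total)
    return counts


print(count_output_tokens(SAMPLE_EXPORT))
```

Branching on the response shape rather than on a service-kind flag is only one possible design; it keeps the token-ID path independent of how the mode is named.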
Great testing architecture!
Left a question. Since we may be removing service-kind, it may matter less, but it feels like the C API service-kind is being used exclusively with the direct TRT-LLM engine in mind (e.g. skipping the tokenization, getting engine token counts in a TRT-LLM-specific way). That feels like it could cause issues in the future. Should we rename the service-kind to make it clear that it's not just the C API but specifically for TRT-LLM? Customers may think this is for the general Triton C API, like what PA supports.
@dyastremsky Note that the service kind used in data parser is different from
Ah, I see. Thanks! That's a bit confusing. It might be worth spending a little bit of time to see if there's a workaround or way to improve the code (maybe Matt or Elias have ideas). If not, we can keep this as is, but there's a code smell here IMO.
Spectacular work, Hyunjae! 🚀
* support parsing tensorrtllm engine profile response
* add test
* refactor the test
* update types and names
* fix pre-commit
* run PA with triton c api
* more clean up on the tests
* fix codeql
* address feedback
* Add tensorrtllm_engine option to service-kind and update testing (#700) (#762)
* Add tensorrtllm_engine option to service-kind and update testing
* Add output format check for tensorrtllm_engine

Co-authored-by: Elias Bermudez

* Support input payload generation for tensorrtllm engine (#767)
* Add functionality for async requests and output retrieval with Triton C API (#25)
* Support 1-d array data in profile exporter (#28)
* support array of data in profile exporter
* add some tests
* run formatting
* fix pre-commit
* remove duplicate argparser arguments
* Fix Triton C API mode missing infer requested output datatype bug

---------

Co-authored-by: Matthew Kotila

* Support profile data parsing for tensorrtllm engine service kind (#33)
* support parsing tensorrtllm engine profile response
* add test
* refactor the test
* update types and names
* fix pre-commit
* run PA with triton c api
* more clean up on the tests
* fix codeql
* address feedback
* Add functionality to continue benchmarking in Triton C API mode if server logging support is disabled (#34)

---------

Co-authored-by: Hyunjae Woo
Co-authored-by: Elias Bermudez