Support 1-d array data in profile exporter #28

Merged
nv-hwoo merged 6 commits into tensorrtllm-engine from hwoo-profile-export-array on Aug 8, 2024

Conversation

nv-hwoo
Contributor

@nv-hwoo nv-hwoo commented Aug 6, 2024

  1. Support 1-d array data in the profile exporter, so that when an input or output holds more than one value (an array of values), the exporter captures all of the data rather than just the first element. This is an incremental step towards supporting full N-dimensional tensors in the profile exporter. We don't need N-dimensional support yet because we are mostly interested in text strings or token ids, which are just 1-d arrays. N-dimensional tensor support would also require more changes to PA: response outputs don't store the shape of the tensors returned from the model, so we would need to retrieve that information from somewhere else.
  2. Support other data types such as uint, fp, etc.
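To illustrate the two points above, here is a minimal Python sketch of decoding a flat byte buffer into a full 1-d list of values for several data types. It is not the PA implementation: the function name `decode_1d`, the `_FORMATS` table, and the little-endian layout are assumptions made for illustration, and only the datatype names follow Triton's naming.

```python
import struct

# Assumed mapping from Triton-style datatype names to struct format codes.
# This table is illustrative only and is not taken from PA.
_FORMATS = {
    "BOOL": "?",
    "INT8": "b", "UINT8": "B",
    "INT16": "h", "UINT16": "H",
    "INT32": "i", "UINT32": "I",
    "INT64": "q", "UINT64": "Q",
    "FP32": "f", "FP64": "d",
}


def decode_1d(buf: bytes, datatype: str) -> list:
    """Decode every element of a flat little-endian buffer (hypothetical helper)."""
    fmt = _FORMATS[datatype]
    size = struct.calcsize("<" + fmt)
    if len(buf) % size != 0:
        raise ValueError(f"buffer length {len(buf)} is not a multiple of {size}")
    count = len(buf) // size
    # Unpack all `count` elements, not just the first one.
    return list(struct.unpack(f"<{count}{fmt}", buf))


token_bytes = struct.pack("<8i", 1, 2, 3, 4, 5, 6, 7, 8)
print(decode_1d(token_bytes, "INT32"))
```

Because the buffer carries no shape, a sketch like this can only return a flat list, which is the same limitation that defers N-dimensional support.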

Example:

When the trtllm engine is given the following input:

{
    "token_ids": [1, 2, 3, 4, 5, 6, 7, 8],
    "input_length": [8],
    "request_output_len": [10],
    ...
}

PA produced the following before the change:

{
    "request_inputs": {
        "token_ids": 1,  // missing remaining values
        "input_length": 8,
        "request_output_len": 10,
    },
    ...
}

After the fix, we get the full array of token ids (same for any other fields in the profile export json file):

{
    "request_inputs": {
        "token_ids": [1, 2, 3, 4, 5, 6, 7, 8],
        "input_length": 8,
        "request_output_len": 10,
    },
    ...
}
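In the example above, single-element arrays such as `input_length` still appear as scalars while `token_ids` appears as a full array. The Python sketch below reproduces that shape under the assumption that length-1 arrays are collapsed to a scalar. The rule is inferred from the example only, and `to_export_value` and `as_list` are hypothetical helpers, not PA functions.

```python
import json


def to_export_value(values):
    """Assumed rule: collapse a length-1 array to a scalar, keep longer arrays."""
    values = list(values)
    return values[0] if len(values) == 1 else values


def as_list(value):
    """Consumer-side helper: read a field uniformly whether it is scalar or array."""
    return value if isinstance(value, list) else [value]


request = {
    "token_ids": [1, 2, 3, 4, 5, 6, 7, 8],
    "input_length": [8],
    "request_output_len": [10],
}

exported = {
    "request_inputs": {name: to_export_value(v) for name, v in request.items()}
}
print(json.dumps(exported, indent=4))
```

A consumer of the profile export can call `as_list` on each field so that downstream code does not need to branch on scalar versus array values.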

@nv-hwoo nv-hwoo force-pushed the hwoo-profile-export-array branch from 549fece to fa7fdd2 Compare August 6, 2024 19:37
@nv-hwoo nv-hwoo force-pushed the hwoo-profile-export-array branch from fa7fdd2 to c0de3e8 Compare August 6, 2024 22:23
@nv-hwoo nv-hwoo force-pushed the hwoo-profile-export-array branch from faa23f4 to c7b1642 Compare August 7, 2024 18:01
@matthewkotila matthewkotila force-pushed the hwoo-profile-export-array branch from b32067d to f787972 Compare August 7, 2024 19:03
@matthewkotila matthewkotila force-pushed the hwoo-profile-export-array branch from f787972 to 9dcc8b7 Compare August 7, 2024 19:13
@matthewkotila matthewkotila force-pushed the hwoo-profile-export-array branch from 9dcc8b7 to faf762a Compare August 7, 2024 21:48
@matthewkotila matthewkotila force-pushed the hwoo-profile-export-array branch from faf762a to 5df151a Compare August 7, 2024 22:03
@matthewkotila matthewkotila force-pushed the hwoo-profile-export-array branch from 5df151a to 684958f Compare August 7, 2024 22:17
@matthewkotila matthewkotila force-pushed the hwoo-profile-export-array branch from 684958f to 79e3d0e Compare August 7, 2024 23:23
@nv-hwoo nv-hwoo merged commit e258b28 into tensorrtllm-engine Aug 8, 2024
7 checks passed
@nv-hwoo nv-hwoo deleted the hwoo-profile-export-array branch August 8, 2024 16:38
matthewkotila added a commit that referenced this pull request Aug 9, 2024
* support array of data in profile exporter

* add some tests

* run formatting

* fix pre-commit

* remove duplicate argparser arguments

* Fix Triton C API mode missing infer requested output datatype bug

---------

Co-authored-by: Matthew Kotila <[email protected]>
matthewkotila added a commit that referenced this pull request Aug 9, 2024
* Add tensorrtllm_engine option to service-kind and update testing (#700) (#762)

* Add tensorrtllm_engine option to service-kind and update testing

* Add output format check for tensorrtllm_engine

Co-authored-by: Elias Bermudez <[email protected]>

* Support input payload generation for tensorrtllm engine (#767)

* Add functionality for async requests and output retrieval with Triton C API (#25)

* Support 1-d array data in profile exporter (#28)

* support array of data in profile exporter

* add some tests

* run formatting

* fix pre-commit

* remove duplicate argparser arguments

* Fix Triton C API mode missing infer requested output datatype bug

---------

Co-authored-by: Matthew Kotila <[email protected]>

* Support profile data parsing for tensorrtllm engine service kind (#33)

* support parsing tensorrtllm engine profile response

* add test

* refactor the test

* update types and names

* fix pre-commit

* run PA with triton c api

* more clean up on the tests

* fix codeql

* address feedback

* Add functionality to continue benchmarking in Triton C API mode if server logging support is disabled (#34)

---------

Co-authored-by: Hyunjae Woo <[email protected]>
Co-authored-by: Elias Bermudez <[email protected]>
lkomali pushed a commit that referenced this pull request Aug 15, 2024