Add function to set end_id and apply chat template for GAP Triton In-Process #183
base: main
Changes from all commits: 6122f9d, bb87979, bd79493, df0be1c, f652271, e738910, 144f165, 1777fa1, d6be758, d067b04, c590a6e, 79f869d, f1d3708, ca20c30, 36338f9, d2b2b63, bf14733, 22e2322, 4410e48
@@ -142,3 +142,6 @@ class InputsConfig:
     # Seed used to generate random values
     random_seed: int = DEFAULT_RANDOM_SEED
 
+    # whether to apply chat template in triton converter
+    apply_chat_template: bool = False

Review comment: Update the comment to make this generic. You'd also want apply_chat_template to be a string if we're making it generic, based on how other endpoints use chat templating. Add a newline after this.
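If the option were made generic as the reviewer suggests, the config field might carry the template choice as a string rather than a boolean. A minimal sketch of that idea; the string-valued field, its empty-string sentinel, and the `DEFAULT_RANDOM_SEED` value here are illustrative assumptions, not the project's actual API:

```python
from dataclasses import dataclass

DEFAULT_RANDOM_SEED = 0  # stand-in value; the real default lives elsewhere in the repo


@dataclass
class InputsConfig:
    # Seed used to generate random values
    random_seed: int = DEFAULT_RANDOM_SEED

    # Generic variant suggested in the review: a template string rather
    # than a bool, with the empty string meaning "no template applied".
    # The field type and sentinel are hypothetical.
    apply_chat_template: str = ""
```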
@@ -571,6 +571,13 @@ def _add_image_input_args(parser):
         "If format is not selected, format of generated image is selected at random",
     )
 
+    input_group.add_argument(
+        "--triton-converter-apply-chat-template",
+        action="store_true",
+        required=False,
+        help="If specified, the input to trtllm engines in triton server will "
+        "be wrapped with chat template.",
+    )
 
 def _add_profile_args(parser):
     profile_group = parser.add_argument_group("Profiling")

Review comment: I agree with Hyunjae's point below. Also, the image input arg function does not seem like the right place for this.

Review thread on "--triton-converter-apply-chat-template":
- I would go for adding it in a more generic way to support chat_template for any use case, not just for Triton in-process.
- Please correct me if I'm wrong: is a chat template only necessary to benchmark raw engines? Benchmarking through API endpoints does not need to add a chat template in GAP, right?
- Yes, you're right. What I wanted to say was that we should make it more generic so the option is not limited to Triton in-process, even though that is the current use case. That way we have a stronger reason to add it to our CLI options.
- Got it. Does GAP have any other route to benchmark raw engines that might need chat templates? I am not familiar with the codebase, and it would be great if someone from your team has some bandwidth to help.
- The link from Hyunjae shows how chat templates are used with some APIs. The only endpoint that I know supports templates right now is the chat endpoint. The team is bandwidth-constrained at the moment, though this could be a good addition.
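As a self-contained check of the flag added in the diff above, the argparse wiring can be reproduced in isolation. The parser and group names here are illustrative stand-ins, not the project's real objects:

```python
import argparse

# Minimal reproduction of the store_true flag from the diff; a flag of
# this kind defaults to False and flips to True when present.
parser = argparse.ArgumentParser()
input_group = parser.add_argument_group("Input")
input_group.add_argument(
    "--triton-converter-apply-chat-template",
    action="store_true",
    required=False,
    help="If specified, the input to trtllm engines in triton server will "
    "be wrapped with chat template.",
)

args = parser.parse_args(["--triton-converter-apply-chat-template"])
```

Note that argparse converts the dashes in the flag name to underscores, so the value is read as `args.triton_converter_apply_chat_template`.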
@@ -282,8 +282,11 @@ def _get_tensorrtllm_engine_token_counts(
         for r in res_outputs:
             if isinstance(r["output_ids"], list):
                 token_ids += r["output_ids"]
-            else:
+            elif isinstance(r["output_ids"], int):
                 token_ids.append(r["output_ids"])
+            else:
+                # for the empty first/last responses
+                token_ids.append(0)
 
         return token_ids, len(token_ids)
 
     def _get_triton_output_tokens(self, res_outputs: List[Dict]) -> List[str]:

Review comment on lines +285 to +289: @matthewkotila added a bug fix for the empty first/last responses (each arrives as an empty string and would cause an error).
@@ -68,6 +68,9 @@ def __call__(self, text, **kwargs) -> "BatchEncoding":
     def encode(self, text, **kwargs) -> List[int]:
         self._encode_args.update(kwargs)
         return self._tokenizer.encode(text, **self._encode_args)
 
+    def apply_chat_template(self, text) -> List[int]:
+        return self._tokenizer.encode(self._tokenizer.apply_chat_template([{"role": "user", "content": text}], tokenize=False), add_special_tokens=False)
 
     def decode(self, token_ids, **kwargs) -> str:
         self._decode_args.update(kwargs)

Review comment: We can't have a TRT-LLM-specific chat template in the tokenizer class. We should have the user provide the template, and the tokenizer applies it if it has one. If we're making this endpoint-specific, then it should exist solely in the converter as an extra arg, as long as it doesn't affect metrics, which I think it shouldn't for TRT-LLM in-process. (CC: @nv-hwoo)

Review comment: You could probably shorten this key if you wanted, e.g. "set_end_id".
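The added method renders a single user turn through the underlying tokenizer's chat template (`tokenize=False` returns the rendered string), then encodes that string without special tokens. A minimal sketch of the same flow, using a toy stub in place of a real Hugging Face tokenizer; the stub's template and one-token-per-character encoding are purely illustrative:

```python
from typing import List


class _StubTokenizer:
    """Toy stand-in for a Hugging Face tokenizer, for illustration only."""

    def apply_chat_template(self, messages, tokenize=False):
        # Toy template: wrap each turn's content in role markers.
        return "".join(f"<{m['role']}>{m['content']}</{m['role']}>" for m in messages)

    def encode(self, text, add_special_tokens=True):
        # Toy encoding: one "token id" per character.
        return [ord(c) for c in text]


def apply_chat_template(tokenizer, text: str) -> List[int]:
    # Same two-step flow as the diff: render the one-turn conversation to
    # a string, then encode it without adding special tokens.
    rendered = tokenizer.apply_chat_template(
        [{"role": "user", "content": text}], tokenize=False
    )
    return tokenizer.encode(rendered, add_special_tokens=False)


ids = apply_chat_template(_StubTokenizer(), "hi")
```

With a real tokenizer the rendered string would instead follow the model's own chat format, which is exactly why the reviewer wants the template supplied by the user rather than hard-coded.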