
Add function to set end_id and apply chat template for GAP Triton In-Process #183

Open · wants to merge 19 commits into main

Conversation

@tedzhouhk commented Nov 15, 2024

Add two functions for GAP Triton In-Process, plus a small bug fix:

  1. Pass the EOS token as end_id so that output can stop at EOS (see the sketch after this list).
  2. Apply the chat template so that OSL is accurate.
  3. Fix a small bug for empty first/last responses in the Triton engine.
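
For context, here is a minimal sketch of what change 1 amounts to in the request payload. Only the end_id key comes from this PR; the other field names and the tokenizer variable are illustrative assumptions:

# Sketch: stop generation at EOS by passing the tokenizer's EOS id as end_id.
# "tokenizer" is a Hugging Face tokenizer; end_id mirrors the converter code
# quoted below, while the other payload fields are placeholders.
eos_id = tokenizer.eos_token_id

payload = {
    "text_input": ["Hello, world"],   # illustrative field
    "max_tokens": [256],              # illustrative field
    "end_id": [eos_id],               # generation halts once EOS is produced
}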

Comment on lines 86 to 87
if config.set_end_id:
    payload["end_id"] = [config.tokenizer._tokenizer.eos_token_id]
Contributor:

I am a bit concerned about adding a new CLI option for this specific use case. We try to avoid adding too many options to the tool, and this can be achieved through --extra-inputs <name>:<value> as well.
cc @dyastremsky

Contributor:

Yeah, if this can be done via --extra-inputs, let's skip this change.

Author:

It is moved to --extra-inputs. However, a code change is still necessary unless the user directly provides the EOS token ID (instead of fetching it from the tokenizer). Please let me know if the current approach looks good. Thanks.
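
To illustrate the point, a hedged sketch (not the PR's exact code): --extra-inputs can inject a literal value into the payload, but resolving the EOS id from the tokenizer has to happen inside the converter, since only the converter holds the tokenizer:

# A user-supplied literal passes straight through, e.g. --extra-inputs end_id:2
payload["end_id"] = [2]

# Fetching the id from the tokenizer instead requires converter-side code;
# the attribute path follows the snippet quoted above.
payload["end_id"] = [config.tokenizer._tokenizer.eos_token_id]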

)

input_group.add_argument(
    "--triton-converter-apply-chat-template",
Contributor:

I think apply_chat_template could be a useful CLI option to add: https://huggingface.co/docs/transformers/main/en/chat_templating

But I would go for adding it in a more generic way, to support chat templates for any use case, not just Triton in-process.
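
For reference, a minimal sketch of the linked Hugging Face API; the model name is only an example, and any tokenizer that ships a chat template works the same way:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")  # example model

# Wrap a user message in the model's chat markup without tokenizing,
# so the templated string can be inspected or re-encoded later.
templated = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is GAP?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(templated)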

Author:

Please correct me if I'm wrong: is the chat template only necessary for benchmarking raw engines? Benchmarking through API endpoints does not need GAP to add the chat template, right?

Contributor:

Yes, you're right. What I wanted to say is that we want to make this generic so the option is not limited to Triton in-process, even though that is the current use case. That way we have a stronger reason to add it to our CLI options.

Author:

Got it. Does GAP have any other route to benchmark raw engines that might need chat templates? I am not familiar with the codebase, and it would be great if someone from your team has some bandwidth to help.

Contributor:

The link from Hyunjae shows how chat templates are used with some APIs. The only endpoint that I know supports templates right now is the chat endpoint.

The team is bandwidth-constrained at the moment, though that could be a good addition.

Comment on lines +285 to +289
elif isinstance(r["output_ids"], int):
    token_ids.append(r["output_ids"])
else:
    # for the empty first/last responses
    token_ids.append(0)
Author:

@matthewkotila I added a bug fix for the empty first/last response (it will be an empty string and cause an error).
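
For anyone following along, a hypothetical illustration of the failure mode this guard handles; the response shapes here are assumptions based on the comment above:

# Hypothetical responses from the Triton in-process engine: the first/last
# response can carry an empty string instead of an int or a list of ints.
responses = [
    {"output_ids": ""},          # empty first response
    {"output_ids": [5, 17, 9]},  # normal streamed chunk
    {"output_ids": 42},          # single-token chunk
]

token_ids = []
for r in responses:
    if isinstance(r["output_ids"], list):
        token_ids.extend(r["output_ids"])
    elif isinstance(r["output_ids"], int):
        token_ids.append(r["output_ids"])
    else:
        # empty first/last response: append a placeholder so counting works
        token_ids.append(0)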

Contributor @dyastremsky left a comment:

Good start to this PR. I like the idea of making the chat template generic for use with any/all endpoints.

We should also add unit testing to support these changes.

@@ -571,6 +571,13 @@ def _add_image_input_args(parser):
    "If format is not selected, format of generated image is selected at random",
)

input_group.add_argument(
Contributor:

I agree with Hyunjae's point below. Also, the image input arg function does not seem like the right place for this.


    set_end_id: bool = False

    # whether to apply chat template in triton converter
    apply_chat_template: bool = False
Contributor:

Update the comment to make this generic. You'd also want apply_chat_template to be a string if we're making it generic based on how other endpoints use chat templating.

Newline after this.
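
A hedged sketch of the string-typed, generic field this suggests; the field name and default are assumptions:

from dataclasses import dataclass
from typing import Optional

@dataclass
class InputsConfig:
    # Chat template to apply to prompts, if provided (endpoint-agnostic).
    # None means no templating; a string holds the user-supplied template.
    chat_template: Optional[str] = None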

@@ -142,3 +142,9 @@ class InputsConfig:

    # Seed used to generate random values
    random_seed: int = DEFAULT_RANDOM_SEED

    # whether to set end_id in triton converter
    set_end_id: bool = False
Contributor:

Is this still necessary? We don't want endpoint-specific fields in inputs_config.py.

Author:

Good catch, I'll remove this.

@@ -68,6 +68,9 @@ def __call__(self, text, **kwargs) -> "BatchEncoding":
    def encode(self, text, **kwargs) -> List[int]:
        self._encode_args.update(kwargs)
        return self._tokenizer.encode(text, **self._encode_args)

    def apply_chat_template(self, text) -> List[int]:
        return self._tokenizer.encode(
            self._tokenizer.apply_chat_template(
                [{"role": "user", "content": text}], tokenize=False
            ),
            add_special_tokens=False,
        )
Contributor:

We can't have a TRT-LLM-specific chat template in the tokenizer class. We should have the user provide the template, then the tokenizer applies it if it has one.

If we're making this endpoint-specific, then it should exist solely in the converter as an extra arg. As long as it doesn't affect metrics, which I think it shouldn't for TRT-LLM in-process. (CC: @nv-hwoo)
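
A minimal sketch of that suggestion, assuming the tokenizer wrapper quoted above; the chat_template constructor argument and the encode_prompt name are assumptions:

from typing import List, Optional

class Tokenizer:
    # Wrapper sketch: applies a user-provided chat template only if one is set.
    def __init__(self, hf_tokenizer, chat_template: Optional[str] = None):
        self._tokenizer = hf_tokenizer
        self._chat_template = chat_template  # supplied by the user, not hardcoded

    def encode_prompt(self, text: str) -> List[int]:
        if self._chat_template is None:
            return self._tokenizer.encode(text)
        templated = self._tokenizer.apply_chat_template(
            [{"role": "user", "content": text}],
            chat_template=self._chat_template,  # user template overrides the default
            tokenize=False,
        )
        return self._tokenizer.encode(templated, add_special_tokens=False)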

@@ -82,4 +85,7 @@ def _add_request_params(self, payload: Dict, config: InputsConfig) -> None:
        payload["min_length"] = [num_tokens]

        for key, value in config.extra_inputs.items():
            payload[key] = [value]
            if key == "triton_converter_set_end_id" and value:
Contributor:

You could probably shorten this key if you wanted, e.g. "set_end_id".
