"""Get multimodal token positions using IDs > vocab_size and known lengths.
538
+
"""Get starting positions of contiguous multimodal token chunks using known lengths.
539
+
540
+
This function finds multimodal tokens (with IDs > vocab_size or matching mm_token_ids)
541
+
and uses the provided lengths in num_mm_tokens to identify where each contiguous chunk starts.
542
+
Each chunk in num_mm_tokens is assumed to be a contiguous block of multimodal tokens for each multimodal item, and may include special tokens (e.g., image_begin, image_end, image_break) within the chunk.
532
543
533
-
This function finds multimodal tokens (with IDs > vocab_size) and uses the
534
-
provided lengths in num_mm_tokens to identify where each chunk starts.
535
-
This works even when there are no gaps between different image sequences
536
-
(e.g., when all images use the same token IDs).
537
-
Note at least one of vocab_size or mm_token_ids must be provided. If mm_token_ids is provided, vocab_size is ignored.
544
+
Note: at least one of vocab_size or mm_token_ids must be provided. If mm_token_ids
545
+
is provided, vocab_size is ignored.
538
546
539
547
Args:
540
548
input_ids: Token sequence (tensor, list, or numpy array)
541
-
num_mm_tokens: List of lengths for each multimodal token chunk
542
-
vocab_size: Size of the model's vocabulary
543
-
mm_token_ids: Possible token ids for multimodal tokens
549
+
num_mm_tokens: List of contiguous chunk lengths for each multimodal item
550
+
vocab_size: Size of the model's vocabulary (used to identify tokens > vocab_size)
551
+
mm_token_ids: Specific token IDs that represent multimodal tokens
544
552
545
553
Returns:
546
-
List of starting positions for each multimodal token chunk
554
+
List of starting positions for each contiguous multimodal token chunk
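For illustration, a minimal sketch of the chunk-start search the docstring describes is shown below. The function name and exact signature are assumptions for this sketch, not the repository's actual helper; it only demonstrates the documented behavior (mm_token_ids taking precedence over vocab_size, and known chunk lengths allowing back-to-back chunks to be separated).

```python
from typing import List, Optional, Sequence, Set

def find_mm_chunk_starts(input_ids: Sequence[int],
                         num_mm_tokens: List[int],
                         vocab_size: Optional[int] = None,
                         mm_token_ids: Optional[Set[int]] = None) -> List[int]:
    # At least one way of identifying multimodal tokens must be given.
    if vocab_size is None and mm_token_ids is None:
        raise ValueError("Provide at least one of vocab_size or mm_token_ids")

    def is_mm(tok: int) -> bool:
        # mm_token_ids takes precedence over vocab_size, as the docstring notes.
        if mm_token_ids is not None:
            return tok in mm_token_ids
        return tok > vocab_size

    starts: List[int] = []
    pos = 0
    for length in num_mm_tokens:
        # Scan forward to the next multimodal token; that position starts the chunk.
        while pos < len(input_ids) and not is_mm(input_ids[pos]):
            pos += 1
        starts.append(pos)
        # Skip the whole contiguous chunk using its known length, so adjacent
        # chunks with identical token IDs are still split correctly.
        pos += length
    return starts
```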
f"Prompt output: {prompt_output}\nExpected keywords: {prompt_keywords}\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} given threshold {match_ratio}"
2597
+
)
2598
+
assertobs_match_ratio>=match_ratio, f"Incorrect output!\nGenerated \"{prompt_output}\"\nExpected keywords \"{prompt_keywords}\"\n Matched keywords: {matches}\n Observed match ratio {obs_match_ratio} below threshold {match_ratio}"
2599
+
# TODO: Setting max_batch_size=1 and repeating the same request helps test KV cache reuse indirectly,
2600
+
# but does not directly measure the KV cache hit rate. For a more direct test, we would need to enable
2601
+
# return_perf_metrics=True, which is not currently supported by the quickstart example CLI.
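For context, the keyword-match ratio checked by the assertion could be computed along the following lines. The helper name and the case-insensitive substring rule are assumptions for this sketch, not the test's actual implementation.

```python
# Hypothetical helper illustrating how matches / obs_match_ratio in the assertion
# above could be derived; the real test's matching rule may differ.
def keyword_match_ratio(prompt_output: str, prompt_keywords: list) -> tuple:
    # Case-insensitive substring match of each expected keyword against the output.
    matches = [kw for kw in prompt_keywords if kw.lower() in prompt_output.lower()]
    ratio = len(matches) / len(prompt_keywords) if prompt_keywords else 0.0
    return matches, ratio

# Example: require at least half of the expected keywords to appear in the output.
matches, obs_match_ratio = keyword_match_ratio(
    "The image shows a golden retriever playing on the grass",
    ["retriever", "grass", "ball"],
)
match_ratio = 0.5
assert obs_match_ratio >= match_ratio  # 2/3 >= 0.5, so this passes
```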