[Bugfix]: Make chat content text allow type content #9358

Merged on Oct 24, 2024 · 53 commits

Commits
bde4065
[Bugix]: Make chat content text allow type content
vrdn-23 Oct 15, 2024
3bf919c
Add test to verify content is parsed as expected
vrdn-23 Oct 15, 2024
66ab303
Fix formatting
vrdn-23 Oct 15, 2024
aed37f6
Fix test to actually test the fix
vrdn-23 Oct 18, 2024
a194b32
Rewrite logic to fix failing test
vrdn-23 Oct 18, 2024
9a08708
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 18, 2024
a112a1a
Another attempt to making this work
vrdn-23 Oct 18, 2024
0fee8d7
Remove the offending tests
vrdn-23 Oct 18, 2024
4011ed1
Add cli args to switch between types
vrdn-23 Oct 18, 2024
80ae489
Minor fix
vrdn-23 Oct 18, 2024
a67e03f
Fix tests
vrdn-23 Oct 18, 2024
357376b
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 18, 2024
0b95a0b
Fix formatting
vrdn-23 Oct 18, 2024
d89e8c0
Revert chat changes
vrdn-23 Oct 18, 2024
be94fc5
Add missing new line
vrdn-23 Oct 18, 2024
ff7965a
Minor nits
vrdn-23 Oct 18, 2024
22489cf
Minor nits again
vrdn-23 Oct 18, 2024
f8d2cba
Standardize name
vrdn-23 Oct 18, 2024
54532f2
Remove unnecessary variable
vrdn-23 Oct 18, 2024
ea7274d
Actually remove unnecessary variable
vrdn-23 Oct 18, 2024
f3608be
Make variable name simpler
vrdn-23 Oct 18, 2024
79a22ba
Fix help doc
vrdn-23 Oct 18, 2024
c7e5371
Fix formatting
vrdn-23 Oct 18, 2024
80d45d5
Fix default value in config
vrdn-23 Oct 18, 2024
89dd84b
Fix failing test
vrdn-23 Oct 18, 2024
faafc31
Fix failing test
vrdn-23 Oct 18, 2024
ef74a9c
Fix failing test again
vrdn-23 Oct 18, 2024
b7c90c2
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 18, 2024
1eca1f5
Fix mypy error
vrdn-23 Oct 18, 2024
2696c75
Merge remote-tracking branch 'refs/remotes/origin/vrdn/chat-content-u…
vrdn-23 Oct 18, 2024
2fe18bd
Fix mypy error by ignoring
vrdn-23 Oct 18, 2024
6311517
Put ignore in the right place
vrdn-23 Oct 18, 2024
f3f3887
Fix formatting
vrdn-23 Oct 18, 2024
a08b342
Remove stupid typo
vrdn-23 Oct 18, 2024
a104da3
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 18, 2024
607b5ef
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 21, 2024
3c5c60a
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 22, 2024
75ed3e6
Fix mypy and tests
vrdn-23 Oct 22, 2024
3e55da8
Fix mypy
vrdn-23 Oct 22, 2024
7a01d53
Fix mypy again
vrdn-23 Oct 22, 2024
bef9a2d
Fix formatting
vrdn-23 Oct 22, 2024
c0fc5c9
Fix formatting again
vrdn-23 Oct 22, 2024
50ba0ce
Add docs and str generator
vrdn-23 Oct 23, 2024
62e35bc
Add bit more docs
vrdn-23 Oct 23, 2024
e318c85
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 23, 2024
e2456b9
Merge branch 'main' into vrdn/chat-content-utils
vrdn-23 Oct 23, 2024
0130c74
Fix formatting
vrdn-23 Oct 23, 2024
cedd014
Merge branch 'vllm-project:main' into vrdn/chat-content-utils
vrdn-23 Oct 24, 2024
f40fbf9
Simplify check with content parser
vrdn-23 Oct 24, 2024
1a772c7
Fix ruff
vrdn-23 Oct 24, 2024
fd61ada
Rename variable to be more appropriate
vrdn-23 Oct 24, 2024
a04d76d
Fix missing part
vrdn-23 Oct 24, 2024
1bb9faa
Fix formatting
vrdn-23 Oct 24, 2024
17 changes: 17 additions & 0 deletions docs/source/serving/openai_compatible_server.md
@@ -103,6 +103,23 @@ vllm serve <model> --chat-template ./path-to-chat-template.jinja
vLLM community provides a set of chat templates for popular models. You can find them in the examples
directory [here](https://github.com/vllm-project/vllm/tree/main/examples/)

With the inclusion of multi-modal chat APIs, the OpenAI spec now accepts chat messages in a new format which specifies
both a `type` and a `text` field. An example is provided below:
```python
completion = client.chat.completions.create(
    model="NousResearch/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "user", "content": [{"type": "text", "text": "Classify this sentiment: vLLM is wonderful!"}]}
    ]
)
```
Most chat templates for LLMs expect the `content` field to be a string, but some newer models such as
`meta-llama/Llama-Guard-3-1B` expect the content to keep the structure of the new OpenAI spec. To choose which
format vLLM should parse the content into, use the `--chat-template-text-format` argument with either `string` or
`openai`. The default is `string`: unless `openai` is explicitly specified, vLLM internally converts content from
both spec formats into a plain string.
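To opt a model into the OpenAI-style parsing, the flag can be passed at launch, for example
`vllm serve meta-llama/Llama-Guard-3-1B --chat-template-text-format openai`. The snippet below is a minimal sketch of
the flattening that the default `string` mode performs, assuming text parts are joined with newlines (as the echo
handling in `serving_chat.py` does); the `flatten_text_content` helper is hypothetical and not part of vLLM's API:
```python
from typing import Dict, List, Optional, Union


def flatten_text_content(
        content: Union[Optional[str], List[Dict[str, str]]]) -> str:
    """Hypothetical helper: collapse OpenAI-style content parts into a string."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # Keep only text-typed parts and join their "text" fields with newlines.
    return "\n".join(part["text"] for part in content
                     if part.get("type") == "text")


# Both request styles yield the same prompt text under "string" mode.
assert flatten_text_content(
    "Classify this sentiment: vLLM is wonderful!") == flatten_text_content(
        [{"type": "text", "text": "Classify this sentiment: vLLM is wonderful!"}])
```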


## Command line arguments for the server

```{argparse}
1 change: 1 addition & 0 deletions tests/entrypoints/openai/test_serving_chat.py
@@ -26,6 +26,7 @@ class MockModelConfig:
tokenizer = MODEL_NAME
trust_remote_code = False
tokenizer_mode = "auto"
chat_template_text_format = "string"
max_model_len = 100
tokenizer_revision = None
multimodal_config = MultiModalConfig()
48 changes: 47 additions & 1 deletion tests/entrypoints/test_chat_utils.py
@@ -17,7 +17,7 @@
MLLAMA_MODEL_ID = "meta-llama/Llama-3.2-11B-Vision-Instruct"


@pytest.fixture(scope="module")
@pytest.fixture(scope="function")
def phi3v_model_config():
return ModelConfig(PHI3V_MODEL_ID,
task="generate",
@@ -26,6 +26,7 @@ def phi3v_model_config():
trust_remote_code=True,
dtype="bfloat16",
seed=0,
chat_template_text_format="string",
limit_mm_per_prompt={
"image": 2,
})
@@ -330,6 +331,51 @@ def test_parse_chat_messages_multiple_images_across_messages(
_assert_mm_data_is_image_input(mm_data, 2)


def test_parse_chat_messages_context_text_format(
    phi3v_model_config,
    phi3v_tokenizer,
):
    phi3v_model_config.chat_template_text_format = "openai"
    conversation, mm_data = parse_chat_messages(
        [{
            "role": "user",
            "content": [{
                "type": "text",
                "text": "What's in this text?"
            }]
        }, {
            "role": "assistant",
            "content": "Some stuff."
        }, {
            "role": "user",
            "content": "What about this one?"
        }], phi3v_model_config, phi3v_tokenizer)

    assert conversation == [
        {
            "role": "user",
            "content": [{
                "type": "text",
                "text": "What's in this text?"
            }]
        },
        {
            "role": "assistant",
            "content": [{
                "type": "text",
                "text": "Some stuff."
            }]
        },
        {
            "role": "user",
            "content": [{
                "type": "text",
                "text": "What about this one?"
            }]
        },
    ]


def test_parse_chat_messages_rejects_too_many_images_in_one_message(
phi3v_model_config,
phi3v_tokenizer,
2 changes: 2 additions & 0 deletions vllm/config.py
@@ -142,6 +142,7 @@ def __init__(self,
use_async_output_proc: bool = True,
override_neuron_config: Optional[Dict[str, Any]] = None,
config_format: ConfigFormat = ConfigFormat.AUTO,
chat_template_text_format: str = "string",
mm_processor_kwargs: Optional[Dict[str, Any]] = None) -> None:
self.model = model
self.tokenizer = tokenizer
@@ -176,6 +177,7 @@ def __init__(self,
self.model, revision)
self.dtype = _get_and_verify_dtype(self.hf_text_config, dtype)
self.use_async_output_proc = use_async_output_proc
self.chat_template_text_format = chat_template_text_format
self.mm_processor_kwargs = mm_processor_kwargs

# Set enforce_eager to False if the value is unset.
10 changes: 10 additions & 0 deletions vllm/engine/arg_utils.py
@@ -89,6 +89,7 @@ class EngineArgs:
task: TaskOption = "auto"
skip_tokenizer_init: bool = False
tokenizer_mode: str = 'auto'
chat_template_text_format: str = 'string'
trust_remote_code: bool = False
download_dir: Optional[str] = None
load_format: str = 'auto'
@@ -250,6 +251,14 @@ def add_cli_args(parser: FlexibleArgumentParser) -> FlexibleArgumentParser:
'fast tokenizer if available.\n* "slow" will '
'always use the slow tokenizer. \n* '
'"mistral" will always use the `mistral_common` tokenizer.')
parser.add_argument(
'--chat-template-text-format',
type=str,
default=EngineArgs.chat_template_text_format,
choices=['string', 'openai'],
help='The format to render text content within a chat template. '
'"string" will keep the content field as a string whereas '
'"openai" will parse content in the current OpenAI format.')
parser.add_argument('--trust-remote-code',
action='store_true',
help='Trust remote code from huggingface.')
@@ -858,6 +867,7 @@ def create_model_config(self) -> ModelConfig:
# We know this is not None because we set it in __post_init__
tokenizer=cast(str, self.tokenizer),
tokenizer_mode=self.tokenizer_mode,
chat_template_text_format=self.chat_template_text_format,
trust_remote_code=self.trust_remote_code,
dtype=self.dtype,
seed=self.seed,
3 changes: 2 additions & 1 deletion vllm/engine/llm_engine.py
@@ -254,7 +254,7 @@ def __init__(
"num_scheduler_steps=%d, chunked_prefill_enabled=%s "
"multi_step_stream_outputs=%s, enable_prefix_caching=%s, "
"use_async_output_proc=%s, use_cached_outputs=%s, "
"mm_processor_kwargs=%s)",
"chat_template_text_format=%s, mm_processor_kwargs=%s)",
VLLM_VERSION,
model_config.model,
speculative_config,
@@ -289,6 +289,7 @@ def __init__(
cache_config.enable_prefix_caching,
model_config.use_async_output_proc,
use_cached_outputs,
model_config.chat_template_text_format,
model_config.mm_processor_kwargs,
)
# TODO(woosuk): Print more configs in debug mode.
31 changes: 23 additions & 8 deletions vllm/entrypoints/chat_utils.py
@@ -121,7 +121,7 @@ class ConversationMessage(TypedDict, total=False):
role: Required[str]
"""The role of the message's author."""

content: Optional[str]
content: Union[Optional[str], List[Dict[str, str]]]
"""The contents of the message"""

tool_call_id: Optional[str]
@@ -431,7 +431,7 @@ def _get_full_multimodal_text_prompt(placeholder_counts: Dict[str, int],
def _parse_chat_message_content_mm_part(
part: ChatCompletionContentPartParam) -> Tuple[str, str]:
"""
Parses a given multi modal content part based on its type.
Parses a given multi-modal content part based on its type.

Args:
part: A dict containing the content part, with a potential 'type' field.
@@ -485,21 +485,26 @@ def _parse_chat_message_content_parts(
role: str,
parts: Iterable[ChatCompletionContentPartParam],
mm_tracker: BaseMultiModalItemTracker,
chat_template_text_format: str,
) -> List[ConversationMessage]:
content: List[Union[str, Dict[str, str]]] = []

mm_parser = mm_tracker.create_parser()
keep_multimodal_content = \
wrap_dicts = \
mm_tracker._model_config.hf_config.model_type in \
MODEL_KEEP_MULTI_MODAL_CONTENT
MODEL_KEEP_MULTI_MODAL_CONTENT or \
(chat_template_text_format == "openai")

for part in parts:
parse_res = _parse_chat_message_content_part(
part, mm_parser, wrap_dicts=keep_multimodal_content)
part,
mm_parser,
wrap_dicts=wrap_dicts,
)
if parse_res:
content.append(parse_res)

if keep_multimodal_content:
if wrap_dicts:
# Parsing wraps images and texts as interleaved dictionaries
return [ConversationMessage(role=role,
content=content)] # type: ignore
@@ -560,6 +565,7 @@ def _parse_chat_message_content_part(
def _parse_chat_message_content(
message: ChatCompletionMessageParam,
mm_tracker: BaseMultiModalItemTracker,
chat_template_text_format: str,
) -> List[ConversationMessage]:
role = message["role"]
content = message.get("content")
@@ -575,6 +581,7 @@ def _parse_chat_message_content(
role,
content, # type: ignore
mm_tracker,
chat_template_text_format,
)

for result_msg in result:
@@ -618,7 +625,11 @@ def parse_chat_messages(
mm_tracker = MultiModalItemTracker(model_config, tokenizer)

for msg in messages:
sub_messages = _parse_chat_message_content(msg, mm_tracker)
sub_messages = _parse_chat_message_content(
msg,
mm_tracker,
model_config.chat_template_text_format,
)

conversation.extend(sub_messages)

@@ -636,7 +647,11 @@ def parse_chat_messages_futures(
mm_tracker = AsyncMultiModalItemTracker(model_config, tokenizer)

for msg in messages:
sub_messages = _parse_chat_message_content(msg, mm_tracker)
sub_messages = _parse_chat_message_content(
msg,
mm_tracker,
model_config.chat_template_text_format,
)

conversation.extend(sub_messages)

7 changes: 5 additions & 2 deletions vllm/entrypoints/openai/serving_chat.py
@@ -384,7 +384,7 @@ async def chat_completion_stream_generator(
# Send response to echo the input portion of the
# last message
if request.echo or request.continue_final_message:
last_msg_content: str = ""
last_msg_content: Union[str, List[Dict[str, str]]] = ""
if conversation and "content" in conversation[
-1] and conversation[-1].get("role") == role:
last_msg_content = conversation[-1]["content"] or ""
@@ -724,10 +724,13 @@ async def chat_completion_full_generator(
choices.append(choice_data)

if request.echo or request.continue_final_message:
last_msg_content = ""
last_msg_content: Union[str, List[Dict[str, str]]] = ""
if conversation and "content" in conversation[-1] and conversation[
-1].get("role") == role:
last_msg_content = conversation[-1]["content"] or ""
if isinstance(last_msg_content, list):
last_msg_content = "\n".join(msg['text']
for msg in last_msg_content)

for choice in choices:
full_message = last_msg_content + (choice.message.content