[Frontend] Automatic detection of chat content format from AST (vllm-project#9919)

Signed-off-by: DarkLight1337 <[email protected]>
DarkLight1337 authored Nov 16, 2024
1 parent efa9852 commit 6adb486
Showing 16 changed files with 788 additions and 350 deletions.
18 changes: 13 additions & 5 deletions docs/source/serving/openai_compatible_server.md
@@ -172,12 +172,20 @@ completion = client.chat.completions.create(
]
)
```
Most chat templates for LLMs expect the `content` to be a `string` but there are some newer models like
`meta-llama/Llama-Guard-3-1B` that expect the content to be parsed with the new OpenAI spec. In order to choose which
format the content needs to be parsed in by vLLM, please use the `--chat-template-text-format` argument to specify
between `string` or `openai`. The default value is `string` and vLLM internally converts both spec formats to match
this, unless explicitly specified.

Most chat templates for LLMs expect the `content` field to be a string, but there are some newer models like
`meta-llama/Llama-Guard-3-1B` that expect the content to be formatted according to the OpenAI schema in the
request. vLLM provides best-effort support to detect this automatically, which is logged as a string like
*"Detected the chat template content format to be..."*, and internally converts incoming requests to match
the detected format, which can be one of:

- `"string"`: A string.
- Example: `"Hello world"`
- `"openai"`: A list of dictionaries, similar to OpenAI schema.
- Example: `[{"type": "text", "text": "Hello world!"}]`

If the result is not what you expect, you can set the `--chat-template-content-format` CLI argument
to override which format to use.
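
As a rough sketch of the difference (not part of this commit; it assumes a local vLLM OpenAI-compatible server and uses `meta-llama/Llama-Guard-3-1B` purely as an illustrative model name), the same message can be sent in either content format via the `openai` Python client:

```python
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server is already running locally;
# the base URL, API key, and model name are placeholders for illustration.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# "string" content format: the message content is a plain string.
client.chat.completions.create(
    model="meta-llama/Llama-Guard-3-1B",
    messages=[{"role": "user", "content": "Hello world"}],
)

# "openai" content format: the content is a list of typed parts.
client.chat.completions.create(
    model="meta-llama/Llama-Guard-3-1B",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "Hello world!"}],
    }],
)
```

Either request is accepted; vLLM converts the incoming content to whichever format it detected from the chat template, or to the format forced with `--chat-template-content-format`.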

## Command line arguments for the server

3 changes: 2 additions & 1 deletion tests/entrypoints/openai/test_serving_chat.py
@@ -26,7 +26,6 @@ class MockModelConfig:
tokenizer = MODEL_NAME
trust_remote_code = False
tokenizer_mode = "auto"
chat_template_text_format = "string"
max_model_len = 100
tokenizer_revision = None
multimodal_config = MultiModalConfig()
@@ -49,6 +48,7 @@ async def _async_serving_chat_init():
BASE_MODEL_PATHS,
response_role="assistant",
chat_template=CHAT_TEMPLATE,
chat_template_content_format="auto",
lora_modules=None,
prompt_adapters=None,
request_logger=None)
@@ -70,6 +70,7 @@ def test_serving_chat_should_set_correct_max_tokens():
BASE_MODEL_PATHS,
response_role="assistant",
chat_template=CHAT_TEMPLATE,
chat_template_content_format="auto",
lora_modules=None,
prompt_adapters=None,
request_logger=None)