This is a repository that includes proper chat templates (or input formats) for large language models (LLMs), to support the `chat_template` feature of `transformers`.
We know that different models are trained with different input formats, especially the instruction-tuned or chat models. This is exactly what the new `chat_template` feature of `transformers` addresses. However, I found that popular models (e.g., `vicuna`, `falcon`) on HuggingFace do not include this parameter in their `tokenizer_config.json` files, which may make it troublesome to properly run these models. Also, the `chat_template` feature requires implementing a Jinja template, which is not intuitive to write directly in the json files.

So I collected proper chat templates of several popular models from official references or implementations and put them under `chat_templates`. If you are interested in adding more chat templates, feel free to open a pull request.
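As a quick sketch of how these templates are used (assuming this repository is cloned so the `chat_templates` directory is available locally, and using `lmsys/vicuna-7b-v1.5` purely as an example), you read the jinja file, strip its formatting whitespace, and assign it to the tokenizer; detailed, model-specific usage examples with expected outputs are given further below.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# The jinja files are indented and line-broken for readability;
# strip the indentation and newlines before assigning the template.
with open("./chat_templates/vicuna.jinja") as f:
    tokenizer.chat_template = f.read().replace("    ", "").replace("\n", "")

messages = [{"role": "user", "content": "Hello!"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```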
If you find this repo useful, please kindly cite it:
@misc{zheng-2024-chat-templates,
author = {Zheng, Chujie},
title = {Chat Templates for HuggingFace Large Language Models},
year = {2024},
howpublished = {\url{https://github.com/chujiezheng/chat_templates}}
}
You can enumerate the templates using `importlib`:
import importlib.resources
import logging
from importlib.abc import Traversable
from pathlib import Path


def known_chat_templates() -> dict:
    try:
        # Locate the packaged chat_templates directory inside the installed package.
        _chat_templates: Traversable = importlib.resources.files("huggingface_extra_chat_templates") / "chat_templates"
        # Map each template's file stem to its contents, stripping the indentation and
        # newlines so the result can be assigned directly to tokenizer.chat_template.
        return {Path(traversable.name).stem: traversable.read_text().replace('    ', '').replace('\n', '')
                for traversable in _chat_templates.iterdir() if traversable.is_file()}
    except ImportError as exc:
        logging.warning(
            "Could not load extra chat templates, did you `pip install git+https://github.com/AppMana/appmana-comfyui-chat-templates.git` ?",
            exc_info=exc)
        return {}


known_chat_templates()
In order to find a match, use both the model's `name_or_path` and the original configuration dictionary's `_name_or_path`:
import logging

from transformers import AutoModelForCausalLM, AutoTokenizer, PretrainedConfig

model = AutoModelForCausalLM.from_pretrained(...)
tokenizer = AutoTokenizer.from_pretrained(...)

# Load the raw config.json dictionary directly, because it will have the base model's name
# instead of the derived model's name, improving the chances of finding a match.
# get_config_dict returns the raw dict plus unused kwargs.
config_dict, _ = PretrainedConfig.get_config_dict(...)

chat_template = tokenizer.chat_template if hasattr(tokenizer, "chat_template") else None
if chat_template is None:
    candidate_chat_templates = [(name, template) for name, template in known_chat_templates().items() if
                                name in config_dict.get("_name_or_path", "") or name in model.name_or_path]
    if len(candidate_chat_templates) > 0:
        # todo: You decide which chat template may match in the event of multiple matches
        filename, chat_template = candidate_chat_templates[0]
        logging.debug(f"Selected chat template filename={filename} for {model.name_or_path}")
You are responsible for choosing which chat template to load for your model. Use this code as a heuristic; it is not possible to test these templates against every model they may apply to.
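If the heuristic finds no match, or picks the wrong one, you can always select a template explicitly by its key; the keys are the jinja file stems listed in the table of supported models below. Continuing the snippet above, and hypothetically choosing the `vicuna` template:

```python
templates = known_chat_templates()

# Explicit, manual selection by file stem instead of relying on the heuristic match.
tokenizer.chat_template = templates["vicuna"]
```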
- [05/2024] Added support for Nvidia's ChatQA models
- [04/2024] Added support for Microsoft's Phi-3 models
- [04/2024] Added support for Meta's Llama-3 models
- [02/2024] Added support for Google's Gemma models
- [02/2024] Added usage explanation for `generation_configs`
- [01/2024] Added support for Alibaba's Qwen2 models
- `chat_templates` contains the jinja files of the collected chat templates, which can directly replace the `chat_template` in the HuggingFace tokenizers.
- `generation_configs` contains the corresponding json configs used for controlling when response generation should stop. In particular, the `stop_token_ids` should be passed directly into the `generate` method via the `eos_token_id` argument, as shown in the sketch after this list.
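For example, here is a minimal sketch of wiring one of these configs into `generate`. The file name `generation_configs/llama-3-instruct.json` is an assumption for illustration (pick the config that matches your template); the only field relied on is the `stop_token_ids` list described above.

```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical file name; use the config matching your chat template.
with open("./generation_configs/llama-3-instruct.json") as f:
    generation_config = json.load(f)

messages = [{"role": "user", "content": "This is a user input."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# Pass stop_token_ids through the eos_token_id argument so generation ends at the right tokens.
outputs = model.generate(inputs, max_new_tokens=64, eos_token_id=generation_config["stop_token_ids"])
print(tokenizer.decode(outputs[0][inputs.shape[-1]:]))
```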
Model (Family) | Template File | Reference | Comment |
---|---|---|---|
`llama-3-instruct` New | `llama-3-instruct.jinja` | link | Official template. `Meta-Llama-3-8B/70B-Instruct` |
`qwen2-chat` New | `chatml.jinja` | link | ChatML format. `Qwen1.5-0.5B/1.8B/4B/7B/14B/72B-Chat` |
`mistral-instruct` New | `mistral-instruct.jinja` | link | `Mistral-7B-Instruct-v0.2/0.3`. System message allowed |
`phi-3` New | `phi-3.jinja` | link | Official template. `Phi-3-mini-4k/128k-instruct` |
`gemma-it` New | `gemma-it.jinja` | link | `gemma-2b/7b-it`. System message allowed |
`chatqa` New | `chatqa.jinja` | link | `Llama3-ChatQA-1.5-8B/70B`. Context message allowed |
`llama-2-chat` | `llama-2-chat.jinja` | link | Official template. `Llama-2-7b/13b/70b-chat-hf` |
`mistral-instruct-v0.1` | `mistral-instruct-v0.1.jinja` | link | `Mistral-7B-Instruct-v0.1`. System message allowed |
`openchat` | `openchat.jinja` | link | `openchat-3.5` |
`zephyr` | `zephyr.jinja` | link | `zephyr-7b-alpha/beta` |
`yi-chat` | `chatml.jinja` | link | ChatML format. `Yi-6B/34B-Chat` |
`orca-2` | `chatml.jinja` | link | ChatML format. `Orca-2-7b/13b` |
`vicuna` | `vicuna.jinja` | link | `vicuna-7b/13b-v1.5` |
`falcon-instruct` | `falcon-instruct.jinja` | link | `falcon-7b/40b-instruct` |
`starling-lm` | `openchat.jinja` | link | `Starling-LM-7B-alpha/beta` |
`solar-instruct` | `solar-instruct.jinja` | link | `SOLAR-10.7B-Instruct-v1.0` |
`alpaca` | `alpaca.jinja` | link | `alpaca`-style models, like `Platypus2-13B` |
`amberchat` | `amberchat.jinja` | link | `AmberChat`, `AmberSafe` |
`saiga` | `saiga.jinja` | link | `saiga`, a series of Russian models |
Note: `mistral-instruct-v0.1` is slightly different from `mistral-instruct` (for v0.2/0.3).
Important Note: As mentioned in this issue, the `messages` should contain at least one user message. It is strongly discouraged to pass only the system message, as that may result in unexpected outputs (because the models are not trained this way).
This example checks whether the jinja file is correctly implemented.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", token="YOUR_OWN_TOKEN")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (yet Correct) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/llama-3-instruct.jinja').read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (yet Correct) Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
###### Corrected Chat Template ######
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
This is a system prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the first user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
This is the first assistant response.<|eot_id|><|start_header_id|>user<|end_header_id|>
This is the second user input.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
This example checks whether the jinja file is correctly implemented.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf", token="YOUR_OWN_TOKEN")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (yet Correct) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/llama-2-chat.jinja').read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (yet Correct) Chat Template ######
<s>[INST] <<SYS>>
This is a system prompt.
<</SYS>>
This is the first user input. [/INST] This is the first assistant response. </s><s>[INST] This is the second user input. [/INST]
###### Corrected Chat Template ######
<s>[INST] <<SYS>>
This is a system prompt.
<</SYS>>
This is the first user input. [/INST] This is the first assistant response. </s><s>[INST] This is the second user input. [/INST]
For `mistral-instruct` (and also `gemma-it`), the model does not natively support the `system` message, so passing a `system` message with the default template would raise an error.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (but Error-Raising) Chat Template ######')
# raising error
#print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/mistral-instruct.jinja').read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (but Error-Raising) Chat Template ######
jinja2.exceptions.TemplateError: Conversation roles must alternate user/assistant/user/assistant/...
###### Corrected Chat Template ######
<s>[INST] This is a system prompt.
This is the first user input. [/INST] This is the first assistant response. </s>[INST] This is the second user input. [/INST]
NOTE: In fast-chat, `vicuna` does not add linebreaks between roles' messages. But I found that adding linebreaks leads to slightly better performance (especially for the v1.5 version).

Also, I found that `vicuna-7/13/33b-v1.3` may not work well when given a system message different from its default one. So I would recommend using `vicuna-7/13b-v1.5` instead.
from transformers import AutoTokenizer
toker = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
messages = [
    {'role': 'system', 'content': 'This is a system prompt.'},
    {'role': 'user', 'content': 'This is the first user input.'},
    {'role': 'assistant', 'content': 'This is the first assistant response.'},
    {'role': 'user', 'content': 'This is the second user input.'},
]
print('###### Default (but Improper) Chat Template ######')
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
print('###### Corrected Chat Template ######')
chat_template = open('./chat_templates/vicuna.jinja').read()
chat_template = chat_template.replace('    ', '').replace('\n', '')
toker.chat_template = chat_template
print(toker.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
Expected output:
###### Default (but Improper) Chat Template ######
<s>[INST] <<SYS>>
This is a system prompt.
<</SYS>>
This is the first user input. [/INST] This is the first assistant response. </s><s>[INST] This is the second user input. [/INST]
###### Corrected Chat Template ######
<s>This is a system prompt.
USER: This is the first user input.
ASSISTANT: This is the first assistant response.</s>
USER: This is the second user input.
ASSISTANT: