Support passing json file to chat template (#1200)
* support json chat template

* update cli

* warning to info

* update documents

* lint

* add md

* mv

* update

* string also works fine

* fix examples

* refine doc

* remove useless doc

* refine

* use default for doc

* resolve comments

* fix

* add stop words in for better examples

* resolve comments

* refine examples

* update doc

* update CI

* dialogue to chat
AllentDan authored Mar 13, 2024
1 parent 8d8be52 commit 9c3069f
Showing 16 changed files with 277 additions and 17 deletions.
90 changes: 90 additions & 0 deletions docs/en/advance/chat_template.md
@@ -0,0 +1,90 @@
# Customized chat template

The effect of the applied chat template can be observed by **setting the log level** to `INFO`.
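
For instance, when serving through the CLI, the fully assembled prompts then show up in the server log (a minimal sketch, assuming the `--log-level` option of `api_server`):

```shell
lmdeploy serve api_server internlm/internlm2-chat-7b --log-level INFO
```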

LMDeploy supports two methods of adding chat templates:

- One approach is to utilize an existing chat template by directly configuring a JSON file like the following.

```json
{
"model_name": "your awesome chat template name",
"system": "<|im_start|>system\n",
"meta_instruction": "You are a robot developed by LMDeploy.",
"eosys": "<|im_end|>\n",
"user": "<|im_start|>user\n",
"eoh": "<|im_end|>\n",
"assistant": "<|im_start|>assistant\n",
"eoa": "<|im_end|>",
"separator": "\n",
"capability": "chat",
"stop_words": ["<|im_end|>"]
}
```

`model_name` is a required field and can be either the name of an LMDeploy built-in chat template (which can be viewed through `lmdeploy list`), or a new name. Other fields are optional.

1. When `model_name` is the name of a built-in chat template, the non-null fields in the JSON file will override the corresponding attributes of the original chat template.
2. However, when `model_name` is a new name, a `BaseChatTemplate` is registered directly as the new chat template. For its definition, refer to [BaseChatTemplate](https://github.com/InternLM/lmdeploy/blob/24bd4b9ab6a15b3952e62bcfc72eaba03bce9dcb/lmdeploy/model.py#L113-L188).

Such a template is concatenated in the following form:

```
{system}{meta_instruction}{eosys}{user}{user_content}{eoh}{assistant}{assistant_content}{eoa}{separator}{user}...
```
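
For instance, with the JSON file above, a single user turn would be assembled roughly as follows (a hand-expanded illustration of the concatenation, not output captured from LMDeploy):

```
<|im_start|>system
You are a robot developed by LMDeploy.<|im_end|>
<|im_start|>user
who are you?<|im_end|>
<|im_start|>assistant
```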

When using the CLI tool, you can pass in a custom chat template with `--chat-template`, for example:

```shell
lmdeploy serve api_server internlm/internlm2-chat-7b --chat-template ${JSON_FILE}
```

You can also pass it in through the interface function, for example:

```python
from lmdeploy import ChatTemplateConfig, serve
serve('internlm/internlm2-chat-7b',
      chat_template_config=ChatTemplateConfig.from_json('${JSON_FILE}'))
```
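
Besides loading from a file, the same fields can be set on `ChatTemplateConfig` directly in Python (a sketch; the keyword arguments mirror the JSON keys above, and the model name is assumed to appear in `lmdeploy list`):

```python
from lmdeploy import ChatTemplateConfig, serve

serve('internlm/internlm2-chat-7b',
      chat_template_config=ChatTemplateConfig(
          model_name='internlm2-chat-7b',
          meta_instruction='You are a robot developed by LMDeploy.'))
```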

- Another approach is to customize a Python chat template class like the existing LMDeploy chat templates. It can be used directly after successful registration. The advantages are a high degree of customization and strong controllability. Below is an example of registering an LMDeploy chat template.

```python
from lmdeploy.model import MODELS, BaseChatTemplate


@MODELS.register_module(name='customized_model')
class CustomizedModel(BaseChatTemplate):
"""A customized chat template."""

def __init__(self,
system='<|im_start|>system\n',
meta_instruction='You are a robot developed by LMDeploy.',
user='<|im_start|>user\n',
assistant='<|im_start|>assistant\n',
eosys='<|im_end|>\n',
eoh='<|im_end|>\n',
eoa='<|im_end|>',
separator='\n',
stop_words=['<|im_end|>', '<|action_end|>']):
super().__init__(system=system,
meta_instruction=meta_instruction,
eosys=eosys,
user=user,
eoh=eoh,
assistant=assistant,
eoa=eoa,
separator=separator,
stop_words=stop_words)


from lmdeploy import ChatTemplateConfig, pipeline

messages = [{'role': 'user', 'content': 'who are you?'}]
pipe = pipeline('internlm/internlm2-chat-7b',
chat_template_config=ChatTemplateConfig('customized_model'))
for response in pipe.stream_infer(messages):
print(response.text, end='')
```

In this example, we register an LMDeploy chat template whose meta instruction states that the model was developed by LMDeploy, so when the user asks who the model is, it will answer that it was created by LMDeploy.
1 change: 1 addition & 0 deletions docs/en/index.rst
@@ -67,6 +67,7 @@ Welcome to LMDeploy's tutorials!

advance/pytorch_new_model.md
advance/long_context.md
advance/chat_template.md
advance/debug_turbomind.md
serving/qos.md

2 changes: 2 additions & 0 deletions docs/en/inference/pipeline.md
@@ -154,3 +154,5 @@ print(response)
```

Generally, in the context of multi-threading or multi-processing, it might be necessary to ensure that initialization code is executed only once. In this case, `if __name__ == '__main__':` can help to ensure that these initialization codes are run only in the main program, and not repeated in each newly created process or thread.
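
A minimal sketch of that pattern (the model path and prompt are placeholders):

```python
from lmdeploy import pipeline

if __name__ == '__main__':
    # Initialization runs only in the main program, not in processes
    # spawned later for multi-processing.
    pipe = pipeline('internlm/internlm2-chat-7b')
    print(pipe(['Hi, please introduce yourself']))
```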

- To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).
4 changes: 2 additions & 2 deletions docs/en/serving/restful_api.md
@@ -229,6 +229,6 @@ Please refer to the [guidance](https://github.com/InternLM/OpenAOE/blob/main/doc

4. The `/v1/chat/interactive` api disables multi-round conversation by default. The input argument `prompt` can be either a single string or an entire chat history.

5. If you need to adjust other default parameters of the session, such as the content of fields like system. You can directly pass in the initialization parameters of the [dialogue template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py). For example, for the internlm-chat-7b model, you can set the `--meta-instruction` parameter when starting the `api_server`.
5. Regarding the stop words, we only support characters that encode into a single index. Furthermore, there may be multiple indexes that decode into results containing the stop word. In such cases, if the number of these indexes is too large, we will only use the index encoded by the tokenizer. If you want to use a stop word that encodes into multiple indexes, you may consider performing string matching on the streaming client side. Once a successful match is found, you can then break out of the streaming loop.

6. Regarding the stop words, we only support characters that encode into a single index. Furthermore, there may be multiple indexes that decode into results containing the stop word. In such cases, if the number of these indexes is too large, we will only use the index encoded by the tokenizer. If you want to use a stop word that encodes into multiple indexes, you may consider performing string matching on the streaming client side. Once a successful match is found, you can then break out of the streaming loop.
6. To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).
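
A self-contained sketch of the client-side string matching suggested above for stop words that encode into multiple indexes (`stream_chunks` is a hypothetical stand-in for your streaming client):

```python
def stream_chunks():
    """Hypothetical stand-in for a streaming client yielding text deltas."""
    yield from ['Hello', ' world', '<|action', '_end|>', ' never shown']


stop_word = '<|action_end|>'
generated = ''
for chunk in stream_chunks():
    generated += chunk
    if stop_word in generated:
        # Drop the stop word and everything after it, then leave the loop.
        generated = generated.split(stop_word, 1)[0]
        break
print(generated)  # -> Hello world
```
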
91 changes: 91 additions & 0 deletions docs/zh_cn/advance/chat_template.md
@@ -0,0 +1,91 @@
# Customized chat template

The effect of the applied chat template can be observed by setting the log level to `INFO`.

LMDeploy supports two ways of adding chat templates:

- One approach is to utilize an existing chat template by directly configuring a JSON file like the following.

```json
{
"model_name": "your awesome chat template name",
"system": "<|im_start|>system\n",
"meta_instruction": "You are a robot developed by LMDeploy.",
"eosys": "<|im_end|>\n",
"user": "<|im_start|>user\n",
"eoh": "<|im_end|>\n",
"assistant": "<|im_start|>assistant\n",
"eoa": "<|im_end|>",
"separator": "\n",
"capability": "chat",
"stop_words": ["<|im_end|>"]
}
```

`model_name` is a required field; it can be the name of a built-in LMDeploy chat template (viewable with `lmdeploy list`) or a new name. The other fields are optional.
When `model_name` is the name of a built-in chat template, the non-null fields in the JSON file override the corresponding attributes of the original chat template.
When `model_name` is a new name, a `BaseChatTemplate` is registered directly as the new chat template. For its definition, refer to [BaseChatTemplate](https://github.com/InternLM/lmdeploy/blob/24bd4b9ab6a15b3952e62bcfc72eaba03bce9dcb/lmdeploy/model.py#L113-L188).

Such a template is concatenated in the following form:

```
{system}{meta_instruction}{eosys}{user}{user_content}{eoh}{assistant}{assistant_content}{eoa}{separator}{user}...
```

When using the CLI tool, you can pass in a custom chat template with `--chat-template`, for example:

```shell
lmdeploy serve api_server internlm/internlm2-chat-7b --chat-template ${JSON_FILE}
```

You can also pass it in through the interface function, for example:

```python
from lmdeploy import ChatTemplateConfig, serve

serve('internlm/internlm2-chat-7b',
      chat_template_config=ChatTemplateConfig.from_json('${JSON_FILE}'))
```

- The other approach is to customize a Python chat template class in the style of the existing LMDeploy chat templates; once registered, it can be used directly. The advantages are a high degree of customization and strong controllability.
  Below is an example of registering an LMDeploy chat template:

```python
from lmdeploy.model import MODELS, BaseChatTemplate


@MODELS.register_module(name='customized_model')
class CustomizedModel(BaseChatTemplate):
"""A customized chat template."""

def __init__(self,
system='<|im_start|>system\n',
meta_instruction='You are a robot developed by LMDeploy.',
user='<|im_start|>user\n',
assistant='<|im_start|>assistant\n',
eosys='<|im_end|>\n',
eoh='<|im_end|>\n',
eoa='<|im_end|>',
separator='\n',
stop_words=['<|im_end|>', '<|action_end|>']):
super().__init__(system=system,
meta_instruction=meta_instruction,
eosys=eosys,
user=user,
eoh=eoh,
assistant=assistant,
eoa=eoa,
separator=separator,
stop_words=stop_words)


from lmdeploy import ChatTemplateConfig, pipeline

messages = [{'role': 'user', 'content': 'who are you?'}]
pipe = pipeline('internlm/internlm2-chat-7b',
chat_template_config=ChatTemplateConfig('customized_model'))
for response in pipe.stream_infer(messages):
print(response.text, end='')
```

In this example, we register an LMDeploy chat template whose meta instruction states that the model was developed by LMDeploy, so when the user asks who the model is, it will answer that it was created by LMDeploy.
1 change: 1 addition & 0 deletions docs/zh_cn/index.rst
@@ -69,6 +69,7 @@

advance/pytorch_new_model.md
advance/long_context.md
advance/chat_template.md
advance/debug_turbomind.md
serving/qos.md

2 changes: 2 additions & 0 deletions docs/zh_cn/inference/pipeline.md
@@ -156,3 +156,5 @@ print(response)
```

Generally, in a multi-threading or multi-processing context, it may be necessary to ensure that initialization code runs only once. In this case, `if __name__ == '__main__':` helps ensure that the initialization code runs only in the main program and is not repeated in every newly created process or thread.

- To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).
4 changes: 2 additions & 2 deletions docs/zh_cn/serving/restful_api.md
@@ -226,6 +226,6 @@ openaoe -f /path/to/your/config-template.yaml

4. The `/v1/chat/interactive` api supports multi-round conversation, but it is disabled by default. The `messages` or `prompt` argument can be either a simple string representing a single user question or a whole chat history.

5. If you need to adjust other default session parameters, such as the content of fields like system, you can pass in the initialization parameters of the [chat template](https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/model.py) directly. For example, for the internlm-chat-7b model, you can set the `--meta-instruction` parameter when starting the `api_server`.
5. Regarding stop words, we only support characters that encode into a single index. In addition, multiple indexes may decode into results containing the stop word. In such cases, if there are too many such indexes, we only use the index produced by the tokenizer. If you want to use a stop word that encodes into multiple indexes, consider performing string matching on the streaming client side and break out of the streaming loop once a match is found.

6. Regarding stop words, we only support characters that encode into a single index. In addition, multiple indexes may decode into results containing the stop word. In such cases, if there are too many such indexes, we only use the index produced by the tokenizer. If you want to use a stop word that encodes into multiple indexes, consider performing string matching on the streaming client side and break out of the streaming loop once a match is found.
6. To customize a chat template, please refer to [chat_template.md](../advance/chat_template.md).
2 changes: 1 addition & 1 deletion examples/vl/qwen_model.py
@@ -50,7 +50,7 @@ def messages2prompt(self, messages, sequence_start=True):
assistant=self.assistant,
system=self.system)
eox_map = dict(user=self.eoh,
assistant=self.eoa + self.stop_word_suffix,
assistant=self.eoa + self.separator,
system=self.eosys)
ret = ''
if self.meta_instruction is not None:
2 changes: 1 addition & 1 deletion examples/vl/xcomposer_model.py
@@ -64,7 +64,7 @@ def messages2prompt(self, messages, sequence_start=True):
assistant=self.assistant,
system=self.system)
eox_map = dict(user=self.eoh,
assistant=self.eoa + self.stop_word_suffix,
assistant=self.eoa + self.separator,
system=self.eosys)
ret = ''
if self.meta_instruction is not None:
13 changes: 12 additions & 1 deletion lmdeploy/cli/chat.py
@@ -66,7 +66,8 @@ def add_parser_turbomind():
ArgumentHelper.session_len(engine_group)
# other arguments
ArgumentHelper.cap(parser)
ArgumentHelper.meta_instruction(parser)
ArgumentHelper.meta_instruction(parser) # TODO remove
ArgumentHelper.chat_template(parser)

@staticmethod
def torch(args):
@@ -89,6 +90,16 @@ def turbomind(args):
"""Chat with TurboMind inference engine through terminal."""
from lmdeploy.turbomind.chat import main
kwargs = convert_args(args)
from lmdeploy.model import ChatTemplateConfig
chat_template_config = ChatTemplateConfig(
model_name=args.model_name,
meta_instruction=args.meta_instruction,
capability=args.cap)
if args.chat_template:
chat_template_config = ChatTemplateConfig.from_json(
args.chat_template)
kwargs.update(dict(chat_template_cfg=chat_template_config))
kwargs.pop('chat_template', None)
main(**kwargs)

@staticmethod
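
A possible invocation of the flag wired up above, assuming the `lmdeploy chat turbomind` subcommand this parser belongs to (the JSON file path is a placeholder):

```shell
lmdeploy chat turbomind internlm/internlm2-chat-7b --chat-template ./chat_template.json
```
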
13 changes: 10 additions & 3 deletions lmdeploy/cli/serve.py
@@ -43,8 +43,8 @@ def add_parser_gradio():
ArgumentHelper.backend(parser)

# chat template args
ArgumentHelper.meta_instruction(parser)
ArgumentHelper.cap(parser)
ArgumentHelper.meta_instruction(parser) # TODO remove
ArgumentHelper.chat_template(parser)

# pytorch engine args
pt_group = parser.add_argument_group('PyTorch engine arguments')
@@ -126,7 +126,8 @@ def add_parser_api_server():
ArgumentHelper.ssl(parser)

# chat template args
ArgumentHelper.meta_instruction(parser)
ArgumentHelper.meta_instruction(parser) # TODO remove
ArgumentHelper.chat_template(parser)
ArgumentHelper.cap(parser)

# pytorch engine args
@@ -220,6 +221,9 @@ def gradio(args):
model_name=args.model_name,
meta_instruction=args.meta_instruction,
capability=args.cap)
if args.chat_template:
chat_template_config = ChatTemplateConfig.from_json(
args.chat_template)
run(args.model_path_or_server,
server_name=args.server_name,
server_port=args.server_port,
@@ -261,6 +265,9 @@ def api_server(args):
model_name=args.model_name,
meta_instruction=args.meta_instruction,
capability=args.cap)
if args.chat_template:
chat_template_config = ChatTemplateConfig.from_json(
args.chat_template)
run_api_server(args.model_path,
backend=backend,
backend_config=backend_config,
27 changes: 21 additions & 6 deletions lmdeploy/cli/utils.py
@@ -210,8 +210,8 @@ def cap(parser):
type=str,
default='chat',
choices=['completion', 'infilling', 'chat', 'python'],
help='The capability of a model. For example, codellama has the '
'ability among ["completion", "infilling", "chat", "python"]')
help='The capability of a model. '
'Deprecated. Please use --chat-template instead')

@staticmethod
def log_level(parser):
@@ -316,10 +316,25 @@ def device(parser):
def meta_instruction(parser):
"""Add argument meta_instruction to parser."""

return parser.add_argument('--meta-instruction',
type=str,
default=None,
help='System prompt for ChatTemplateConfig')
return parser.add_argument(
'--meta-instruction',
type=str,
default=None,
help='System prompt for ChatTemplateConfig. Deprecated. '
'Please use --chat-template instead')

@staticmethod
def chat_template(parser):
"""Add chat template config to parser."""

return parser.add_argument(
'--chat-template',
type=str,
default=None,
help=\
'A JSON file or string that specifies the chat template configuration. ' # noqa
'Please refer to https://lmdeploy.readthedocs.io/en/latest/advance/chat_template.html for the specification' # noqa
)

@staticmethod
def cache_max_entry_count(parser):
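
Judging from the help text above ("A JSON file or string"), the template can presumably also be passed inline without writing a file; a sketch, with quoting adapted to your shell:

```shell
lmdeploy serve api_server internlm/internlm2-chat-7b \
    --chat-template '{"model_name": "internlm2-chat-7b", "stop_words": ["<|im_end|>"]}'
```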
