feat(huixiangdou): add chat_with_repo pipeline (#362)
* feat(service): add parallel pipeline

* feat(service): gradio streaming chat

* style(llm_client.py): remove useless
tpoisonooo authored Aug 20, 2024
1 parent 3b81797 commit 87a10e1
Showing 31 changed files with 773 additions and 258 deletions.
1 change: 1 addition & 0 deletions .github/scripts/doc_link_checker.py
@@ -58,6 +58,7 @@ def analyze_doc(home, path):
ref = ref[ref.find('#'):]
fullpath = os.path.join(home, ref)
if not os.path.exists(fullpath):
raise ValueError(fullpath)
problem_list.append(ref)
else:
continue
15 changes: 10 additions & 5 deletions README.md
@@ -30,12 +30,14 @@ English | [简体中文](README_zh.md)

</div>

HuixiangDou is a **group chat** assistant based on LLM (Large Language Model).
HuixiangDou is a **professional knowledge assistant** based on LLM.

Advantages:

1. Design a three-stage pipeline of preprocess, rejection and response to cope with group chat scenario, answer user questions without message flooding, see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_en.md) and [Precision Report](./evaluation/).
2. No training required, with CPU-only, 2G, 10G and 80G configuration
1. Designs three-stage pipelines of preprocessing, rejection, and response
   * `chat_in_group` copes with the **group chat** scenario, answering user questions without message flooding; see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_en.md) and the [Precision Report](./evaluation/)
   * `chat_with_repo` for **real-time streaming** chat
2. No training required, with CPU-only, 2G, 10G, 20G and 80G configurations
3. Offers a complete suite of Web, Android, and pipeline source code, industrial-grade and commercially viable

Check out the [scenes in which HuixiangDou are running](./huixiangdou-inside.md) and join [WeChat Group](resource/figures/wechat.jpg) to try AI assistant inside.
@@ -46,6 +48,7 @@ If this helps you, please give it a star ⭐

Our Web version has been released to [OpenXLab](https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web), where you can create knowledge base, update positive and negative examples, turn on web search, test chat, and integrate into Feishu/WeChat groups. See [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y) !

- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py) 👍
- \[2024/07\] Image and text retrieval & Removal of `langchain` 👍
- \[2024/07\] [Hybrid Knowledge Graph and Dense Retrieval](./docs/knowledge_graph_en.md) improve 1.7% F1 score 🎯
- \[2024/06\] [Evaluation of chunksize, splitter, and text2vec model](./evaluation) 🎯
@@ -221,7 +224,9 @@ python3 -m huixiangdou.main --standalone
python3 -m huixiangdou.gradio
```

Or run a server to listen 23333:
https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554

Or run a server listening on port 23333; the default pipeline is `chat_with_repo`:
```bash
python3 -m huixiangdou.server

@@ -368,7 +373,7 @@ Contributors have provided [Android tools](./android) to interact with WeChat. T
3. How to access other local LLM / After access, the effect is not ideal?

- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py), add a new LLM inference implementation.
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py).
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [prompt.py](./huixiangdou/service/prompt.py).

4. What if the response is too slow/request always fails?

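The two pipelines named in the README above differ in how they are driven. Below is a minimal usage sketch based on how `huixiangdou/gradio.py` in this commit calls them; the constructor and `generate` argument names follow that file, and `workdir`/`config.ini` are the repository defaults, so treat the details as illustrative rather than a stable API.

```python
# Sketch: driving both pipelines directly, mirroring gradio.py in this commit.
import asyncio

from huixiangdou.primitive import Query
from huixiangdou.service import SerialPipeline, ParallelPipeline

query = Query('how to install mmpose ?', None)  # (text, optional image path)

# chat_in_group: a synchronous generator yielding status codes and text deltas
serial = SerialPipeline(work_dir='workdir', config_path='config.ini')
for sess in serial.generate(query=query, history=[], groupname=''):
    print(sess.code, sess.delta)

# chat_with_repo: an async generator that streams text deltas in real time
async def stream():
    parallel = ParallelPipeline(work_dir='workdir', config_path='config.ini')
    async for sess in parallel.generate(query=query, history=[],
                                        language='en', enable_web_search=False):
        print(sess.delta, end='', flush=True)

asyncio.run(stream())
```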
19 changes: 13 additions & 6 deletions README_zh.md
@@ -29,10 +29,12 @@

</div>

HuixiangDou is an LLM-based **group chat** knowledge assistant. Advantages:
HuixiangDou is an LLM-based professional knowledge assistant. Advantages:

1. Designs a three-stage pipeline of preprocessing, rejection, and response to handle group chat scenarios, answering questions without message flooding; the essentials are in [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_zh.md), and the [precision test on business data](./evaluation)
2. No training required; fits various industries, with CPU-only, 2G, 10G, and 80G configurations
1. Designs three-stage pipelines of preprocessing, rejection, and response:
   * `chat_in_group` for the group chat scenario, answering questions without message flooding; see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_zh.md), and the [precision test on business data](./evaluation)
   * `chat_with_repo` for real-time chat scenarios, with faster responses
2. No training required; fits various industries, with CPU-only, 2G, 10G, 20G, and 80G configurations
3. Provides a complete set of web frontend/backend, Android, and algorithm source code; industrial-grade and commercially usable

Check out [the scenes where HuixiangDou is already running](./huixiangdou-inside.md), and join the [WeChat group](resource/figures/wechat.jpg) to try the group chat assistant directly.
@@ -45,6 +47,7 @@

For video tutorials of the Web version, see [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y).

- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py)
- \[2024/07\] Image-text retrieval & removal of `langchain` 👍
- \[2024/07\] [Hybrid knowledge graph and dense retrieval improve F1 by 1.7%](./docs/knowledge_graph_zh.md) 🎯
- \[2024/06\] [Evaluation of chunksize, splitter, and text2vec models](./evaluation) 🎯
@@ -216,10 +219,14 @@ python3 -m huixiangdou.main --standalone
💡 You can also launch `gradio` to set up a simple Web UI, bound to port 7860 by default:

```bash
python3 -m huixiangdou.gradio
python3 -m huixiangdou.gradio
# If `llm_server_hybrid.py` is already running separately, you can use
# python3 -m huixiangdou.gradio --no-standalone
```

Or start a server listening on port 23333:
https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554

Or start a server listening on port 23333; the default pipeline is `chat_with_repo`:
```bash
python3 -m huixiangdou.server

@@ -364,7 +371,7 @@ python3 tests/test_query_gradio.py
3. How to integrate other local LLMs / what if the results after integration are not ideal?

- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) and add a new LLM inference implementation
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust the prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py)
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust the prompt and threshold for the new model, and update them into [prompt.py](./huixiangdou/service/prompt.py)

4. What if responses are too slow or requests keep failing?

2 changes: 1 addition & 1 deletion config.ini
@@ -25,7 +25,7 @@ engine = "serper"
# For ddgs, see https://pypi.org/project/duckduckgo-search
# For serper, check https://serper.dev/api-key to get a free API key
serper_x_api_key = "YOUR-API-KEY-HERE"
domain_partial_order = ["openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
domain_partial_order = ["arxiv.org", "openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
save_dir = "logs/web_search_result"

[llm]
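`domain_partial_order` expresses which result domains the web search should prefer; this commit promotes `arxiv.org` to the front of the list. The helper below is a hypothetical sketch that only illustrates the idea of ranking by such a partial order; HuixiangDou's actual ranking logic lives in its web search service and may differ.

```python
# Hypothetical illustration: rank URLs by a domain partial order.
DOMAIN_PARTIAL_ORDER = [
    'arxiv.org', 'openai.com', 'pytorch.org', 'readthedocs.io', 'nvidia.com',
    'stackoverflow.com', 'juejin.cn', 'zhuanlan.zhihu.com', 'www.cnblogs.com',
]

def rank_key(url: str) -> int:
    """Earlier domains in the list rank first; unknown domains go last."""
    for index, domain in enumerate(DOMAIN_PARTIAL_ORDER):
        if domain in url:
            return index
    return len(DOMAIN_PARTIAL_ORDER)

results = [
    'https://www.cnblogs.com/some-post',
    'https://arxiv.org/abs/2405.02817',
    'https://stackoverflow.com/q/42',
]
print(sorted(results, key=rank_key))
# arxiv.org first, stackoverflow.com second, www.cnblogs.com last
```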
4 changes: 2 additions & 2 deletions docs/full_dev_en.md
@@ -74,6 +74,6 @@ The basic version may not perform well. You can enable these features to enhance
It is often unavoidable to adjust parameters with respect to business scenarios.
- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py).
- Adjust the [number of search results](./huixiangdou/service/worker.py) based on the maximum length supported by the model.
- Refer to [data.json](../tests/data.json) to add real data, run [test_intention_prompt.py](../tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [prompt.py](../huixiangdou/service/prompt.py).
- Adjust the [number of search results](../huixiangdou/service/serial_pipeline.py) based on the maximum length supported by the model.
- Update `web_search.domain_partial_order` in `config.ini` according to your scenarios.
4 changes: 2 additions & 2 deletions docs/full_dev_zh.md
@@ -73,6 +73,6 @@
Tuning parameters for business scenarios is often unavoidable.
- Refer to [data.json](./tests/data.json) to add real data; run [test_intention_prompt.py](./tests/test_intention_prompt.py) to obtain suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py)
- Adjust the [number of search results](./huixiangdou/service/worker.py) according to the maximum length the model supports
- Refer to [data.json](../tests/data.json) to add real data; run [test_intention_prompt.py](../tests/test_intention_prompt.py) to obtain suitable prompts and thresholds, and update them into [prompt.py](../huixiangdou/service/prompt.py)
- Adjust the [number of search results](../huixiangdou/service/serial_pipeline.py) according to the maximum length the model supports
- Modify `web_search.domain_partial_order` in config.ini (the partial order of search results) to match scenario preferences
2 changes: 1 addition & 1 deletion huixiangdou/__init__.py
@@ -6,7 +6,7 @@
from .service import FeatureStore # noqa E401
from .service import HybridLLMServer # noqa E401
from .service import WebSearch # noqa E401
from .service import Worker # noqa E401
from .service import SerialPipeline, ParallelPipeline  # noqa E401
from .service import build_reply_text # noqa E401
from .service import llm_serve # noqa E401
from .version import __version__
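With this re-export in place, downstream code can import both pipelines from the package root rather than reaching into `huixiangdou.service`. A quick check, assuming the package is importable:

```python
# Both pipeline classes are available at the package root after this commit.
from huixiangdou import SerialPipeline, ParallelPipeline

print(SerialPipeline.__name__, ParallelPipeline.__name__)
```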
2 changes: 1 addition & 1 deletion huixiangdou/frontend/wechat.py
@@ -845,7 +845,7 @@ def loop(self, worker):

def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='wechat server.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
150 changes: 121 additions & 29 deletions huixiangdou/gradio.py
@@ -4,19 +4,19 @@
import time
import pdb
from multiprocessing import Process, Value

import asyncio
import cv2
import gradio as gr
import pytoml
from loguru import logger

from typing import List
from huixiangdou.primitive import Query
from huixiangdou.service import ErrorCode, Worker, llm_serve, start_llm_server

from huixiangdou.service import ErrorCode, SerialPipeline, ParallelPipeline, llm_serve, start_llm_server
import json

def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='SerialPipeline.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
@@ -25,7 +25,7 @@ def parse_args():
'--config_path',
default='config.ini',
type=str,
help='Worker configuration path. Default value is config.ini')
help='SerialPipeline configuration path. Default value is config.ini')
parser.add_argument('--standalone',
action='store_true',
default=True,
@@ -37,50 +37,142 @@ def parse_args():
args = parser.parse_args()
return args

def predict(text, image):
language='en'
enable_web_search=False
pipeline='chat_with_repo'
main_args = None
paralle_assistant = None
serial_assistant = None

def on_language_changed(value:str):
global language
print(value)
language = value

def on_pipeline_changed(value:str):
global pipeline
print(value)
pipeline = value

def on_web_search_changed(value: str):
global enable_web_search
print(value)
if 'no' in value:
enable_web_search = False
else:
enable_web_search = True


def format_refs(refs: List[str]):
refs_filter = list(set(refs))
if len(refs) < 1:
return ''
text = ''
if language == 'zh':
text += '参考资料:\r\n'
else:
text += '**References:**\r\n'

for file_or_url in refs_filter:
text += '* {}\r\n'.format(file_or_url)
text += '\r\n'
return text


async def predict(text:str, image:str):
global language
global enable_web_search
global pipeline
global main_args
global serial_assistant
global paralle_assistant

with open('query.txt', 'a') as f:
f.write(json.dumps({'data': text}))
f.write('\n')

if image is not None:
filename = 'image.png'
image_path = os.path.join(args.work_dir, filename)
cv2.imwrite(image_path, image)
else:
image_path = None

assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
query = Query(text, image_path)
if 'chat_in_group' in pipeline:
if serial_assistant is None:
serial_assistant = SerialPipeline(work_dir=main_args.work_dir, config_path=main_args.config_path)
args = {'query':query, 'history': [], 'groupname':''}
pipeline = {'status': {}}
debug = dict()
stream_chat_content = ''
for sess in serial_assistant.generate(**args):
if len(sess.delta) > 0:
# start chat, display
stream_chat_content += sess.delta
yield stream_chat_content
else:
status = {
"state":str(sess.code),
"response": sess.response,
"refs": sess.references
}
pipeline['status'] = status
pipeline['debug'] = sess.debug

json_str = json.dumps(pipeline, indent=2, ensure_ascii=False)
yield json_str

pipeline = {'step': []}
debug = dict()
for sess in assistant.generate(query=query, history=[], groupname=''):
status = {
"state":str(sess.code),
"response": sess.response,
"refs": sess.references
}
else:
if paralle_assistant is None:
paralle_assistant = ParallelPipeline(work_dir=main_args.work_dir, config_path=main_args.config_path)
args = {'query':query, 'history':[], 'language':language}
args['enable_web_search'] = enable_web_search

print(status)
pipeline['step'].append(status)
pipeline['debug'] = sess.debug
sentence = ''
async for sess in paralle_assistant.generate(**args):
if sentence == '' and len(sess.references) > 0:
sentence = format_refs(sess.references)

json_str = json.dumps(pipeline, indent=2, ensure_ascii=False)
yield json_str
if len(sess.delta) > 0:
sentence += sess.delta
yield sentence

yield sentence

if __name__ == '__main__':
args = parse_args()
main_args = parse_args()

# start service
if args.standalone is True:
if main_args.standalone is True:
# hybrid llm serve
start_llm_server(config_path=args.config_path)
start_llm_server(config_path=main_args.config_path)

with gr.Blocks() as demo:
with gr.Blocks(theme=gr.themes.Soft(), title='HuixiangDou AI assistant', analytics_enabled=True) as demo:
with gr.Row():
gr.Markdown("""
#### [HuixiangDou](https://github.com/internlm/huixiangdou) AI assistant
""", label='Reply', header_links=True, line_breaks=True,)
with gr.Row():
input_question = gr.TextArea(label='Input the question.')
input_image = gr.Image(label='Upload Image.')
with gr.Column():
ui_pipeline = gr.Radio(["chat_with_repo", "chat_in_group"], label="Pipeline type", info="Group-chat is slow but accurate and safe, default value is `chat_with_repo`")
ui_pipeline.change(fn=on_pipeline_changed, inputs=ui_pipeline, outputs=[])
with gr.Column():
ui_language = gr.Radio(["en", "zh"], label="Language", info="Use `en` by default ")
ui_language.change(fn=on_language_changed, inputs=ui_language, outputs=[])
with gr.Column():
ui_web_search = gr.Radio(["no", "yes"], label="Enable web search", info="Disable by default ")
ui_web_search.change(on_web_search_changed, inputs=ui_web_search, outputs=[])

with gr.Row():
input_question = gr.TextArea(label='Input your question', placeholder='how to install mmpose ?', show_copy_button=True, lines=9)
input_image = gr.Image(label='[Optional] Image-text retrieval needs `config-multimodal.ini`')
with gr.Row():
run_button = gr.Button()
with gr.Row():
result = gr.TextArea(label='HuixiangDou pipline status', show_copy_button=True)
result = gr.Markdown('>Text reply or inner status callback here, depends on `pipeline type`', label='Reply', show_label=True, header_links=True, line_breaks=True, show_copy_button=True)
# result = gr.TextArea(label='Reply', show_copy_button=True, placeholder='Text Reply or inner status callback, depends on `pipeline type`')

run_button.click(predict, [input_question, input_image], [result])

demo.queue()
demo.launch(share=False, server_name='0.0.0.0', debug=True)
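The streaming behavior above relies on a standard Gradio pattern: a click handler that is a (sync or async) generator re-renders its output component on every `yield`, provided the queue is enabled. A stripped-down sketch of just that mechanism, independent of HuixiangDou; the token list is dummy data standing in for LLM output.

```python
# Minimal sketch of Gradio's generator-based streaming, the pattern
# predict() uses above.
import asyncio
import gradio as gr

async def stream_reply(prompt: str):
    text = ''
    for token in ['stream', 'ing ', 'reply ', 'for: ', prompt]:
        await asyncio.sleep(0.2)  # stand-in for LLM latency
        text += token
        yield text                # each yield updates the output component

with gr.Blocks() as demo:
    box = gr.Textbox(label='Input')
    out = gr.Markdown()
    gr.Button('Run').click(stream_reply, [box], [out])

demo.queue()   # queuing is required for streamed (generator) outputs
demo.launch()
```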
11 changes: 4 additions & 7 deletions huixiangdou/main.py
@@ -11,12 +11,12 @@
from loguru import logger
from termcolor import colored

from .service import ErrorCode, Worker, build_reply_text, start_llm_server
from .service import ErrorCode, SerialPipeline, build_reply_text, start_llm_server


def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='SerialPipeline.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
@@ -25,7 +25,7 @@ def parse_args():
'--config_path',
default='config.ini',
type=str,
help='Worker configuration path. Default value is config.ini')
help='SerialPipeline configuration path. Default value is config.ini')
parser.add_argument('--standalone',
action='store_true',
default=False,
@@ -191,7 +191,7 @@ def run():
with open(args.config_path, encoding='utf8') as f:
fe_config = pytoml.load(f)['frontend']
logger.info('Config loaded.')
assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
assistant = SerialPipeline(work_dir=args.work_dir, config_path=args.config_path)

fe_type = fe_config['type']
if fe_type == 'none':
@@ -209,8 +209,5 @@
f'unsupported fe_config.type {fe_type}, please read `config.ini` description.' # noqa E501
)

# server_process.join()


if __name__ == '__main__':
run()