feat(huixiangdou): add chat_with_repo pipeline (#362)
* feat(service): add parallel pipeline

* feat(service): gradio streaming chat

* style(llm_client.py): remove useless
tpoisonooo authored Aug 20, 2024
1 parent 3b81797 commit 87a10e1
Showing 31 changed files with 773 additions and 258 deletions.
1 change: 1 addition & 0 deletions .github/scripts/doc_link_checker.py
@@ -58,6 +58,7 @@ def analyze_doc(home, path):
ref = ref[ref.find('#'):]
fullpath = os.path.join(home, ref)
if not os.path.exists(fullpath):
raise ValueError(fullpath)
problem_list.append(ref)
else:
continue
15 changes: 10 additions & 5 deletions README.md
@@ -30,12 +30,14 @@ English | [简体中文](README_zh.md)

</div>

HuixiangDou is a **group chat** assistant based on LLM (Large Language Model).
HuixiangDou is a **professional knowledge assistant** based on LLM.

Advantages:

1. Design a three-stage pipeline of preprocess, rejection and response to cope with group chat scenario, answer user questions without message flooding, see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_en.md) and [Precision Report](./evaluation/).
2. No training required, with CPU-only, 2G, 10G and 80G configuration
1. Designs three-stage pipelines of preprocessing, rejection, and response
   * `chat_in_group` copes with the **group chat** scenario, answering user questions without message flooding; see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_en.md) and the [Precision Report](./evaluation/)
   * `chat_with_repo` for **real-time streaming** chat
2. No training required, with CPU-only, 2G, 10G, 20G and 80G configurations
3. Offers a complete suite of Web, Android, and pipeline source code, industrial-grade and commercially viable

Check out the [scenes in which HuixiangDou are running](./huixiangdou-inside.md) and join [WeChat Group](resource/figures/wechat.jpg) to try AI assistant inside.
@@ -46,6 +48,7 @@ If this helps you, please give it a star ⭐

Our Web version has been released to [OpenXLab](https://openxlab.org.cn/apps/detail/tpoisonooo/huixiangdou-web), where you can create knowledge base, update positive and negative examples, turn on web search, test chat, and integrate into Feishu/WeChat groups. See [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y) !

- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py) 👍
- \[2024/07\] Image and text retrieval & Removal of `langchain` 👍
- \[2024/07\] [Hybrid Knowledge Graph and Dense Retrieval](./docs/knowledge_graph_en.md) improve 1.7% F1 score 🎯
- \[2024/06\] [Evaluation of chunksize, splitter, and text2vec model](./evaluation) 🎯
@@ -221,7 +224,9 @@ python3 -m huixiangdou.main --standalone
python3 -m huixiangdou.gradio
```

Or run a server to listen 23333:
https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554

Or run a server listening on port 23333; the default pipeline is `chat_with_repo`:
```bash
python3 -m huixiangdou.server

@@ -368,7 +373,7 @@ Contributors have provided [Android tools](./android) to interact with WeChat. T
3. How to access other local LLM / After access, the effect is not ideal?

- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py), add a new LLM inference implementation.
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py).
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [prompt.py](./huixiangdou/service/prompt.py).

4. What if the response is too slow/request always fails?

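The two pipelines named in the README above differ in how they are driven. Below is a minimal usage sketch based on how `huixiangdou/gradio.py` in this commit calls them; the constructor and `generate` argument names follow that file, and `workdir`/`config.ini` are the repository defaults, so treat the details as illustrative rather than a stable API.

```python
# Sketch: driving both pipelines directly, mirroring gradio.py in this commit.
import asyncio

from huixiangdou.primitive import Query
from huixiangdou.service import SerialPipeline, ParallelPipeline

query = Query('how to install mmpose ?', None)  # (text, optional image path)

# chat_in_group: a synchronous generator yielding status codes and text deltas
serial = SerialPipeline(work_dir='workdir', config_path='config.ini')
for sess in serial.generate(query=query, history=[], groupname=''):
    print(sess.code, sess.delta)

# chat_with_repo: an async generator that streams text deltas in real time
async def stream():
    parallel = ParallelPipeline(work_dir='workdir', config_path='config.ini')
    async for sess in parallel.generate(query=query, history=[],
                                        language='en', enable_web_search=False):
        print(sess.delta, end='', flush=True)

asyncio.run(stream())
```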
19 changes: 13 additions & 6 deletions README_zh.md
@@ -29,10 +29,12 @@

</div>

HuixiangDou is an LLM-based **group chat** knowledge assistant. Advantages:
HuixiangDou is an LLM-based professional knowledge assistant. Advantages:

1. Designs a three-stage pipeline of preprocessing, rejection, and response to handle group chat scenarios, answering questions without message flooding; the essentials are in [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_zh.md), and the [precision test on business data](./evaluation)
2. No training required; fits various industries, with CPU-only, 2G, 10G, and 80G configurations
1. Designs three-stage pipelines of preprocessing, rejection, and response:
   * `chat_in_group` for the group chat scenario, answering questions without message flooding; see [2401.08772](https://arxiv.org/abs/2401.08772), [2405.02817](https://arxiv.org/abs/2405.02817), [Hybrid Retrieval](./docs/knowledge_graph_zh.md), and the [precision test on business data](./evaluation)
   * `chat_with_repo` for real-time chat scenarios, with faster responses
2. No training required; fits various industries, with CPU-only, 2G, 10G, 20G, and 80G configurations
3. Provides a complete set of web frontend/backend, Android, and algorithm source code; industrial-grade and commercially usable

Check out [the scenes where HuixiangDou is already running](./huixiangdou-inside.md), and join the [WeChat group](resource/figures/wechat.jpg) to try the group chat assistant directly.
@@ -45,6 +47,7 @@

For video tutorials of the Web version, see [BiliBili](https://www.bilibili.com/video/BV1S2421N7mn) and [YouTube](https://www.youtube.com/watch?v=ylXrT-Tei-Y).

- \[2024/08\] `chat_with_repo` [pipeline](./huixiangdou/service/parallel_pipeline.py)
- \[2024/07\] Image-text retrieval & removal of `langchain` 👍
- \[2024/07\] [Hybrid knowledge graph and dense retrieval improve F1 by 1.7%](./docs/knowledge_graph_zh.md) 🎯
- \[2024/06\] [Evaluation of chunksize, splitter, and text2vec models](./evaluation) 🎯
@@ -216,10 +219,14 @@ python3 -m huixiangdou.main --standalone
💡 You can also launch `gradio` to set up a simple Web UI, bound to port 7860 by default:

```bash
python3 -m huixiangdou.gradio
python3 -m huixiangdou.gradio
# If `llm_server_hybrid.py` is already running separately, you can use
# python3 -m huixiangdou.gradio --no-standalone
```

Or start a server listening on port 23333:
https://github.com/user-attachments/assets/9e5dbb30-1dc1-42ad-a7d4-dc7380676554

Or start a server listening on port 23333; the default pipeline is `chat_with_repo`:
```bash
python3 -m huixiangdou.server

@@ -364,7 +371,7 @@ python3 tests/test_query_gradio.py
3. How to integrate other local LLMs / what if the results after integration are not ideal?

- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) and add a new LLM inference implementation
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust the prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py)
- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust the prompt and threshold for the new model, and update them into [prompt.py](./huixiangdou/service/prompt.py)

4. What if responses are too slow or requests keep failing?

2 changes: 1 addition & 1 deletion config.ini
@@ -25,7 +25,7 @@ engine = "serper"
# For ddgs, see https://pypi.org/project/duckduckgo-search
# For serper, check https://serper.dev/api-key to get a free API key
serper_x_api_key = "YOUR-API-KEY-HERE"
domain_partial_order = ["openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
domain_partial_order = ["arxiv.org", "openai.com", "pytorch.org", "readthedocs.io", "nvidia.com", "stackoverflow.com", "juejin.cn", "zhuanlan.zhihu.com", "www.cnblogs.com"]
save_dir = "logs/web_search_result"

[llm]
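`domain_partial_order` expresses which result domains the web search should prefer; this commit promotes `arxiv.org` to the front of the list. The helper below is a hypothetical sketch that only illustrates the idea of ranking by such a partial order; HuixiangDou's actual ranking logic lives in its web search service and may differ.

```python
# Hypothetical illustration: rank URLs by a domain partial order.
DOMAIN_PARTIAL_ORDER = [
    'arxiv.org', 'openai.com', 'pytorch.org', 'readthedocs.io', 'nvidia.com',
    'stackoverflow.com', 'juejin.cn', 'zhuanlan.zhihu.com', 'www.cnblogs.com',
]

def rank_key(url: str) -> int:
    """Earlier domains in the list rank first; unknown domains go last."""
    for index, domain in enumerate(DOMAIN_PARTIAL_ORDER):
        if domain in url:
            return index
    return len(DOMAIN_PARTIAL_ORDER)

results = [
    'https://www.cnblogs.com/some-post',
    'https://arxiv.org/abs/2405.02817',
    'https://stackoverflow.com/q/42',
]
print(sorted(results, key=rank_key))
# arxiv.org first, stackoverflow.com second, www.cnblogs.com last
```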
4 changes: 2 additions & 2 deletions docs/full_dev_en.md
@@ -74,6 +74,6 @@ The basic version may not perform well. You can enable these features to enhance
It is often unavoidable to adjust parameters with respect to business scenarios.
- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py).
- Adjust the [number of search results](./huixiangdou/service/worker.py) based on the maximum length supported by the model.
- Refer to [data.json](../tests/data.json) to add real data, run [test_intention_prompt.py](../tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [prompt.py](../huixiangdou/service/prompt.py).
- Adjust the [number of search results](../huixiangdou/service/serial_pipeline.py) based on the maximum length supported by the model.
- Update `web_search.domain_partial_order` in `config.ini` according to your scenarios.
4 changes: 2 additions & 2 deletions docs/full_dev_zh.md
@@ -73,6 +73,6 @@
Tuning parameters for business scenarios is often unavoidable.
- Refer to [data.json](./tests/data.json) to add real data; run [test_intention_prompt.py](./tests/test_intention_prompt.py) to obtain suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py)
- Adjust the [number of search results](./huixiangdou/service/worker.py) according to the maximum length the model supports
- Refer to [data.json](../tests/data.json) to add real data; run [test_intention_prompt.py](../tests/test_intention_prompt.py) to obtain suitable prompts and thresholds, and update them into [prompt.py](../huixiangdou/service/prompt.py)
- Adjust the [number of search results](../huixiangdou/service/serial_pipeline.py) according to the maximum length the model supports
- Modify `web_search.domain_partial_order` in config.ini (the partial order of search results) to match scenario preferences
2 changes: 1 addition & 1 deletion huixiangdou/__init__.py
@@ -6,7 +6,7 @@
from .service import FeatureStore # noqa E401
from .service import HybridLLMServer # noqa E401
from .service import WebSearch # noqa E401
from .service import Worker # noqa E401
from .service import SerialPipeline, ParallelPipeline  # noqa E401
from .service import build_reply_text # noqa E401
from .service import llm_serve # noqa E401
from .version import __version__
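With this re-export in place, downstream code can import both pipelines from the package root rather than reaching into `huixiangdou.service`. A quick check, assuming the package is importable:

```python
# Both pipeline classes are available at the package root after this commit.
from huixiangdou import SerialPipeline, ParallelPipeline

print(SerialPipeline.__name__, ParallelPipeline.__name__)
```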
2 changes: 1 addition & 1 deletion huixiangdou/frontend/wechat.py
@@ -845,7 +845,7 @@ def loop(self, worker):

def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='wechat server.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
150 changes: 121 additions & 29 deletions huixiangdou/gradio.py
@@ -4,19 +4,19 @@
import time
import pdb
from multiprocessing import Process, Value

import asyncio
import cv2
import gradio as gr
import pytoml
from loguru import logger

from typing import List
from huixiangdou.primitive import Query
from huixiangdou.service import ErrorCode, Worker, llm_serve, start_llm_server

from huixiangdou.service import ErrorCode, SerialPipeline, ParallelPipeline, llm_serve, start_llm_server
import json

def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='SerialPipeline.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
@@ -25,7 +25,7 @@ def parse_args():
'--config_path',
default='config.ini',
type=str,
help='Worker configuration path. Default value is config.ini')
help='SerialPipeline configuration path. Default value is config.ini')
parser.add_argument('--standalone',
action='store_true',
default=True,
@@ -37,50 +37,142 @@ def parse_args():
args = parser.parse_args()
return args

def predict(text, image):
language='en'
enable_web_search=False
pipeline='chat_with_repo'
main_args = None
paralle_assistant = None
serial_assistant = None

def on_language_changed(value:str):
global language
print(value)
language = value

def on_pipeline_changed(value:str):
global pipeline
print(value)
pipeline = value

def on_web_search_changed(value: str):
global enable_web_search
print(value)
if 'no' in value:
enable_web_search = False
else:
enable_web_search = True


def format_refs(refs: List[str]):
refs_filter = list(set(refs))
if len(refs) < 1:
return ''
text = ''
if language == 'zh':
text += '参考资料:\r\n'
else:
text += '**References:**\r\n'

for file_or_url in refs_filter:
text += '* {}\r\n'.format(file_or_url)
text += '\r\n'
return text


async def predict(text:str, image:str):
global language
global enable_web_search
global pipeline
global main_args
global serial_assistant
global paralle_assistant

with open('query.txt', 'a') as f:
f.write(json.dumps({'data': text}))
f.write('\n')

if image is not None:
filename = 'image.png'
image_path = os.path.join(args.work_dir, filename)
cv2.imwrite(image_path, image)
else:
image_path = None

assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
query = Query(text, image_path)
if 'chat_in_group' in pipeline:
if serial_assistant is None:
serial_assistant = SerialPipeline(work_dir=main_args.work_dir, config_path=main_args.config_path)
args = {'query':query, 'history': [], 'groupname':''}
pipeline = {'status': {}}
debug = dict()
stream_chat_content = ''
for sess in serial_assistant.generate(**args):
if len(sess.delta) > 0:
# start chat, display
stream_chat_content += sess.delta
yield stream_chat_content
else:
status = {
"state":str(sess.code),
"response": sess.response,
"refs": sess.references
}
pipeline['status'] = status
pipeline['debug'] = sess.debug

json_str = json.dumps(pipeline, indent=2, ensure_ascii=False)
yield json_str

pipeline = {'step': []}
debug = dict()
for sess in assistant.generate(query=query, history=[], groupname=''):
status = {
"state":str(sess.code),
"response": sess.response,
"refs": sess.references
}
else:
if paralle_assistant is None:
paralle_assistant = ParallelPipeline(work_dir=main_args.work_dir, config_path=main_args.config_path)
args = {'query':query, 'history':[], 'language':language}
args['enable_web_search'] = enable_web_search

print(status)
pipeline['step'].append(status)
pipeline['debug'] = sess.debug
sentence = ''
async for sess in paralle_assistant.generate(**args):
if sentence == '' and len(sess.references) > 0:
sentence = format_refs(sess.references)

json_str = json.dumps(pipeline, indent=2, ensure_ascii=False)
yield json_str
if len(sess.delta) > 0:
sentence += sess.delta
yield sentence

yield sentence

if __name__ == '__main__':
args = parse_args()
main_args = parse_args()

# start service
if args.standalone is True:
if main_args.standalone is True:
# hybrid llm serve
start_llm_server(config_path=args.config_path)
start_llm_server(config_path=main_args.config_path)

with gr.Blocks() as demo:
with gr.Blocks(theme=gr.themes.Soft(), title='HuixiangDou AI assistant', analytics_enabled=True) as demo:
with gr.Row():
gr.Markdown("""
#### [HuixiangDou](https://github.com/internlm/huixiangdou) AI assistant
""", label='Reply', header_links=True, line_breaks=True,)
with gr.Row():
input_question = gr.TextArea(label='Input the question.')
input_image = gr.Image(label='Upload Image.')
with gr.Column():
ui_pipeline = gr.Radio(["chat_with_repo", "chat_in_group"], label="Pipeline type", info="Group-chat is slow but accurate and safe, default value is `chat_with_repo`")
ui_pipeline.change(fn=on_pipeline_changed, inputs=ui_pipeline, outputs=[])
with gr.Column():
ui_language = gr.Radio(["en", "zh"], label="Language", info="Use `en` by default ")
ui_language.change(fn=on_language_changed, inputs=ui_language, outputs=[])
with gr.Column():
ui_web_search = gr.Radio(["no", "yes"], label="Enable web search", info="Disable by default ")
ui_web_search.change(on_web_search_changed, inputs=ui_web_search, outputs=[])

with gr.Row():
input_question = gr.TextArea(label='Input your question', placeholder='how to install mmpose ?', show_copy_button=True, lines=9)
input_image = gr.Image(label='[Optional] Image-text retrieval needs `config-multimodal.ini`')
with gr.Row():
run_button = gr.Button()
with gr.Row():
result = gr.TextArea(label='HuixiangDou pipline status', show_copy_button=True)
result = gr.Markdown('>Text reply or inner status callback here, depends on `pipeline type`', label='Reply', show_label=True, header_links=True, line_breaks=True, show_copy_button=True)
# result = gr.TextArea(label='Reply', show_copy_button=True, placeholder='Text Reply or inner status callback, depends on `pipeline type`')

run_button.click(predict, [input_question, input_image], [result])

demo.queue()
demo.launch(share=False, server_name='0.0.0.0', debug=True)
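The streaming behavior above relies on a standard Gradio pattern: a click handler that is a (sync or async) generator re-renders its output component on every `yield`, provided the queue is enabled. A stripped-down sketch of just that mechanism, independent of HuixiangDou; the token list is dummy data standing in for LLM output.

```python
# Minimal sketch of Gradio's generator-based streaming, the pattern
# predict() uses above.
import asyncio
import gradio as gr

async def stream_reply(prompt: str):
    text = ''
    for token in ['stream', 'ing ', 'reply ', 'for: ', prompt]:
        await asyncio.sleep(0.2)  # stand-in for LLM latency
        text += token
        yield text                # each yield updates the output component

with gr.Blocks() as demo:
    box = gr.Textbox(label='Input')
    out = gr.Markdown()
    gr.Button('Run').click(stream_reply, [box], [out])

demo.queue()   # queuing is required for streamed (generator) outputs
demo.launch()
```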
11 changes: 4 additions & 7 deletions huixiangdou/main.py
@@ -11,12 +11,12 @@
from loguru import logger
from termcolor import colored

from .service import ErrorCode, Worker, build_reply_text, start_llm_server
from .service import ErrorCode, SerialPipeline, build_reply_text, start_llm_server


def parse_args():
"""Parse args."""
parser = argparse.ArgumentParser(description='Worker.')
parser = argparse.ArgumentParser(description='SerialPipeline.')
parser.add_argument('--work_dir',
type=str,
default='workdir',
@@ -25,7 +25,7 @@ def parse_args():
'--config_path',
default='config.ini',
type=str,
help='Worker configuration path. Default value is config.ini')
help='SerialPipeline configuration path. Default value is config.ini')
parser.add_argument('--standalone',
action='store_true',
default=False,
@@ -191,7 +191,7 @@ def run():
with open(args.config_path, encoding='utf8') as f:
fe_config = pytoml.load(f)['frontend']
logger.info('Config loaded.')
assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
assistant = SerialPipeline(work_dir=args.work_dir, config_path=args.config_path)

fe_type = fe_config['type']
if fe_type == 'none':
@@ -209,8 +209,5 @@
f'unsupported fe_config.type {fe_type}, please read `config.ini` description.' # noqa E501
)

# server_process.join()


if __name__ == '__main__':
run()