Merge pull request #1 from InternLM/build_whl
feat(setup.py): support build whl package
tpoisonooo authored Jan 14, 2024
2 parents f529b33 + 47733cf commit 4420411
Showing 34 changed files with 301 additions and 69 deletions.
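The new `setup.py` itself is not shown in this excerpt, so the following is only a hedged sketch of the kind of wheel-capable `setup.py` the commit title describes, assuming standard setuptools; the name matches the package directory added below, while the version, description, and console entry point are illustrative, not taken from the commit:

```python
# Hypothetical sketch -- the actual setup.py added by this commit is not shown here.
from setuptools import find_packages, setup

setup(
    name='huixiangdou',
    version='0.1.0',  # placeholder version
    description='Group chat technical assistant',  # placeholder description
    packages=find_packages(exclude=('tests', )),
    # Assumed mapping to the run() entry point added in huixiangdou/main.py below.
    entry_points={'console_scripts': ['huixiangdou=huixiangdou.main:run']},
)
```

The file-by-file changes below (the `huixiangdou/` package directory, relative imports, and `.gitignore` entries for build artifacts) are all consequences of this packaging work.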
13 changes: 10 additions & 3 deletions .github/workflows/lint.yml
@@ -1,17 +1,24 @@
name: lint

-on: [push, pull_request]
+on:
+  push:
+    branches:
+      - main
+  pull_request:

jobs:
  lint:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
-      - name: Set up Python 3.8
+      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
-          python-version: 3.8
+          python-version: 3.9
      - name: Check doc link
        run: |
          python .github/scripts/doc_link_checker.py --target README_en.md
          python .github/scripts/doc_link_checker.py --target README.md
+          python -m pip install pylint interrogate
+          pylint huixiangdou || true
+          interrogate huixiangdou -v || true
3 changes: 3 additions & 0 deletions .gitignore
@@ -14,3 +14,6 @@ badcase.txt
config.bak
config.ini
resource/prompt.txt
+build/
+dist/
+huixiangdou.egg-info/
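These three entries are the artifacts of building the wheel. A hedged sketch of producing them programmatically, assuming the third-party `build` package is installed (`python3 -m pip install build`):

```python
# Equivalent to running `python3 -m build --wheel` in the repo root.
import subprocess
import sys

subprocess.run([sys.executable, '-m', 'build', '--wheel'], check=True)
# Side effects: build/ (intermediate files), dist/ (the .whl itself) and
# huixiangdou.egg-info/ (package metadata) -- exactly the paths ignored above.
```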
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -57,4 +57,4 @@ repos:
    rev: v0.4.1
    hooks:
      - id: check-copyright
-        args: ["service"]
+        args: ["huixiangdou"]
25 changes: 13 additions & 12 deletions README.md
@@ -46,7 +46,7 @@ git clone https://github.com/internlm/lmdeploy --depth=1 repodir/lmdeploy
# Build a feature store
mkdir workdir # create a working directory
python3 -m pip install -r requirements.txt # install dependencies, python3.11 needs `conda install conda-forge::faiss-gpu`
-python3 service/feature_store.py # save the features of repodir to workdir
+python3 -m huixiangdou.service.feature_store # save the features of repodir to workdir
```

The first run will automatically download the configuration of [text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese), you can also manually download it and update model path in `config.ini`.
@@ -86,7 +86,7 @@ The first run will automatically download the configuration of internlm2-7B.

```shell
# standalone
-python3 main.py --standalone
+python3 -m huixiangdou.main --standalone
..
ErrorCode.SUCCESS,
Query: Could you please advise if there is any good optimization method for video stream detection flickering caused by frame skipping?
@@ -100,7 +100,7 @@ The first run will automatically download the configuration of internlm2-7B.
```shell
# Start LLM service
-python3 service/llm_server_hybride.py
+python3 -m huixiangdou.service.llm_server_hybrid
```
Open a new terminal, configure the host IP (**not** container IP) in `config.ini`, then run
@@ -111,7 +111,7 @@ The first run will automatically download the configuration of internlm2-7B.
..
client_url = "http://10.140.24.142:8888/inference" # example
-python3 main.py
+python3 -m huixiangdou.main
```
## STEP3. Integrate into Feishu \[Optional\]
@@ -129,7 +129,8 @@ webhook_url = "${YOUR-LARK-WEBHOOK-URL}"
Run. After it ends, the technical assistant's reply will be sent to the Feishu group chat.
```shell
-python3 main.py
+python3 -m huixiangdou.main --standalone # for non-docker users
+python3 -m huixiangdou.main # for docker users
```
<img src="./resource/figures/lark-example.png" width="400">
@@ -196,10 +197,10 @@ The basic version may not perform well. You can enable these features to enhance
introduction = "Used for evaluating large language models (LLM) .."
```
-- Use `python3 -m service.sg_search` for a unit test; the returned content should include opencompass source code and documentation
+- Use `python3 -m huixiangdou.service.sg_search` for a unit test; the returned content should include opencompass source code and documentation
```shell
-python3 service/sg_search.py
+python3 -m huixiangdou.service.sg_search
..
"filepath": "opencompass/datasets/longbench/longbench_trivia_qa.py",
"content": "from datasets import Dataset..
@@ -211,8 +212,8 @@ The basic version may not perform well. You can enable these features to enhance
Adjusting parameters for the business scenario is often unavoidable.
-- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./service/worker.py).
-- Adjust the [number of search results](./service/worker.py) based on the maximum length supported by the model.
+- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py).
+- Adjust the [number of search results](./huixiangdou/service/worker.py) based on the maximum length supported by the model.
# 🛠️ FAQ
@@ -234,12 +235,12 @@ The basic version may not perform well. You can enable these features to enhance
4. How do I connect another local LLM, and what if the results are not ideal afterwards?
-- Open [hybrid llm service](./service/llm_server_hybrid.py), add a new LLM inference implementation.
-- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./service/worker.py).
+- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py), add a new LLM inference implementation.
+- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py).
5. What if responses are too slow or requests keep failing?
-- Refer to [hybrid llm service](./service/llm_server_hybrid.py) to add exponential backoff and retransmission.
+- Refer to [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) to add exponential backoff and retransmission.
- Replace local LLM with an inference framework such as [lmdeploy](https://github.com/internlm/lmdeploy), instead of the native huggingface/transformers.
6. What if the GPU memory is too low?
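A note on why the commands above change from `python3 service/foo.py` to `python3 -m huixiangdou.service.foo`: once the sources live in the `huixiangdou` package and use relative imports (as `main.py` does below with `from .frontend import Lark`), they can no longer be run as plain scripts. A minimal sketch of the failure mode, using a hypothetical module and an assumed `ChatClient` signature:

```python
# huixiangdou/service/example.py -- hypothetical module, not part of this commit
from .llm_client import ChatClient  # relative import needs a package context

if __name__ == '__main__':
    # `python3 huixiangdou/service/example.py` fails with
    # "ImportError: attempted relative import with no known parent package";
    # `python3 -m huixiangdou.service.example` runs fine.
    client = ChatClient(config_path='config.ini')  # constructor signature assumed
```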
27 changes: 14 additions & 13 deletions README_zh.md
@@ -45,8 +45,8 @@ git clone https://github.com/internlm/lmdeploy --depth=1 repodir/lmdeploy

# build the feature store
mkdir workdir # create a working directory
-python3 -m pip install -r requirements.txt # install dependencies; python3.11 needs `conda install conda-forge::faiss-gpu`
-python3 service/feature_store.py # save the features of repodir to workdir
+python3 -m pip install -r requirements.txt # install dependencies; python3.11 will need `conda install conda-forge::faiss-gpu`
+python3 -m huixiangdou.service.feature_store # save the features of repodir to workdir
```

The first run will automatically download [text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese) from the configuration. Given huggingface connectivity issues in some regions, it is recommended to download it manually first and then set the model path in `config.ini`. For example:
@@ -93,7 +93,7 @@ x_api_key = "${YOUR-X-API-KEY}"

```shell
# standalone
-python3 main.py --standalone
+python3 -m huixiangdou.main --standalone
..
ErrorCode.SUCCESS,
Query: Could you please advise if there is any good optimization method for video stream detection flickering caused by frame skipping?
@@ -107,7 +107,7 @@ x_api_key = "${YOUR-X-API-KEY}"

```shell
# start the LLM service
-python3 service/llm_server_hybrid.py
+python3 -m huixiangdou.service.llm_server_hybrid
```

Open a new terminal, configure the host IP (**not** the IP inside the docker container) in `config.ini`, then run
@@ -118,7 +118,7 @@ x_api_key = "${YOUR-X-API-KEY}"
..
client_url = "http://10.140.24.142:9999/inference" # example

-python3 main.py
+python3 -m huixiangdou.main
```

## STEP3. Integrate into Feishu \[Optional\]
@@ -136,7 +136,8 @@ webhook_url = "${YOUR-LARK-WEBHOOK-URL}"
Run. When it finishes, the technical assistant's reply will be sent to the Feishu group chat.

```shell
-python3 main.py
+python3 -m huixiangdou.main --standalone # for non-docker users
+python3 -m huixiangdou.main # for docker users
```

<img src="./resource/figures/lark-example.png" width="400">
@@ -203,10 +204,10 @@ python3 main.py
introduction = "Used for evaluating large language models (LLM) .."
```

-- Use `python3 -m service.sg_search` for a unit test; the returned content should include opencompass source code and documentation
+- Use `python3 -m huixiangdou.service.sg_search` for a unit test; the returned content should include opencompass source code and documentation

```shell
-python3 service/sg_search.py
+python3 -m huixiangdou.service.sg_search
..
"filepath": "opencompass/datasets/longbench/longbench_trivia_qa.py",
"content": "from datasets import Dataset..
@@ -218,8 +219,8 @@ python3 main.py
Adjusting parameters for the business scenario is often unavoidable.
-- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./service/worker.py)
-- Adjust the [number of search results](./service/worker.py) based on the maximum length supported by the model
+- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py)
+- Adjust the [number of search results](./huixiangdou/service/worker.py) based on the maximum length supported by the model
# 🛠️ FAQ
@@ -241,12 +242,12 @@ python3 main.py
4. How do I connect another local LLM, and what if the results are not ideal afterwards?
-- Open [hybrid llm service](./service/llm_server_hybrid.py) and add a new LLM inference implementation
-- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./service/worker.py)
+- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) and add a new LLM inference implementation
+- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py)
5. What if responses are too slow or requests keep failing?
-- Refer to [hybrid llm service](./service/llm_server_hybrid.py) to add exponential backoff and retransmission
+- Refer to [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) to add exponential backoff and retransmission
- Replace the local LLM with an inference framework such as [lmdeploy](https://github.com/internlm/lmdeploy) instead of native huggingface/transformers
6. What if GPU memory is too low?
10 changes: 10 additions & 0 deletions huixiangdou/__init__.py
@@ -0,0 +1,10 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+"""import module."""
+from .frontend import Lark # noqa E401
+from .service import ChatClient # noqa E401
+from .service import ErrorCode # noqa E401
+from .service import FeatureStore # noqa E401
+from .service import HybridLLMServer # noqa E401
+from .service import WebSearch # noqa E401
+from .service import Worker # noqa E401
+from .service import llm_serve # noqa E401
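With these re-exports, the wheel's public API is importable from the top-level package. A short usage sketch, assuming a feature store already built in `workdir` and a valid `config.ini` (the constructor arguments match the `Worker(...)` call in `huixiangdou/main.py` later in this diff):

```python
from huixiangdou import Worker

# Same constructor arguments as huixiangdou/main.py uses below.
assistant = Worker(work_dir='workdir', config_path='config.ini')
```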
Binary file added huixiangdou/__pycache__/__init__.cpython-39.pyc
Binary file not shown.
Binary file added huixiangdou/__pycache__/main.cpython-39.pyc
Binary file not shown.
File renamed without changes.
Binary file not shown.
Binary file not shown.
7 changes: 4 additions & 3 deletions frontend/lark.py → huixiangdou/frontend/lark.py
@@ -1,3 +1,5 @@
# Copyright (c) OpenMMLab. All rights reserved.
+# copy from https://github.com/tpoisonooo/cpp-syntactic-sugar/blob/master/github-lark-notifier/main.py # noqa E501
+"""Lark proxy."""
import json
import logging
@@ -9,7 +11,6 @@
urllib3.disable_warnings()


-# copy from https://github.com/tpoisonooo/cpp-syntactic-sugar/blob/master/github-lark-notifier/main.py # noqa E501
class Lark:
    """Lark bot http proxy."""

@@ -52,7 +53,7 @@ def post(self, data):
                           headers=self.headers,
                           data=post_data,
                           verify=False,
-                          timeout=3)
+                          timeout=5)
except requests.exceptions.HTTPError as exc:
code = exc.response.status_code
reason = exc.response.reason
@@ -95,5 +96,5 @@ def post(self, data):
        requests.post(self.webhook,
                      headers=self.headers,
                      data=json.dumps(error_data),
-                     timeout=3)
+                     timeout=5)
return result
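For context, `main.py` later in this diff delivers replies through this proxy via `lark.send_text(msg=...)`. A hedged usage sketch with a placeholder webhook URL; the `webhook` keyword is an assumption, since the constructor is not shown in this hunk:

```python
from huixiangdou.frontend import Lark

# Placeholder webhook -- substitute the bot webhook of your Feishu group.
lark = Lark(webhook='https://open.feishu.cn/open-apis/bot/v2/hook/xxxx')
lark.send_text(msg='hello from HuixiangDou')  # same call main.py makes below
```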
39 changes: 35 additions & 4 deletions main.py → huixiangdou/main.py
100644 → 100755
@@ -1,15 +1,21 @@
+#!/usr/bin/env python3
+# Copyright (c) OpenMMLab. All rights reserved.
+"""HuixiangDou binary."""
import argparse
+import os
import time
from multiprocessing import Process, Value

import pytoml
+import requests
from loguru import logger

-from frontend import Lark
-from service import ErrorCode, Worker, llm_serve
+from .frontend import Lark
+from .service import ErrorCode, Worker, llm_serve


def parse_args():
+    """Parse args."""
    parser = argparse.ArgumentParser(description='Worker.')
    parser.add_argument('--work_dir',
                        type=str,
@@ -28,9 +34,29 @@ def parse_args():
    return args


-if __name__ == '__main__':
-    args = parse_args()
+def check_env():
+    """Check or create config.ini and logs dir."""
+    if not os.path.exists('logs'):
+        os.makedirs('logs')
+    CONFIG_NAME = 'config.ini'
+    CONFIG_URL = 'https://raw.githubusercontent.com/InternLM/HuixiangDou/main/config.ini?token=GHSAT0AAAAAACK2GCUVNSQXR373FEGSZSIIZNDZBMQ' # noqa E501
+    if not os.path.exists(CONFIG_NAME):
+        logger.warning(
+            f'{CONFIG_NAME} not found, download a template from {CONFIG_URL}.')
+
+        try:
+            response = requests.get(CONFIG_URL, timeout=5)
+            response.raise_for_status()
+            with open(CONFIG_NAME, 'wb') as f:
+                f.write(response.content)
+        except Exception as e:
+            logger.error(f'Failed to download file due to {e}')
+
+
+def run():
+    """Automatically download config, start llm server and run examples."""
+    check_env()
+    args = parse_args()
    if args.standalone:
        # hybrid llm serve
        server_ready = Value('i', 0)
@@ -52,6 +78,7 @@ def parse_args():
    # query by worker
    with open(args.config_path, encoding='utf8') as f:
        fe_config = pytoml.load(f)['frontend']
+    logger.info('Config loaded.')
    assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
    # queries = ['请教下视频流检测 跳帧 造成框一闪一闪的 有好的优化办法吗',
    #            '请教各位佬一个问题,虽然说注意力的长度等于上下文的长度。但是,增大上下文推理长度难道只有加长注意力机制一种方法吗?比如Rope啥的,应该不是吧', # noqa E501
@@ -68,3 +95,7 @@
            lark.send_text(msg=reply)

    # server_process.join()
+
+
+if __name__ == '__main__':
+    run()
File renamed without changes.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
File renamed without changes.
File renamed without changes.
5 changes: 4 additions & 1 deletion service/llm_client.py → huixiangdou/service/llm_client.py
@@ -113,7 +113,10 @@ def generate_response(self, prompt, history=[], remote=False):
            'history': data_history,
            'remote': remote
        }
-        resp = requests.post(url, headers=header, data=json.dumps(data))
+        resp = requests.post(url,
+                             headers=header,
+                             data=json.dumps(data),
+                             timeout=5)
        if resp.status_code != 200:
            raise Exception(str((resp.status_code, resp.reason)))
        return resp.json()['text']
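The FAQ above recommends adding exponential backoff and retransmission around exactly this kind of request. A minimal sketch of what that could look like; this is not code from the repo:

```python
import time

import requests


def post_with_backoff(url, data, retries=4, base_delay=1.0):
    """POST with exponential backoff; re-raises after the final attempt."""
    for attempt in range(retries):
        try:
            resp = requests.post(url, json=data, timeout=5)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2**attempt)  # 1s, 2s, 4s, ...
```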
service/llm_server_hybrid.py → huixiangdou/service/llm_server_hybrid.py
@@ -13,7 +13,7 @@
from transformers import AutoModelForCausalLM, AutoTokenizer


-class HybridLLMServer(object):
+class HybridLLMServer:
    """A class to handle server-side interactions with a hybrid language
    learning model (LLM) service.
2 changes: 1 addition & 1 deletion service/sg_search.py → huixiangdou/service/sg_search.py
@@ -105,7 +105,7 @@ def choose_repo(self, llm_client, question, groupname):

        keys = self.sg_config.keys()
        skip = ['binary_src_path', 'src_access_token']
-        repos = dict()
+        repos = {}
        for key in keys:
            if key in skip:
                continue