Merge pull request #1 from InternLM/build_whl
feat(setup.py): support build whl package
tpoisonooo authored Jan 14, 2024
2 parents f529b33 + 47733cf commit 4420411
Showing 34 changed files with 301 additions and 69 deletions.
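The new `setup.py` itself is not shown in this excerpt, so the following is only a hedged sketch of the kind of wheel-capable `setup.py` the commit title describes, assuming standard setuptools; the name matches the package directory added below, while the version, description, and console entry point are illustrative, not taken from the commit:

```python
# Hypothetical sketch -- the actual setup.py added by this commit is not shown here.
from setuptools import find_packages, setup

setup(
    name='huixiangdou',
    version='0.1.0',  # placeholder version
    description='Group chat technical assistant',  # placeholder description
    packages=find_packages(exclude=('tests', )),
    # Assumed mapping to the run() entry point added in huixiangdou/main.py below.
    entry_points={'console_scripts': ['huixiangdou=huixiangdou.main:run']},
)
```

The file-by-file changes below (the `huixiangdou/` package directory, relative imports, and `.gitignore` entries for build artifacts) are all consequences of this packaging work.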
13 changes: 10 additions & 3 deletions .github/workflows/lint.yml
@@ -1,17 +1,24 @@
name: lint

-on: [push, pull_request]
+on:
+  push:
+    branches:
+      - main
+  pull_request:

jobs:
  lint:
    runs-on: ubuntu-20.04
    steps:
      - uses: actions/checkout@v2
-      - name: Set up Python 3.8
+      - name: Set up Python 3.9
        uses: actions/setup-python@v2
        with:
-          python-version: 3.8
+          python-version: 3.9
      - name: Check doc link
        run: |
          python .github/scripts/doc_link_checker.py --target README_en.md
          python .github/scripts/doc_link_checker.py --target README.md
+          python -m pip install pylint interrogate
+          pylint huixiangdou || true
+          interrogate huixiangdou -v || true
3 changes: 3 additions & 0 deletions .gitignore
@@ -14,3 +14,6 @@ badcase.txt
config.bak
config.ini
resource/prompt.txt
+build/
+dist/
+huixiangdou.egg-info/
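These three entries are the artifacts of building the wheel. A hedged sketch of producing them programmatically, assuming the third-party `build` package is installed (`python3 -m pip install build`):

```python
# Equivalent to running `python3 -m build --wheel` in the repo root.
import subprocess
import sys

subprocess.run([sys.executable, '-m', 'build', '--wheel'], check=True)
# Side effects: build/ (intermediate files), dist/ (the .whl itself) and
# huixiangdou.egg-info/ (package metadata) -- exactly the paths ignored above.
```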
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -57,4 +57,4 @@ repos:
    rev: v0.4.1
    hooks:
      - id: check-copyright
-        args: ["service"]
+        args: ["huixiangdou"]
25 changes: 13 additions & 12 deletions README.md
@@ -46,7 +46,7 @@ git clone https://github.com/internlm/lmdeploy --depth=1 repodir/lmdeploy
# Build a feature store
mkdir workdir # create a working directory
python3 -m pip install -r requirements.txt # install dependencies, python3.11 needs `conda install conda-forge::faiss-gpu`
-python3 service/feature_store.py # save the features of repodir to workdir
+python3 -m huixiangdou.service.feature_store # save the features of repodir to workdir
```

The first run will automatically download the configuration of [text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese), you can also manually download it and update model path in `config.ini`.
@@ -86,7 +86,7 @@ The first run will automatically download the configuration of internlm2-7B.

```shell
# standalone
-python3 main.py --standalone
+python3 -m huixiangdou.main --standalone
..
ErrorCode.SUCCESS,
Query: Could you please advise if there is any good optimization method for video stream detection flickering caused by frame skipping?
@@ -100,7 +100,7 @@ The first run will automatically download the configuration of internlm2-7B.
```shell
# Start LLM service
-python3 service/llm_server_hybride.py
+python3 -m huixiangdou.service.llm_server_hybrid
```
Open a new terminal, configure the host IP (**not** container IP) in `config.ini`, then run
@@ -111,7 +111,7 @@ The first run will automatically download the configuration of internlm2-7B.
..
client_url = "http://10.140.24.142:8888/inference" # example
-python3 main.py
+python3 -m huixiangdou.main
```
## STEP3. Integrate into Feishu \[Optional\]
@@ -129,7 +129,8 @@ webhook_url = "${YOUR-LARK-WEBHOOK-URL}"
Run. After it ends, the technical assistant's reply will be sent to the Feishu group chat.
```shell
-python3 main.py
+python3 -m huixiangdou.main --standalone # for non-docker users
+python3 -m huixiangdou.main # for docker users
```
<img src="./resource/figures/lark-example.png" width="400">
@@ -196,10 +197,10 @@ The basic version may not perform well. You can enable these features to enhance
introduction = "Used for evaluating large language models (LLM) .."
```
-- Use `python3 -m service.sg_search` for a unit test; the returned content should include opencompass source code and documentation
+- Use `python3 -m huixiangdou.service.sg_search` for a unit test; the returned content should include opencompass source code and documentation
```shell
-python3 service/sg_search.py
+python3 -m huixiangdou.service.sg_search
..
"filepath": "opencompass/datasets/longbench/longbench_trivia_qa.py",
"content": "from datasets import Dataset..
@@ -211,8 +212,8 @@ The basic version may not perform well. You can enable these features to enhance
Adjusting parameters for the business scenario is often unavoidable.
-- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./service/worker.py).
-- Adjust the [number of search results](./service/worker.py) based on the maximum length supported by the model.
+- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py).
+- Adjust the [number of search results](./huixiangdou/service/worker.py) based on the maximum length supported by the model.
# 🛠️ FAQ
@@ -234,12 +235,12 @@ The basic version may not perform well. You can enable these features to enhance
4. How do I connect another local LLM, and what if the results are not ideal afterwards?
-- Open [hybrid llm service](./service/llm_server_hybrid.py), add a new LLM inference implementation.
-- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./service/worker.py).
+- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py), add a new LLM inference implementation.
+- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py).
5. What if responses are too slow or requests keep failing?
-- Refer to [hybrid llm service](./service/llm_server_hybrid.py) to add exponential backoff and retransmission.
+- Refer to [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) to add exponential backoff and retransmission.
- Replace local LLM with an inference framework such as [lmdeploy](https://github.com/internlm/lmdeploy), instead of the native huggingface/transformers.
6. What if the GPU memory is too low?
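A note on why the commands above change from `python3 service/foo.py` to `python3 -m huixiangdou.service.foo`: once the sources live in the `huixiangdou` package and use relative imports (as `main.py` does below with `from .frontend import Lark`), they can no longer be run as plain scripts. A minimal sketch of the failure mode, using a hypothetical module and an assumed `ChatClient` signature:

```python
# huixiangdou/service/example.py -- hypothetical module, not part of this commit
from .llm_client import ChatClient  # relative import needs a package context

if __name__ == '__main__':
    # `python3 huixiangdou/service/example.py` fails with
    # "ImportError: attempted relative import with no known parent package";
    # `python3 -m huixiangdou.service.example` runs fine.
    client = ChatClient(config_path='config.ini')  # constructor signature assumed
```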
27 changes: 14 additions & 13 deletions README_zh.md
@@ -45,8 +45,8 @@ git clone https://github.com/internlm/lmdeploy --depth=1 repodir/lmdeploy

# build the feature store
mkdir workdir # create a working directory
-python3 -m pip install -r requirements.txt # install dependencies; python3.11 needs `conda install conda-forge::faiss-gpu`
-python3 service/feature_store.py # save the features of repodir to workdir
+python3 -m pip install -r requirements.txt # install dependencies; python3.11 will need `conda install conda-forge::faiss-gpu`
+python3 -m huixiangdou.service.feature_store # save the features of repodir to workdir
```

The first run will automatically download [text2vec-large-chinese](https://huggingface.co/GanymedeNil/text2vec-large-chinese) from the configuration. Given huggingface connectivity issues in some regions, it is recommended to download it manually first and then set the model path in `config.ini`. For example:
@@ -93,7 +93,7 @@ x_api_key = "${YOUR-X-API-KEY}"

```shell
# standalone
-python3 main.py --standalone
+python3 -m huixiangdou.main --standalone
..
ErrorCode.SUCCESS,
Query: Could you please advise if there is any good optimization method for video stream detection flickering caused by frame skipping?
@@ -107,7 +107,7 @@ x_api_key = "${YOUR-X-API-KEY}"

```shell
# start the LLM service
-python3 service/llm_server_hybrid.py
+python3 -m huixiangdou.service.llm_server_hybrid
```

Open a new terminal, configure the host IP (**not** the IP inside the docker container) in `config.ini`, then run
@@ -118,7 +118,7 @@ x_api_key = "${YOUR-X-API-KEY}"
..
client_url = "http://10.140.24.142:9999/inference" # example

-python3 main.py
+python3 -m huixiangdou.main
```

## STEP3. Integrate into Feishu \[Optional\]
@@ -136,7 +136,8 @@ webhook_url = "${YOUR-LARK-WEBHOOK-URL}"
Run. When it finishes, the technical assistant's reply will be sent to the Feishu group chat.

```shell
-python3 main.py
+python3 -m huixiangdou.main --standalone # for non-docker users
+python3 -m huixiangdou.main # for docker users
```

<img src="./resource/figures/lark-example.png" width="400">
@@ -203,10 +204,10 @@ python3 main.py
introduction = "Used for evaluating large language models (LLM) .."
```

-- Use `python3 -m service.sg_search` for a unit test; the returned content should include opencompass source code and documentation
+- Use `python3 -m huixiangdou.service.sg_search` for a unit test; the returned content should include opencompass source code and documentation

```shell
-python3 service/sg_search.py
+python3 -m huixiangdou.service.sg_search
..
"filepath": "opencompass/datasets/longbench/longbench_trivia_qa.py",
"content": "from datasets import Dataset..
@@ -218,8 +219,8 @@ python3 main.py
Adjusting parameters for the business scenario is often unavoidable.
-- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./service/worker.py)
-- Adjust the [number of search results](./service/worker.py) based on the maximum length supported by the model
+- Refer to [data.json](./tests/data.json) to add real data, run [test_intention_prompt.py](./tests/test_intention_prompt.py) to get suitable prompts and thresholds, and update them into [worker](./huixiangdou/service/worker.py)
+- Adjust the [number of search results](./huixiangdou/service/worker.py) based on the maximum length supported by the model
# 🛠️ FAQ
@@ -241,12 +242,12 @@ python3 main.py
4. How do I connect another local LLM, and what if the results are not ideal afterwards?
-- Open [hybrid llm service](./service/llm_server_hybrid.py) and add a new LLM inference implementation
-- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./service/worker.py)
+- Open [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) and add a new LLM inference implementation
+- Refer to [test_intention_prompt and test data](./tests/test_intention_prompt.py), adjust prompt and threshold for the new model, and update them into [worker.py](./huixiangdou/service/worker.py)
5. What if responses are too slow or requests keep failing?
-- Refer to [hybrid llm service](./service/llm_server_hybrid.py) to add exponential backoff and retransmission
+- Refer to [hybrid llm service](./huixiangdou/service/llm_server_hybrid.py) to add exponential backoff and retransmission
- Replace the local LLM with an inference framework such as [lmdeploy](https://github.com/internlm/lmdeploy) instead of native huggingface/transformers
6. What if GPU memory is too low?
10 changes: 10 additions & 0 deletions huixiangdou/__init__.py
@@ -0,0 +1,10 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+"""import module."""
+from .frontend import Lark # noqa E401
+from .service import ChatClient # noqa E401
+from .service import ErrorCode # noqa E401
+from .service import FeatureStore # noqa E401
+from .service import HybridLLMServer # noqa E401
+from .service import WebSearch # noqa E401
+from .service import Worker # noqa E401
+from .service import llm_serve # noqa E401
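With these re-exports, the wheel's public API is importable from the top-level package. A short usage sketch, assuming a feature store already built in `workdir` and a valid `config.ini` (the constructor arguments match the `Worker(...)` call in `huixiangdou/main.py` later in this diff):

```python
from huixiangdou import Worker

# Same constructor arguments as huixiangdou/main.py uses below.
assistant = Worker(work_dir='workdir', config_path='config.ini')
```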
Binary file added huixiangdou/__pycache__/__init__.cpython-39.pyc
Binary file not shown.
Binary file added huixiangdou/__pycache__/main.cpython-39.pyc
Binary file not shown.
File renamed without changes.
Binary file not shown.
Binary file not shown.
7 changes: 4 additions & 3 deletions frontend/lark.py → huixiangdou/frontend/lark.py
@@ -1,3 +1,5 @@
# Copyright (c) OpenMMLab. All rights reserved.
+# copy from https://github.com/tpoisonooo/cpp-syntactic-sugar/blob/master/github-lark-notifier/main.py # noqa E501
+"""Lark proxy."""
import json
import logging
@@ -9,7 +11,6 @@
urllib3.disable_warnings()


-# copy from https://github.com/tpoisonooo/cpp-syntactic-sugar/blob/master/github-lark-notifier/main.py # noqa E501
class Lark:
    """Lark bot http proxy."""

@@ -52,7 +53,7 @@ def post(self, data):
                           headers=self.headers,
                           data=post_data,
                           verify=False,
-                          timeout=3)
+                          timeout=5)
except requests.exceptions.HTTPError as exc:
code = exc.response.status_code
reason = exc.response.reason
@@ -95,5 +96,5 @@ def post(self, data):
        requests.post(self.webhook,
                      headers=self.headers,
                      data=json.dumps(error_data),
-                     timeout=3)
+                     timeout=5)
return result
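For context, `main.py` later in this diff delivers replies through this proxy via `lark.send_text(msg=...)`. A hedged usage sketch with a placeholder webhook URL; the `webhook` keyword is an assumption, since the constructor is not shown in this hunk:

```python
from huixiangdou.frontend import Lark

# Placeholder webhook -- substitute the bot webhook of your Feishu group.
lark = Lark(webhook='https://open.feishu.cn/open-apis/bot/v2/hook/xxxx')
lark.send_text(msg='hello from HuixiangDou')  # same call main.py makes below
```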
39 changes: 35 additions & 4 deletions main.py → huixiangdou/main.py
100644 → 100755
@@ -1,15 +1,21 @@
+#!/usr/bin/env python3
+# Copyright (c) OpenMMLab. All rights reserved.
+"""HuixiangDou binary."""
import argparse
+import os
import time
from multiprocessing import Process, Value

import pytoml
+import requests
from loguru import logger

-from frontend import Lark
-from service import ErrorCode, Worker, llm_serve
+from .frontend import Lark
+from .service import ErrorCode, Worker, llm_serve


def parse_args():
+    """Parse args."""
    parser = argparse.ArgumentParser(description='Worker.')
    parser.add_argument('--work_dir',
                        type=str,
@@ -28,9 +34,29 @@ def parse_args():
    return args


-if __name__ == '__main__':
-    args = parse_args()
+def check_env():
+    """Check or create config.ini and logs dir."""
+    if not os.path.exists('logs'):
+        os.makedirs('logs')
+    CONFIG_NAME = 'config.ini'
+    CONFIG_URL = 'https://raw.githubusercontent.com/InternLM/HuixiangDou/main/config.ini?token=GHSAT0AAAAAACK2GCUVNSQXR373FEGSZSIIZNDZBMQ' # noqa E501
+    if not os.path.exists(CONFIG_NAME):
+        logger.warning(
+            f'{CONFIG_NAME} not found, download a template from {CONFIG_URL}.')
+
+        try:
+            response = requests.get(CONFIG_URL, timeout=5)
+            response.raise_for_status()
+            with open(CONFIG_NAME, 'wb') as f:
+                f.write(response.content)
+        except Exception as e:
+            logger.error(f'Failed to download file due to {e}')
+
+
+def run():
+    """Automatically download config, start llm server and run examples."""
+    check_env()
+    args = parse_args()
    if args.standalone:
        # hybrid llm serve
        server_ready = Value('i', 0)
@@ -52,6 +78,7 @@ def parse_args():
    # query by worker
    with open(args.config_path, encoding='utf8') as f:
        fe_config = pytoml.load(f)['frontend']
+    logger.info('Config loaded.')
    assistant = Worker(work_dir=args.work_dir, config_path=args.config_path)
    # queries = ['请教下视频流检测 跳帧 造成框一闪一闪的 有好的优化办法吗',
    #            '请教各位佬一个问题,虽然说注意力的长度等于上下文的长度。但是,增大上下文推理长度难道只有加长注意力机制一种方法吗?比如Rope啥的,应该不是吧', # noqa E501
@@ -68,3 +95,7 @@
            lark.send_text(msg=reply)

    # server_process.join()
+
+
+if __name__ == '__main__':
+    run()
File renamed without changes.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
File renamed without changes.
File renamed without changes.
5 changes: 4 additions & 1 deletion service/llm_client.py → huixiangdou/service/llm_client.py
@@ -113,7 +113,10 @@ def generate_response(self, prompt, history=[], remote=False):
            'history': data_history,
            'remote': remote
        }
-        resp = requests.post(url, headers=header, data=json.dumps(data))
+        resp = requests.post(url,
+                             headers=header,
+                             data=json.dumps(data),
+                             timeout=5)
        if resp.status_code != 200:
            raise Exception(str((resp.status_code, resp.reason)))
        return resp.json()['text']
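The FAQ above recommends adding exponential backoff and retransmission around exactly this kind of request. A minimal sketch of what that could look like; this is not code from the repo:

```python
import time

import requests


def post_with_backoff(url, data, retries=4, base_delay=1.0):
    """POST with exponential backoff; re-raises after the final attempt."""
    for attempt in range(retries):
        try:
            resp = requests.post(url, json=data, timeout=5)
            resp.raise_for_status()
            return resp
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2**attempt)  # 1s, 2s, 4s, ...
```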
service/llm_server_hybrid.py → huixiangdou/service/llm_server_hybrid.py
@@ -13,7 +13,7 @@
from transformers import AutoModelForCausalLM, AutoTokenizer


-class HybridLLMServer(object):
+class HybridLLMServer:
    """A class to handle server-side interactions with a hybrid language
    learning model (LLM) service.
2 changes: 1 addition & 1 deletion service/sg_search.py → huixiangdou/service/sg_search.py
@@ -105,7 +105,7 @@ def choose_repo(self, llm_client, question, groupname):

        keys = self.sg_config.keys()
        skip = ['binary_src_path', 'src_access_token']
-        repos = dict()
+        repos = {}
        for key in keys:
            if key in skip:
                continue