diff --git a/docs/sphinx_doc/en/source/tutorial/210-rag.md b/docs/sphinx_doc/en/source/tutorial/210-rag.md
index 867fdb2ec..39c3ecce0 100644
--- a/docs/sphinx_doc/en/source/tutorial/210-rag.md
+++ b/docs/sphinx_doc/en/source/tutorial/210-rag.md
@@ -190,6 +190,111 @@ RAG agent is an agent that can generate answers based on the retrieved knowledge
 Your agent will be equipped with a list of knowledge according to the `knowledge_id_list`. You can decide how to use the retrieved content and even update and refresh the index in your agent's `reply` function.
+## (Optional) Setting up a local embedding model service
+
+For those interested in setting up a local embedding service, we provide the following example based on the
+`sentence_transformers` package, a popular specialized package for embedding models (built on the `transformers` package and compatible with both HuggingFace and ModelScope models).
+In this example, we will use one of the state-of-the-art embedding models, `gte-Qwen2-7B-instruct`.
+
+* Step 1: Follow the instructions on [HuggingFace](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) or [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) to download the embedding model.
+  (If you cannot access HuggingFace directly, you may want to use a HuggingFace mirror by running the bash command
+  `export HF_ENDPOINT=https://hf-mirror.com`, or by adding the line `os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"` to your Python code.)
+* Step 2: Set up the server. The following code is for reference.
+
+```python
+import argparse
+import datetime
+
+from flask import Flask
+from flask import request
+from sentence_transformers import SentenceTransformer
+
+def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
+    """Get the current timestamp."""
+    return datetime.datetime.now().strftime(format_)
+
+app = Flask(__name__)
+
+# The embedding model is loaded once at startup; see the `__main__` block below.
+model = None
+
+@app.route("/embedding/", methods=["POST"])
+def get_embedding() -> dict:
+    """Receive a POST request and return the embedding response."""
+    json_data = request.get_json()
+    inputs = json_data.pop("inputs")
+
+    # Accept both a single string and a list of strings
+    if isinstance(inputs, str):
+        inputs = [inputs]
+
+    embeddings = model.encode(inputs)
+
+    return {
+        "data": {
+            "completion_tokens": 0,
+            "messages": {},
+            "prompt_tokens": 0,
+            "response": {
+                "data": [
+                    {
+                        "embedding": emb.astype(float).tolist(),
+                    }
+                    for emb in embeddings
+                ],
+                "created": "",
+                "id": create_timestamp(),
+                "model": "flask_model",
+                "object": "text_completion",
+                "usage": {
+                    "completion_tokens": 0,
+                    "prompt_tokens": 0,
+                    "total_tokens": 0,
+                },
+            },
+            "total_tokens": 0,
+            "username": "",
+        },
+    }
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model_name_or_path", type=str, required=True)
+    parser.add_argument("--device", type=str, default="auto")
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+
+    print("Setting up the embedding model...")
+    model = SentenceTransformer(
+        args.model_name_or_path,
+        # Let sentence_transformers pick the device unless one is given explicitly
+        device=None if args.device == "auto" else args.device,
+    )
+
+    app.run(port=args.port)
+```
+
+* Step 3: Start the server.
+```bash
+python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}
+```
+
+
+Test whether the service is running successfully.
+```python
+from agentscope.models.post_model import PostAPIEmbeddingWrapper
+
+
+model = PostAPIEmbeddingWrapper(
+    config_name="test_config",
+    api_url="http://127.0.0.1:8000/embedding/",
+    json_args={
+        "max_length": 4096,
+        "temperature": 0.5,
+    },
+)
+
+print(model("testing"))
+```
 [[Back to the top]](#210-rag-en)
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md b/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
index 7a0efd7d0..7921dd31d 100644
--- a/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
+++ b/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
@@ -174,6 +174,113 @@ A RAG agent is an agent that can generate answers based on retrieved knowledge.
 **Building a RAG agent yourself.** As long as your agent config has a `knowledge_id_list`, you can pass the agent together with this list to `KnowledgeBank.equip`; the agent will then be equipped with the corresponding `knowledge_id`s.
 You can decide in your `reply` function how to extract and use information from the `Knowledge` objects, and even modify the knowledge base through `Knowledge`.
+
+## (Optional) Setting up your own embedding model service
+
+For users interested in hosting a local embedding model, we provide the following example.
+It is based on the `sentence_transformers` package, which is popular for embedding models (built on `transformers` and compatible with both HuggingFace and ModelScope models).
+In this example, we will use one of the best text embedding models available, `gte-Qwen2-7B-instruct`.
+
+
+* Step 1: Follow the instructions on [HuggingFace](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) or [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) to download the model.
+  (If you cannot download the model directly from HuggingFace, you can also use a HuggingFace mirror: run the bash command `export HF_ENDPOINT=https://hf-mirror.com`, or add `os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"` to your Python code.)
+* Step 2: Set up the server. The following code is for reference.
+
+```python
+import argparse
+import datetime
+
+from flask import Flask
+from flask import request
+from sentence_transformers import SentenceTransformer
+
+def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
+    """Get the current timestamp."""
+    return datetime.datetime.now().strftime(format_)
+
+app = Flask(__name__)
+
+# The embedding model is loaded once at startup; see the `__main__` block below.
+model = None
+
+@app.route("/embedding/", methods=["POST"])
+def get_embedding() -> dict:
+    """Receive a POST request and return the embedding response."""
+    json_data = request.get_json()
+    inputs = json_data.pop("inputs")
+
+    # Accept both a single string and a list of strings
+    if isinstance(inputs, str):
+        inputs = [inputs]
+
+    embeddings = model.encode(inputs)
+
+    return {
+        "data": {
+            "completion_tokens": 0,
+            "messages": {},
+            "prompt_tokens": 0,
+            "response": {
+                "data": [
+                    {
+                        "embedding": emb.astype(float).tolist(),
+                    }
+                    for emb in embeddings
+                ],
+                "created": "",
+                "id": create_timestamp(),
+                "model": "flask_model",
+                "object": "text_completion",
+                "usage": {
+                    "completion_tokens": 0,
+                    "prompt_tokens": 0,
+                    "total_tokens": 0,
+                },
+            },
+            "total_tokens": 0,
+            "username": "",
+        },
+    }
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model_name_or_path", type=str, required=True)
+    parser.add_argument("--device", type=str, default="auto")
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+
+    print("Setting up the embedding model...")
+    model = SentenceTransformer(
+        args.model_name_or_path,
+        # Let sentence_transformers pick the device unless one is given explicitly
+        device=None if args.device == "auto" else args.device,
+    )
+
+    app.run(port=args.port)
+```
+
+* Step 3: Start the server.
+```bash
+python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}
+```
+
+
+Test whether the service has started successfully.
+```python
+from agentscope.models.post_model import PostAPIEmbeddingWrapper
+
+
+model = PostAPIEmbeddingWrapper(
+    config_name="test_config",
+    api_url="http://127.0.0.1:8000/embedding/",
+    json_args={
+        "max_length": 4096,
+        "temperature": 0.5,
+    },
+)
+
+print(model("testing"))
+```
+
 [[Back to the top]](#210-rag-zh)
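For reference, the JSON payload produced by the Flask server above nests the embedding vectors under `data.response.data`. The following sketch shows how such a payload can be unpacked on the client side; the vectors are made-up sample values rather than real model output, and `extract_embeddings` is just an illustrative helper name, not part of any library.

```python
# Illustrative payload following the schema returned by the Flask server above.
# The embedding values are fabricated sample numbers, not real model output.
sample_response = {
    "data": {
        "response": {
            "data": [
                {"embedding": [0.1, 0.2, 0.3]},
                {"embedding": [0.4, 0.5, 0.6]},
            ],
        },
    },
}

def extract_embeddings(payload: dict) -> list:
    """Pull the list of embedding vectors out of the nested response payload."""
    return [item["embedding"] for item in payload["data"]["response"]["data"]]

vectors = extract_embeddings(sample_response)
print(len(vectors), len(vectors[0]))  # 2 vectors, each of dimension 3
```

This is the structure `PostAPIEmbeddingWrapper` is pointed at in the test snippet above; knowing where the vectors live is handy when debugging the service with raw HTTP requests.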