diff --git a/docs/sphinx_doc/en/source/tutorial/210-rag.md b/docs/sphinx_doc/en/source/tutorial/210-rag.md
index 867fdb2ec..39c3ecce0 100644
--- a/docs/sphinx_doc/en/source/tutorial/210-rag.md
+++ b/docs/sphinx_doc/en/source/tutorial/210-rag.md
@@ -190,6 +190,111 @@ RAG agent is an agent that can generate answers based on the retrieved knowledge
 Your agent will be equipped with a list of knowledge according to the `knowledge_id_list`. You can decide how to use the retrieved content and even update and refresh the index in your agent's `reply` function.
+## (Optional) Setting up a local embedding model service
+
+For those interested in setting up a local embedding service, we provide the following example based on the
+`sentence_transformers` package, a popular specialized package for embedding models (built on the `transformers` package and compatible with both HuggingFace and ModelScope models).
+In this example, we will use one of the state-of-the-art embedding models, `gte-Qwen2-7B-instruct`.
+
+* Step 1: Follow the instructions on [HuggingFace](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) or [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) to download the embedding model.
+  (If you cannot access HuggingFace directly, you may want to use a HuggingFace mirror by running the bash command
+  `export HF_ENDPOINT=https://hf-mirror.com`, or by adding the line `os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"` to your Python code.)
+* Step 2: Set up the server. The following code is for reference.
+
+```python
+import argparse
+import datetime
+
+from flask import Flask
+from flask import request
+from sentence_transformers import SentenceTransformer
+
+def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
+    """Get the current timestamp."""
+    return datetime.datetime.now().strftime(format_)
+
+app = Flask(__name__)
+
+# The embedding model is loaded once at startup; see the `__main__` block below.
+model = None
+
+@app.route("/embedding/", methods=["POST"])
+def get_embedding() -> dict:
+    """Receive a POST request and return the embedding response."""
+    json_data = request.get_json()
+    inputs = json_data.pop("inputs")
+
+    # Accept both a single string and a list of strings
+    if isinstance(inputs, str):
+        inputs = [inputs]
+
+    embeddings = model.encode(inputs)
+
+    return {
+        "data": {
+            "completion_tokens": 0,
+            "messages": {},
+            "prompt_tokens": 0,
+            "response": {
+                "data": [
+                    {
+                        "embedding": emb.astype(float).tolist(),
+                    }
+                    for emb in embeddings
+                ],
+                "created": "",
+                "id": create_timestamp(),
+                "model": "flask_model",
+                "object": "text_completion",
+                "usage": {
+                    "completion_tokens": 0,
+                    "prompt_tokens": 0,
+                    "total_tokens": 0,
+                },
+            },
+            "total_tokens": 0,
+            "username": "",
+        },
+    }
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model_name_or_path", type=str, required=True)
+    parser.add_argument("--device", type=str, default="auto")
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+
+    print("Setting up the embedding model...")
+    model = SentenceTransformer(
+        args.model_name_or_path,
+        # Let sentence_transformers pick the device unless one is given explicitly
+        device=None if args.device == "auto" else args.device,
+    )
+
+    app.run(port=args.port)
+```
+
+* Step 3: Start the server.
+```bash
+python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}
+```
+
+
+Test whether the service is running successfully.
+```python
+from agentscope.models.post_model import PostAPIEmbeddingWrapper
+
+
+model = PostAPIEmbeddingWrapper(
+    config_name="test_config",
+    api_url="http://127.0.0.1:8000/embedding/",
+    json_args={
+        "max_length": 4096,
+        "temperature": 0.5,
+    },
+)
+
+print(model("testing"))
+```
 [[Back to the top]](#210-rag-en)
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md b/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
index 7a0efd7d0..7921dd31d 100644
--- a/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
+++ b/docs/sphinx_doc/zh_CN/source/tutorial/210-rag.md
@@ -174,6 +174,113 @@ A RAG agent is an agent that can generate answers based on retrieved knowledge.
 **Building a RAG agent yourself.** As long as your agent config has a `knowledge_id_list`, you can pass the agent together with this list to `KnowledgeBank.equip`; the agent will then be equipped with the corresponding `knowledge_id`s.
 You can decide in your `reply` function how to extract and use information from the `Knowledge` objects, and even modify the knowledge base through `Knowledge`.
+
+## (Optional) Setting up your own embedding model service
+
+For users interested in hosting a local embedding model, we provide the following example.
+It is based on the `sentence_transformers` package, which is popular for embedding models (built on `transformers` and compatible with both HuggingFace and ModelScope models).
+In this example, we will use one of the best text embedding models available, `gte-Qwen2-7B-instruct`.
+
+
+* Step 1: Follow the instructions on [HuggingFace](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) or [ModelScope](https://www.modelscope.cn/models/iic/gte_Qwen2-7B-instruct) to download the model.
+  (If you cannot download the model directly from HuggingFace, you can also use a HuggingFace mirror: run the bash command `export HF_ENDPOINT=https://hf-mirror.com`, or add `os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"` to your Python code.)
+* Step 2: Set up the server. The following code is for reference.
+
+```python
+import argparse
+import datetime
+
+from flask import Flask
+from flask import request
+from sentence_transformers import SentenceTransformer
+
+def create_timestamp(format_: str = "%Y-%m-%d %H:%M:%S") -> str:
+    """Get the current timestamp."""
+    return datetime.datetime.now().strftime(format_)
+
+app = Flask(__name__)
+
+# The embedding model is loaded once at startup; see the `__main__` block below.
+model = None
+
+@app.route("/embedding/", methods=["POST"])
+def get_embedding() -> dict:
+    """Receive a POST request and return the embedding response."""
+    json_data = request.get_json()
+    inputs = json_data.pop("inputs")
+
+    # Accept both a single string and a list of strings
+    if isinstance(inputs, str):
+        inputs = [inputs]
+
+    embeddings = model.encode(inputs)
+
+    return {
+        "data": {
+            "completion_tokens": 0,
+            "messages": {},
+            "prompt_tokens": 0,
+            "response": {
+                "data": [
+                    {
+                        "embedding": emb.astype(float).tolist(),
+                    }
+                    for emb in embeddings
+                ],
+                "created": "",
+                "id": create_timestamp(),
+                "model": "flask_model",
+                "object": "text_completion",
+                "usage": {
+                    "completion_tokens": 0,
+                    "prompt_tokens": 0,
+                    "total_tokens": 0,
+                },
+            },
+            "total_tokens": 0,
+            "username": "",
+        },
+    }
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--model_name_or_path", type=str, required=True)
+    parser.add_argument("--device", type=str, default="auto")
+    parser.add_argument("--port", type=int, default=8000)
+    args = parser.parse_args()
+
+    print("Setting up the embedding model...")
+    model = SentenceTransformer(
+        args.model_name_or_path,
+        # Let sentence_transformers pick the device unless one is given explicitly
+        device=None if args.device == "auto" else args.device,
+    )
+
+    app.run(port=args.port)
+```
+
+* Step 3: Start the server.
+```bash
+python setup_ms_service.py --model_name_or_path {$PATH_TO_gte_Qwen2_7B_instruct}
+```
+
+
+Test whether the service has started successfully.
+```python
+from agentscope.models.post_model import PostAPIEmbeddingWrapper
+
+
+model = PostAPIEmbeddingWrapper(
+    config_name="test_config",
+    api_url="http://127.0.0.1:8000/embedding/",
+    json_args={
+        "max_length": 4096,
+        "temperature": 0.5,
+    },
+)
+
+print(model("testing"))
+```
+
 [[Back to the top]](#210-rag-zh)
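For reference, the JSON payload produced by the Flask server above nests the embedding vectors under `data.response.data`. The following sketch shows how such a payload can be unpacked on the client side; the vectors are made-up sample values rather than real model output, and `extract_embeddings` is just an illustrative helper name, not part of any library.

```python
# Illustrative payload following the schema returned by the Flask server above.
# The embedding values are fabricated sample numbers, not real model output.
sample_response = {
    "data": {
        "response": {
            "data": [
                {"embedding": [0.1, 0.2, 0.3]},
                {"embedding": [0.4, 0.5, 0.6]},
            ],
        },
    },
}

def extract_embeddings(payload: dict) -> list:
    """Pull the list of embedding vectors out of the nested response payload."""
    return [item["embedding"] for item in payload["data"]["response"]["data"]]

vectors = extract_embeddings(sample_response)
print(len(vectors), len(vectors[0]))  # 2 vectors, each of dimension 3
```

This is the structure `PostAPIEmbeddingWrapper` is pointed at in the test snippet above; knowing where the vectors live is handy when debugging the service with raw HTTP requests.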