Merge pull request #188 from zifeng-radxa/main
docs: add Llama3 docs with zh/en
akgnah authored May 22, 2024
2 parents c1a0285 + a4dd21c commit 82b0c31
Showing 6 changed files with 346 additions and 0 deletions.
164 changes: 164 additions & 0 deletions docs/common/ai/_chatbot_llama3.mdx
@@ -0,0 +1,164 @@
Llama3 ChatBot-TPU uses the Sophon SDK to port Meta's open-source [Llama3](https://ai.meta.com/blog/meta-llama-3/) model to SG2300X-series chips, enabling hardware-accelerated inference on the local TPU. The model is wrapped in a Gradio chatbot interface so users can ask it practical questions.

## Llama3 Deployment

- Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Open the Llama3 project directory

```bash
cd LLM-TPU/models/Llama3/python_demo
```

- This example provides the Llama-3-8B-Instruct 4-bit quantized model `llama3-8b_int4_1dev_512.bmodel` and precompiled C++ files for download.

Users can refer to [Llama3 Model Conversion](#llama3-model-conversion) to convert Llama3 models with other quantization settings themselves.

Users can refer to [Llama3 cpython File Compilation](#llama3-cpython-file-compilation) to compile the cpython interface binding files themselves.

```bash
# llama3-8b_int4_1dev_512.bmodel
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/llama3/tar_downloader.sh
bash tar_downloader.sh
tar -xvf llama3-8b_int4_1dev_512.tar.gz
```
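
After extraction, you can sanity-check that the model file landed where the later commands expect it (a quick look, not part of the official flow):

```bash
# The -m flag used later assumes the bmodel sits in the current directory
ls -lh ./llama3-8b_int4_1dev_512.bmodel
```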

- Configure the environment

**A virtual environment must be created, otherwise other applications may be affected.** For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
```

- Install dependencies

```bash
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Set environment variables

Use the `ldd` command to check whether `chat.cpython-38-aarch64-linux-gnu.so` links `libbmlib.so` from `LLM-TPU/support/lib_soc/libbmlib.so`.

If the `libbmlib.so` link path is incorrect, run the following command:

```bash
export LD_LIBRARY_PATH=LLM-TPU/support/lib_soc:$LD_LIBRARY_PATH
```
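
The check itself might look like this, assuming you run it from `python_demo` next to the compiled `.so`; re-run it after the `export` to confirm the resolved path now points into `LLM-TPU/support/lib_soc`:

```bash
# Show which libbmlib.so the cpython binding resolves to at load time
ldd chat.cpython-38-aarch64-linux-gnu.so | grep libbmlib
```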

- Start Llama3

**Terminal Mode**

```bash
python3 pipeline.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

<img src="/img/general-tutorial/tpu_ai/llama3_pipeline.webp" />

**Gradio Mode**

```bash
python3 web_demo.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

Access port 8003 of the Airbox IP address in your browser.

{" "}

<img src="/img/general-tutorial/tpu_ai/llama3_web_demo.webp" />

## Llama3 Model Conversion

Users can follow this section to convert Llama3 models of different quantization types to bmodel themselves.

- Prepare the environment on an x86 workstation

Please refer to [TPU-MLIR Installation](../../model-compile/tpu_mlir_env) to configure the TPU-MLIR environment.
Clone the repository:

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Fill out the request form on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main) to download the open-source Llama3 model, for example with the CLI sketch below.
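
One possible way to fetch the weights once your request is approved is the `huggingface_hub` CLI; this is a sketch that assumes the CLI is installed and you have a valid access token:

```bash
pip3 install -U "huggingface_hub[cli]"
huggingface-cli login   # paste your Hugging Face access token when prompted
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --local-dir ./Meta-Llama-3-8B-Instruct
```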

- Create a virtual environment in the `LLM-TPU/models/Llama3` working directory

For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Align the model environment

Copy `LLM-TPU/models/Llama3/compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py` into the transformers library; note that the transformers library here should be the one inside `.venv`.

```bash
cp ./compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py .venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
```

Also replace the file of the same name in the downloaded Llama-3-8B-Instruct directory with `./compile/files/Meta-Llama-3-8B-Instruct/config.json`.
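
A sketch of that replacement, where `your_model_path` stands in for the directory of the downloaded weights:

```bash
# Overwrite the model's config.json with the one shipped in this repo
cp ./compile/files/Meta-Llama-3-8B-Instruct/config.json your_model_path/config.json
```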

- Generate the onnx file

```bash
cd compile
python export_onnx.py --model_path your_model_path --seq_length 512
```

`--model_path`: Path to the downloaded Meta Llama3 folder

`--seq_length`: Fixed sequence length to export; choose 512, 1024, 2048, etc. as needed

- Generate the bmodel file

Exit the virtual environment before generating the bmodel:

```bash
deactivate
```

Compile the model:

```bash
./compile.sh --mode int4 --name llama3-8b --seq_length 512 # for int8, use --mode int8
```

`--mode`: Quantization mode; options are int4 and int8

`--seq_length`: Sequence length; must match the seq_length specified when generating the onnx file

`--name`: Model name; must be llama3-8b here

Generating the bmodel takes roughly 2 hours or more. 64 GB of RAM and more than 200 GB of disk space are recommended; otherwise OOM or "no space left" errors are likely.
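
A quick pre-flight check of both resources with standard tools (not part of the official flow):

```bash
free -g    # available RAM in GB
df -h .    # free disk space on the current filesystem
```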

## Llama3 cpython File Compilation

Compile the executable on the Airbox. The precompiled files are already included in the `llama3-8b_int4_1dev_512.tar.gz` package; if you have already downloaded it, there is no need to compile.

```bash
cd python_demo
mkdir build
cd build
cmake ..
make
cp *chat* ..
```
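
To confirm the binding built and loads, you can try importing it from `python_demo` with the virtual environment active; this assumes the module is named `chat`, as the `chat.cpython-*.so` filename suggests:

```bash
# LD_LIBRARY_PATH must include LLM-TPU/support/lib_soc, as set above
cd .. && python3 -c "import chat; print('chat binding OK')"
```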
@@ -0,0 +1,9 @@
---
sidebar_position: 11
---

# Llama3 Chatbot-TPU

import Chatbotllama3 from '../../../../common/ai/\_chatbot_llama3.mdx';

<Chatbotllama3 />
@@ -0,0 +1,164 @@
Llama3 ChatBot-TPU uses the Sophon SDK to port Meta's open-source [Llama3](https://ai.meta.com/blog/meta-llama-3/) model to SG2300X-series chips, enabling hardware-accelerated inference on the local TPU. The model is wrapped in a Gradio chatbot interface so users can ask it practical questions.

## Llama3 Deployment

- Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Open the Llama3 project directory

```bash
cd LLM-TPU/models/Llama3/python_demo
```

- This example provides the Llama-3-8B-Instruct 4-bit quantized model `llama3-8b_int4_1dev_512.bmodel` and precompiled C++ files for download.

Users can refer to [Llama3 Model Conversion](#llama3-model-conversion) to convert Llama3 models with other quantization settings themselves.

Users can refer to [Llama3 Cpython File Compilation](#llama3-cpython-file-compilation) to compile the cpython interface binding files themselves.

```bash
# llama3-8b_int4_1dev_512.bmodel
wget https://github.com/radxa-edge/TPU-Edge-AI/releases/download/llama3/tar_downloader.sh
bash tar_downloader.sh
tar -xvf llama3-8b_int4_1dev_512.tar.gz
```
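
After extraction, you can sanity-check that the model file landed where the later commands expect it (a quick look, not part of the official flow):

```bash
# The -m flag used later assumes the bmodel sits in the current directory
ls -lh ./llama3-8b_int4_1dev_512.bmodel
```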

- Configure the environment

**A virtual environment must be created to avoid affecting the normal operation of other applications.** For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
```

- Install dependencies

```bash
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Set environment variables

Use the `ldd` command to check whether `chat.cpython-38-aarch64-linux-gnu.so` links `libbmlib.so` from `LLM-TPU/support/lib_soc/libbmlib.so`.

If the `libbmlib.so` link path is incorrect, run the following command:

```bash
export LD_LIBRARY_PATH=LLM-TPU/support/lib_soc:$LD_LIBRARY_PATH
```
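
The check itself might look like this, assuming you run it from `python_demo` next to the compiled `.so`; re-run it after the `export` to confirm the resolved path now points into `LLM-TPU/support/lib_soc`:

```bash
# Show which libbmlib.so the cpython binding resolves to at load time
ldd chat.cpython-38-aarch64-linux-gnu.so | grep libbmlib
```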

- Start Llama3

**Terminal Mode**

```bash
python3 pipeline.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

<img src="../../../../img/general-tutorial/tpu_ai/llama3_pipeline.webp" />

**Gradio Mode**

```bash
python3 web_demo.py -m ./llama3-8b_int4_1dev_512.bmodel -t ../token_config
```

`-m`: Specify the model path

`-t`: Specify the token_config folder path

Access the Airbox IP address at port 8003 in your browser.

<img src="../../../../img/general-tutorial/tpu_ai/llama3_web_demo.webp" />

## Llama3 Model Conversion

Users can follow this section to convert Llama3 models of different quantization types to bmodel themselves.

- Prepare the environment on an X86 workstation

Please refer to [TPU-MLIR Installation](../../model-compile/tpu_mlir_env) to configure the TPU-MLIR environment.
Clone the repository

```bash
git clone https://github.com/zifeng-radxa/LLM-TPU.git
```

- Fill out the request form on [Huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main) to download the open-source Llama3 model, for example with the CLI sketch below.
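
One possible way to fetch the weights once your request is approved is the `huggingface_hub` CLI; this is a sketch that assumes the CLI is installed and you have a valid access token:

```bash
pip3 install -U "huggingface_hub[cli]"
huggingface-cli login   # paste your Hugging Face access token when prompted
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --local-dir ./Meta-Llama-3-8B-Instruct
```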

- Create a virtual environment in the `LLM-TPU/models/Llama3` directory.

For virtual environment usage, please refer [here](../ai-tools/virtualenv_usage).

```bash
python3 -m virtualenv .venv
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install -r requirements.txt
```

- Align the model environment

Copy `LLM-TPU/models/Llama3/compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py` into the transformers library; note that the transformers library here should be the one inside `.venv`.

```bash
cp ./compile/files/Meta-Llama-3-8B-Instruct/modeling_llama.py .venv/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py
```

Also replace the file of the same name in the downloaded Llama-3-8B-Instruct directory with `./compile/files/Meta-Llama-3-8B-Instruct/config.json`.
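
A sketch of that replacement, where `your_model_path` stands in for the directory of the downloaded weights:

```bash
# Overwrite the model's config.json with the one shipped in this repo
cp ./compile/files/Meta-Llama-3-8B-Instruct/config.json your_model_path/config.json
```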

- Generate the onnx file

```bash
cd compile
python export_onnx.py --model_path your_model_path --seq_length 512
```

`--model_path`: Path to the downloaded Meta Llama3 folder

`--seq_length`: Fixed sequence length to export; choose 512, 1024, 2048, etc. as needed

- Generate the bmodel file

Exit the virtual environment before generating the bmodel

```bash
deactivate
```

Compile the model

```bash
./compile.sh --mode int4 --name llama3-8b --seq_length 512 # for int8, use --mode int8
```

`--mode`: Quantization mode, options are int4, int8

`--seq_length`: Sequence length, should match the seq_length specified when generating the onnx file

`--name`: Model name, must be llama3-8b

Generating the bmodel takes roughly 2 hours or more. 64 GB of RAM and more than 200 GB of disk space are recommended; otherwise OOM or "no space left" errors are likely.
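
A quick pre-flight check of both resources with standard tools (not part of the official flow):

```bash
free -g    # available RAM in GB
df -h .    # free disk space on the current filesystem
```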

## Llama3 Cpython File Compilation

Compile the executable on the Airbox. The precompiled files are already included in the `llama3-8b_int4_1dev_512.tar.gz` package; if you have already downloaded it, there is no need to compile.

```bash
cd python_demo
mkdir build
cd build
cmake ..
make
cp *chat* ..
```
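
To confirm the binding built and loads, you can try importing it from `python_demo` with the virtual environment active; this assumes the module is named `chat`, as the `chat.cpython-*.so` filename suggests:

```bash
# LD_LIBRARY_PATH must include LLM-TPU/support/lib_soc, as set above
cd .. && python3 -c "import chat; print('chat binding OK')"
```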
@@ -0,0 +1,9 @@
---
sidebar_position: 11
---

# Llama3 Chatbot-TPU

import Chatbotllama3 from '../../../../common/ai/\_chatbot_llama3.mdx';

<Chatbotllama3 />
Binary file not shown.
Binary file not shown.
